Bug #39653 find_shortest_key in sql_select.cc does not consider clustered primary keys
Submitted: 25 Sep 2008 18:26 Modified: 20 Jun 2010 22:48
Reporter: Zardosht Kasheff (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.1.25, 5.1.28 OS:Any
Assigned to: Gleb Shchepa
Tags: find_shortest_key, query optimizer
Triage: Triaged: D3 (Medium) / R2 (Low) / E2 (Low)

[25 Sep 2008 18:26] Zardosht Kasheff
Description:
The function find_shortest_key at line 12598 in sql_select.cc, is used in make_join_readinfo to determine what covering index to use to satisfy a query. In doing so, it simply selects the index with the smallest key length. If the primary key is clustered, then this can lead to a very bad choice, because using the primary implies scanning through the entire table.

What find_shortest_key should do is the following. If the primary key is a covering index and is clustered, like in MyISAM, then the behavior today should remain the same. If the primary key is clustered, like in InnoDB, then it should not consider using the primary key because then the storage engine will have to scan through much more data.

How to repeat:
Perform the following steps. This demonstrates the issue using InnoDB:

mysql> create table bar (a int, b int, c int, d int, e int, f int) engine=InnoD
B;
Query OK, 0 rows affected (0.01 sec)

mysql> alter table bar add primary key (a);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table bar add key (b,c);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into bar values (1,1,1,1,1,1),(2,2,2,2,2,2),(3,3,3,3,3,3),(4,4,4,
4,4,4),(5,5,5,5,5,5),(6,6,6,6,6,6),(7,7,7,7,7,7),(8,8,8,8,8,8),(9,9,9,9,9,9),(1
1,11,11,11,11,11);
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> explain select count(*) from bar;
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+-------------+
|  1 | SIMPLE      | bar   | index | NULL          | PRIMARY | 4       | NULL |
  10 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+-------------+
1 row in set (0.00 sec)

mysql> explain select count(*) from bar \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: bar
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 10
        Extra: Using index
1 row in set (0.00 sec)

Suggested fix:
if handler->primary_key_is_clustered() returns true, then find_shortest_key, or the places that use find_shortest_key, should compare the key lengths of other covering indexes with the entire row size, not the length of the primary key.

In the above example, the key (b,c) should be used instead of the primary key
[25 Sep 2008 18:34] Sveta Smirnova
Thank you for the report.

If I force index(b) I get similar results:

explain select count(*) from bar force index(b);
id      1
select_type     SIMPLE
table   bar
type    index
possible_keys   NULL
key     b
key_len 10
ref     NULL
rows    10
Extra   Using index

Could you please provide example which demonstrates performance increase if using another key?
[25 Sep 2008 18:40] Sveta Smirnova
This also can be same issue as described in comment "[17 May 2005 14:27] Igor Babaev" for bug #10512
[25 Sep 2008 18:48] Zardosht Kasheff
I cannot comment on if this is the same issue as what Igor states in #10512.

As for queries that display the problem, take any query that has these properties:
1) Both the primary key and some other key are covering indexes
2) The storage engine has a clustered primary key, like InnoDB

The bigger the disparity between the entire row size and the length of the non-primary covering key, the wider the gap in performance.
[26 Sep 2008 6:06] Sveta Smirnova
Thank you for the feedback.

In the example provided you have PRIMARY KEY which is unique index by definition and KEY(b,c) which is not unique. In the same time you ask about quantity of all rows. So it is natural what PRIMARY KEY  is used. If change query to explain `select count(*) from bar where b=1;` index (b,c) is used:

explain select count(*) from bar where b=1;
id      1
select_type     SIMPLE
table   bar
type    ref
possible_keys   b
key     b
key_len 5
ref     const
rows    1
Extra   Using where; Using index

This is why I ask you to provide correct example.
[26 Sep 2008 12:51] Zardosht Kasheff
Actually, it is not natural to use the primary key. The fact that the primary key is unique and the key (b,c) is not unique does not matter. The only reason that the primary key is being used is that it is of shorter length than the key (b,c).

Here is a way to verify what I say. Alter the table such that we have a key (c) and a primary key of (a,b). If what you say is correct, then the primary should be used. It is not. The key (c) is used because it has shorter length than the primary:

mysql> explain select count(*) from bar \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: bar
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 10
        Extra: Using index
1 row in set (0.00 sec)

mysql> alter table bar drop primary key, drop index b;
Query OK, 10 rows affected (0.01 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> alter table bar add primary key(a,b), add index c(c);
Query OK, 10 rows affected (0.01 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> explain select count(*) from bar \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: bar
         type: index
possible_keys: NULL
          key: c
      key_len: 5
          ref: NULL
         rows: 10
        Extra: Using index
1 row in set (0.00 sec)
[3 Oct 2008 8:10] Valerii Kravchuk
Verified just as described with 5.1.28. 

I think primary key of InnoDB (and any other engine with clustered primary key) should be considered differently for "index" access path. Entire row length should be taken into account.
[26 Feb 2010 19:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/101713

3358 Gleb Shchepa	2010-02-26
      Bug #39653: find_shortest_key in sql_select.cc does not
                  consider clustered primary keys
      
      Choosing a shortest index for the covering index scan,
      the optimizer ignored the fact, that the clustered primary
      key read involves whole table data.
      
      The find_shortest_key function has been modified to
      take into account that fact that a clustered PK has a
      longest key of possible covering indices.
     @ mysql-test/r/innodb_mysql.result
        Test case for bug #39653.
     @ mysql-test/t/innodb_mysql.test
        Test case for bug #39653.
     @ sql/sql_select.cc
        Bug #39653: find_shortest_key in sql_select.cc does not
                    consider clustered primary keys
        
        The find_shortest_key function has been modified to
        take into account that fact that a clustered PK has a
        longest key of possible covering indices.
[5 Mar 2010 7:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/102391

3358 Gleb Shchepa	2010-03-05
      Bug #39653: find_shortest_key in sql_select.cc does not
                  consider clustered primary keys
      
      Choosing a shortest index for the covering index scan,
      the optimizer ignored the fact, that the clustered primary
      key read involves whole table data.
      
      The find_shortest_key function has been modified to
      take into account that fact that a clustered PK has a
      longest key of possible covering indices.
     @ mysql-test/r/innodb_mysql.result
        Test case for bug #39653.
     @ mysql-test/t/innodb_mysql.test
        Test case for bug #39653.
     @ sql/sql_select.cc
        Bug #39653: find_shortest_key in sql_select.cc does not
                    consider clustered primary keys
        
        The find_shortest_key function has been modified to
        take into account that fact that a clustered PK has a
        longest key of possible covering indices.
[5 Mar 2010 20:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/102478

3372 Gleb Shchepa	2010-03-05
      Bug #39653: find_shortest_key in sql_select.cc does not
                  consider clustered primary keys
      
      Choosing a shortest index for the covering index scan,
      the optimizer ignored the fact, that the clustered primary
      key read involves whole table data.
      
      The find_shortest_key function has been modified to
      take into account that fact that a clustered PK has a
      longest key of possible covering indices.
     @ mysql-test/r/innodb_mysql.result
        Test case for bug #39653.
     @ mysql-test/t/innodb_mysql.test
        Test case for bug #39653.
     @ sql/sql_select.cc
        Bug #39653: find_shortest_key in sql_select.cc does not
                    consider clustered primary keys
        
        The find_shortest_key function has been modified to
        take into account that fact that a clustered PK has a
        longest key of possible covering indices.
[26 Mar 2010 8:23] Bugs System
Pushed into 5.5.4-m3 (revid:alik@sun.com-20100326080914-2pz8ns984e0spu03) (version source revid:alexey.kopytov@sun.com-20100307164059-cri8typa32cypq0l) (merge vers: 5.5.3-m2) (pib:16)
[26 Mar 2010 8:27] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100326081116-m3v4l34yhr43mtsv) (version source revid:alik@sun.com-20100325072612-4sds00ix8ajo1e84) (pib:16)
[26 Mar 2010 8:31] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100326081944-qja07qklw1p2w7jb) (version source revid:alik@sun.com-20100325073410-4t4i9gu2u1pge7xb) (merge vers: 6.0.14-alpha) (pib:16)
[26 Mar 2010 19:44] Zardosht Kasheff
Will this patch make it into 5.1?
[6 Apr 2010 7:56] Bugs System
Pushed into 5.1.46 (revid:sergey.glukhov@sun.com-20100405111026-7kz1p8qlzglqgfmu) (version source revid:gshchepa@mysql.com-20100305194555-ts37kuiie7vbh9k6) (merge vers: 5.1.45) (pib:16)
[15 Apr 2010 15:44] Paul Dubois
Noted in 5.1.46, 5.5.5, 6.0.14 changelogs.

While looking for the shortest index for a covering index scan, the 
optimizer ignored that a clustered primary key read the entire table.
[16 May 2010 3:10] James Day
Paul, please change the description to:

"While looking for the shortest index for a covering index scan, the optimizer didn't consider the full row length for a clustered primary key, as in InnoDB. Secondary covering indexes will now be preferred, making full table scans less likely."
[16 May 2010 20:37] Paul Dubois
Added performance tag, revised changelog wording per previous comment.
[25 May 2010 16:50] Dan Nelson
This probably closes bug 19271 also (innodb optimizer prefers primary over secondary indexes for count(*))
[17 Jun 2010 11:45] Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 12:23] Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:martin.skold@mysql.com-20100609211156-tsac5qhw951miwtt) (merge vers: 5.1.46-ndb-6.2.19) (pib:16)
[17 Jun 2010 13:10] Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[29 Jul 2010 23:27] Domas Mituzas
actually, this change can have quite negative impacts.

e.g. on a table like this:
CREATE TABLE `pagelinks` (
  `pl_from` int(8) unsigned NOT NULL DEFAULT '0',
  `pl_namespace` int(11) NOT NULL DEFAULT '0',
  `pl_title` varbinary(255) NOT NULL DEFAULT '',
  UNIQUE KEY `pl_from` (`pl_from`,`pl_namespace`,`pl_title`),
  KEY `pl_namespace` (`pl_namespace`,`pl_title`,`pl_from`)
)

both indexes satisfy 'SELECT * FROM table' query done by mysqldump, but PK is much much hotter, and secondary key is relatively cold, so suddenly after #39653 equal-width secondary key is preferred - this has made our dumps take six days instead of half an hour.
[30 Jul 2010 20:46] James Day
Domas, I've created bug #55656 for your report, please comment on the suggested fix over there.
[17 Aug 2010 10:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/115915

3181 Evgeny Potemkin	2010-08-17
      Bug #55656: mysqldump can be slower after bug #39653 fix.
      After fix for bug#39653 the shortest available secondary index was used for
      full table scan. Primary clustered key was used only if no secondary index
      can be used. However, when chosen secondary index includes all fields of the
      table being scanned it's better to use primary index since the amount of
      data to scan is the same but the primary index is clustered.
      Now the find_shortest_key function takes this into account.
     @ mysql-test/suite/innodb/r/innodb_mysql.result
        Aadded a test case for the bug#55656.
     @ mysql-test/suite/innodb/t/innodb_mysql.test
        Aadded a test case for the bug#55656.
     @ sql/sql_select.cc
        Bug #55656: mysqldump can be slower after bug #39653 fix.
        The find_shortest_key function now prefers clustered primary key
        if found secondary key includes all fields of the table.
[17 Aug 2010 10:57] Evgeny Potemkin
Fix for this bug also fixes bug#55204.
[25 Aug 2010 15:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/116781

3181 Evgeny Potemkin	2010-08-25
      Bug #55656: mysqldump can be slower after bug #39653 fix.
      After fix for bug#39653 the shortest available secondary index was used for
      full table scan. Primary clustered key was used only if no secondary index
      can be used. However, when chosen secondary index includes all fields of the
      table being scanned it's better to use primary index since the amount of
      data to scan is the same but the primary index is clustered.
      Now the find_shortest_key function takes this into account.
     @ mysql-test/suite/innodb/r/innodb_mysql.result
        Aadded a test case for the bug#55656.
     @ mysql-test/suite/innodb/t/innodb_mysql.test
        Aadded a test case for the bug#55656.
     @ sql/sql_select.cc
        Bug #55656: mysqldump can be slower after bug #39653 fix.
        The find_shortest_key function now prefers clustered primary key
        if found secondary key includes all fields of the table.
[18 Aug 2016 8:23] Umesh Shastry
Bug #19271 marked as duplicate of this