MySQL Bugs: #39653: find_shortest_key in sql_select.cc does not consider clustered primary keys

Bug #39653	find_shortest_key in sql_select.cc does not consider clustered primary keys
Submitted:	25 Sep 2008 18:26	Modified:	20 Jun 2010 22:48
Reporter:	Zardosht Kasheff (OCA)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.1.25, 5.1.28	OS:	Any
Assigned to:	Gleb Shchepa	CPU Architecture:	Any
Tags:	find_shortest_key, query optimizer

Description:
The function find_shortest_key at line 12598 in sql_select.cc, is used in make_join_readinfo to determine what covering index to use to satisfy a query. In doing so, it simply selects the index with the smallest key length. If the primary key is clustered, then this can lead to a very bad choice, because using the primary implies scanning through the entire table.

What find_shortest_key should do is the following. If the primary key is a covering index and is clustered, like in MyISAM, then the behavior today should remain the same. If the primary key is clustered, like in InnoDB, then it should not consider using the primary key because then the storage engine will have to scan through much more data.

How to repeat:
Perform the following steps. This demonstrates the issue using InnoDB:

mysql> create table bar (a int, b int, c int, d int, e int, f int) engine=InnoD
B;
Query OK, 0 rows affected (0.01 sec)

mysql> alter table bar add primary key (a);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table bar add key (b,c);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into bar values (1,1,1,1,1,1),(2,2,2,2,2,2),(3,3,3,3,3,3),(4,4,4,
4,4,4),(5,5,5,5,5,5),(6,6,6,6,6,6),(7,7,7,7,7,7),(8,8,8,8,8,8),(9,9,9,9,9,9),(1
1,11,11,11,11,11);
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> explain select count(*) from bar;
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+-------------+
|  1 | SIMPLE      | bar   | index | NULL          | PRIMARY | 4       | NULL |
  10 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+-------------+
1 row in set (0.00 sec)

mysql> explain select count(*) from bar \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: bar
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 10
        Extra: Using index
1 row in set (0.00 sec)

Suggested fix:
if handler->primary_key_is_clustered() returns true, then find_shortest_key, or the places that use find_shortest_key, should compare the key lengths of other covering indexes with the entire row size, not the length of the primary key.

In the above example, the key (b,c) should be used instead of the primary key

Thank you for the report.

If I force index(b) I get similar results:

explain select count(*) from bar force index(b);
id      1
select_type     SIMPLE
table   bar
type    index
possible_keys   NULL
key     b
key_len 10
ref     NULL
rows    10
Extra   Using index

Could you please provide example which demonstrates performance increase if using another key?

This also can be same issue as described in comment "[17 May 2005 14:27] Igor Babaev" for bug #10512

I cannot comment on if this is the same issue as what Igor states in #10512.

As for queries that display the problem, take any query that has these properties:
1) Both the primary key and some other key are covering indexes
2) The storage engine has a clustered primary key, like InnoDB

The bigger the disparity between the entire row size and the length of the non-primary covering key, the wider the gap in performance.

Thank you for the feedback.

In the example provided you have PRIMARY KEY which is unique index by definition and KEY(b,c) which is not unique. In the same time you ask about quantity of all rows. So it is natural what PRIMARY KEY  is used. If change query to explain `select count(*) from bar where b=1;` index (b,c) is used:

explain select count(*) from bar where b=1;
id      1
select_type     SIMPLE
table   bar
type    ref
possible_keys   b
key     b
key_len 5
ref     const
rows    1
Extra   Using where; Using index

This is why I ask you to provide correct example.

Actually, it is not natural to use the primary key. The fact that the primary key is unique and the key (b,c) is not unique does not matter. The only reason that the primary key is being used is that it is of shorter length than the key (b,c).

Here is a way to verify what I say. Alter the table such that we have a key (c) and a primary key of (a,b). If what you say is correct, then the primary should be used. It is not. The key (c) is used because it has shorter length than the primary:

mysql> explain select count(*) from bar \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: bar
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 10
        Extra: Using index
1 row in set (0.00 sec)

mysql> alter table bar drop primary key, drop index b;
Query OK, 10 rows affected (0.01 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> alter table bar add primary key(a,b), add index c(c);
Query OK, 10 rows affected (0.01 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> explain select count(*) from bar \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: bar
         type: index
possible_keys: NULL
          key: c
      key_len: 5
          ref: NULL
         rows: 10
        Extra: Using index
1 row in set (0.00 sec)

Verified just as described with 5.1.28. 

I think primary key of InnoDB (and any other engine with clustered primary key) should be considered differently for "index" access path. Entire row length should be taken into account.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/101713

3358 Gleb Shchepa	2010-02-26
      Bug #39653: find_shortest_key in sql_select.cc does not
                  consider clustered primary keys
      
      Choosing a shortest index for the covering index scan,
      the optimizer ignored the fact, that the clustered primary
      key read involves whole table data.
      
      The find_shortest_key function has been modified to
      take into account that fact that a clustered PK has a
      longest key of possible covering indices.
     @ mysql-test/r/innodb_mysql.result
        Test case for bug #39653.
     @ mysql-test/t/innodb_mysql.test
        Test case for bug #39653.
     @ sql/sql_select.cc
        Bug #39653: find_shortest_key in sql_select.cc does not
                    consider clustered primary keys
        
        The find_shortest_key function has been modified to
        take into account that fact that a clustered PK has a
        longest key of possible covering indices.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/102391

3358 Gleb Shchepa	2010-03-05
      Bug #39653: find_shortest_key in sql_select.cc does not
                  consider clustered primary keys
      
      Choosing a shortest index for the covering index scan,
      the optimizer ignored the fact, that the clustered primary
      key read involves whole table data.
      
      The find_shortest_key function has been modified to
      take into account that fact that a clustered PK has a
      longest key of possible covering indices.
     @ mysql-test/r/innodb_mysql.result
        Test case for bug #39653.
     @ mysql-test/t/innodb_mysql.test
        Test case for bug #39653.
     @ sql/sql_select.cc
        Bug #39653: find_shortest_key in sql_select.cc does not
                    consider clustered primary keys
        
        The find_shortest_key function has been modified to
        take into account that fact that a clustered PK has a
        longest key of possible covering indices.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/102478

3372 Gleb Shchepa	2010-03-05
      Bug #39653: find_shortest_key in sql_select.cc does not
                  consider clustered primary keys
      
      Choosing a shortest index for the covering index scan,
      the optimizer ignored the fact, that the clustered primary
      key read involves whole table data.
      
      The find_shortest_key function has been modified to
      take into account that fact that a clustered PK has a
      longest key of possible covering indices.
     @ mysql-test/r/innodb_mysql.result
        Test case for bug #39653.
     @ mysql-test/t/innodb_mysql.test
        Test case for bug #39653.
     @ sql/sql_select.cc
        Bug #39653: find_shortest_key in sql_select.cc does not
                    consider clustered primary keys
        
        The find_shortest_key function has been modified to
        take into account that fact that a clustered PK has a
        longest key of possible covering indices.

Pushed into 5.5.4-m3 (revid:alik@sun.com-20100326080914-2pz8ns984e0spu03) (version source revid:alexey.kopytov@sun.com-20100307164059-cri8typa32cypq0l) (merge vers: 5.5.3-m2) (pib:16)

Pushed into mysql-next-mr (revid:alik@sun.com-20100326081116-m3v4l34yhr43mtsv) (version source revid:alik@sun.com-20100325072612-4sds00ix8ajo1e84) (pib:16)

Pushed into 6.0.14-alpha (revid:alik@sun.com-20100326081944-qja07qklw1p2w7jb) (version source revid:alik@sun.com-20100325073410-4t4i9gu2u1pge7xb) (merge vers: 6.0.14-alpha) (pib:16)

Will this patch make it into 5.1?

Pushed into 5.1.46 (revid:sergey.glukhov@sun.com-20100405111026-7kz1p8qlzglqgfmu) (version source revid:gshchepa@mysql.com-20100305194555-ts37kuiie7vbh9k6) (merge vers: 5.1.45) (pib:16)

Noted in 5.1.46, 5.5.5, 6.0.14 changelogs.

While looking for the shortest index for a covering index scan, the 
optimizer ignored that a clustered primary key read the entire table.

Paul, please change the description to:

"While looking for the shortest index for a covering index scan, the optimizer didn't consider the full row length for a clustered primary key, as in InnoDB. Secondary covering indexes will now be preferred, making full table scans less likely."

Added performance tag, revised changelog wording per previous comment.

This probably closes bug 19271 also (innodb optimizer prefers primary over secondary indexes for count(*))

Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:martin.skold@mysql.com-20100609211156-tsac5qhw951miwtt) (merge vers: 5.1.46-ndb-6.2.19) (pib:16)

Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

actually, this change can have quite negative impacts.

e.g. on a table like this:
CREATE TABLE `pagelinks` (
  `pl_from` int(8) unsigned NOT NULL DEFAULT '0',
  `pl_namespace` int(11) NOT NULL DEFAULT '0',
  `pl_title` varbinary(255) NOT NULL DEFAULT '',
  UNIQUE KEY `pl_from` (`pl_from`,`pl_namespace`,`pl_title`),
  KEY `pl_namespace` (`pl_namespace`,`pl_title`,`pl_from`)
)

both indexes satisfy 'SELECT * FROM table' query done by mysqldump, but PK is much much hotter, and secondary key is relatively cold, so suddenly after #39653 equal-width secondary key is preferred - this has made our dumps take six days instead of half an hour.

Domas, I've created bug #55656 for your report, please comment on the suggested fix over there.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/115915

3181 Evgeny Potemkin	2010-08-17
      Bug #55656: mysqldump can be slower after bug #39653 fix.
      After fix for bug#39653 the shortest available secondary index was used for
      full table scan. Primary clustered key was used only if no secondary index
      can be used. However, when chosen secondary index includes all fields of the
      table being scanned it's better to use primary index since the amount of
      data to scan is the same but the primary index is clustered.
      Now the find_shortest_key function takes this into account.
     @ mysql-test/suite/innodb/r/innodb_mysql.result
        Aadded a test case for the bug#55656.
     @ mysql-test/suite/innodb/t/innodb_mysql.test
        Aadded a test case for the bug#55656.
     @ sql/sql_select.cc
        Bug #55656: mysqldump can be slower after bug #39653 fix.
        The find_shortest_key function now prefers clustered primary key
        if found secondary key includes all fields of the table.

Fix for this bug also fixes bug#55204.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/116781

3181 Evgeny Potemkin	2010-08-25
      Bug #55656: mysqldump can be slower after bug #39653 fix.
      After fix for bug#39653 the shortest available secondary index was used for
      full table scan. Primary clustered key was used only if no secondary index
      can be used. However, when chosen secondary index includes all fields of the
      table being scanned it's better to use primary index since the amount of
      data to scan is the same but the primary index is clustered.
      Now the find_shortest_key function takes this into account.
     @ mysql-test/suite/innodb/r/innodb_mysql.result
        Aadded a test case for the bug#55656.
     @ mysql-test/suite/innodb/t/innodb_mysql.test
        Aadded a test case for the bug#55656.
     @ sql/sql_select.cc
        Bug #55656: mysqldump can be slower after bug #39653 fix.
        The find_shortest_key function now prefers clustered primary key
        if found secondary key includes all fields of the table.

Bug #19271 marked as duplicate of this