Bug #39653 find_shortest_key in sql_select.cc does not consider clustered primary keys
Submitted: 25 Sep 2008 20:26 Modified: 3 Oct 2008 10:10
Reporter: Zardosht Kasheff (SCA)
Status: Verified
Category:Server: Optimizer Severity:S3 (Non-critical)
Version:5.1.25, 5.1.28 OS:Any
Assigned to: Gleb Shchepa Target Version:
Tags: find_shortest_key, query optimizer
Triage: Needs Triage: D3 (Medium) / R6 (Needs Assessment) / E6 (Needs Assessment)

[25 Sep 2008 20:26] Zardosht Kasheff
Description:
The function find_shortest_key at line 12598 in sql_select.cc, is used in
make_join_readinfo to determine what covering index to use to satisfy a query. In doing
so, it simply selects the index with the smallest key length. If the primary key is
clustered, then this can lead to a very bad choice, because using the primary implies
scanning through the entire table.

What find_shortest_key should do is the following. If the primary key is a covering index
and is clustered, like in MyISAM, then the behavior today should remain the same. If the
primary key is clustered, like in InnoDB, then it should not consider using the primary
key because then the storage engine will have to scan through much more data.

How to repeat:
Perform the following steps. This demonstrates the issue using InnoDB:

mysql> create table bar (a int, b int, c int, d int, e int, f int) engine=InnoD
B;
Query OK, 0 rows affected (0.01 sec)

mysql> alter table bar add primary key (a);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table bar add key (b,c);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into bar values (1,1,1,1,1,1),(2,2,2,2,2,2),(3,3,3,3,3,3),(4,4,4,
4,4,4),(5,5,5,5,5,5),(6,6,6,6,6,6),(7,7,7,7,7,7),(8,8,8,8,8,8),(9,9,9,9,9,9),(1
1,11,11,11,11,11);
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> explain select count(*) from bar;
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |
rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+-------------+
|  1 | SIMPLE      | bar   | index | NULL          | PRIMARY | 4       | NULL |
  10 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+-
-----+-------------+
1 row in set (0.00 sec)

mysql> explain select count(*) from bar \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: bar
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 10
        Extra: Using index
1 row in set (0.00 sec)

Suggested fix:
if handler->primary_key_is_clustered() returns true, then find_shortest_key, or the
places that use find_shortest_key, should compare the key lengths of other covering
indexes with the entire row size, not the length of the primary key.

In the above example, the key (b,c) should be used instead of the primary key
[25 Sep 2008 20:34] Sveta Smirnova
Thank you for the report.

If I force index(b) I get similar results:

explain select count(*) from bar force index(b);
id      1
select_type     SIMPLE
table   bar
type    index
possible_keys   NULL
key     b
key_len 10
ref     NULL
rows    10
Extra   Using index

Could you please provide example which demonstrates performance increase if using another
key?
[25 Sep 2008 20:40] Sveta Smirnova
This also can be same issue as described in comment "[17 May 2005 14:27] Igor Babaev" for
bug #10512
[25 Sep 2008 20:48] Zardosht Kasheff
I cannot comment on if this is the same issue as what Igor states in #10512.

As for queries that display the problem, take any query that has these properties:
1) Both the primary key and some other key are covering indexes
2) The storage engine has a clustered primary key, like InnoDB

The bigger the disparity between the entire row size and the length of the non-primary
covering key, the wider the gap in performance.
[26 Sep 2008 8:06] Sveta Smirnova
Thank you for the feedback.

In the example provided you have PRIMARY KEY which is unique index by definition and
KEY(b,c) which is not unique. In the same time you ask about quantity of all rows. So it
is natural what PRIMARY KEY  is used. If change query to explain `select count(*) from
bar where b=1;` index (b,c) is used:

explain select count(*) from bar where b=1;
id      1
select_type     SIMPLE
table   bar
type    ref
possible_keys   b
key     b
key_len 5
ref     const
rows    1
Extra   Using where; Using index

This is why I ask you to provide correct example.
[26 Sep 2008 14:51] Zardosht Kasheff
Actually, it is not natural to use the primary key. The fact that the primary key is
unique and the key (b,c) is not unique does not matter. The only reason that the primary
key is being used is that it is of shorter length than the key (b,c).

Here is a way to verify what I say. Alter the table such that we have a key (c) and a
primary key of (a,b). If what you say is correct, then the primary should be used. It is
not. The key (c) is used because it has shorter length than the primary:

mysql> explain select count(*) from bar \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: bar
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 10
        Extra: Using index
1 row in set (0.00 sec)

mysql> alter table bar drop primary key, drop index b;
Query OK, 10 rows affected (0.01 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> alter table bar add primary key(a,b), add index c(c);
Query OK, 10 rows affected (0.01 sec)
Records: 10  Duplicates: 0  Warnings: 0

mysql> explain select count(*) from bar \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: bar
         type: index
possible_keys: NULL
          key: c
      key_len: 5
          ref: NULL
         rows: 10
        Extra: Using index
1 row in set (0.00 sec)
[3 Oct 2008 10:10] Valeriy Kravchuk
Verified just as described with 5.1.28. 

I think primary key of InnoDB (and any other engine with clustered primary key) should be
considered differently for "index" access path. Entire row length should be taken into
account.