| Bug #39653 | find_shortest_key in sql_select.cc does not consider clustered primary keys | ||
|---|---|---|---|
| Submitted: | 25 Sep 2008 20:26 | Modified: | 3 Oct 2008 10:10 |
| Reporter: | Zardosht Kasheff (SCA) | ||
| Status: | Verified | ||
| Category: | Server: Optimizer | Severity: | S3 (Non-critical) |
| Version: | 5.1.25, 5.1.28 | OS: | Any |
| Assigned to: | Gleb Shchepa | Target Version: | |
| Tags: | find_shortest_key, query optimizer | ||
| Triage: | Needs Triage: D3 (Medium) / R6 (Needs Assessment) / E6 (Needs Assessment) | ||
[25 Sep 2008 20:34]
Sveta Smirnova
Thank you for the report. If I force index(b) I get similar results: explain select count(*) from bar force index(b); id 1 select_type SIMPLE table bar type index possible_keys NULL key b key_len 10 ref NULL rows 10 Extra Using index Could you please provide example which demonstrates performance increase if using another key?
[25 Sep 2008 20:40]
Sveta Smirnova
This also can be same issue as described in comment "[17 May 2005 14:27] Igor Babaev" for bug #10512
[25 Sep 2008 20:48]
Zardosht Kasheff
I cannot comment on if this is the same issue as what Igor states in #10512. As for queries that display the problem, take any query that has these properties: 1) Both the primary key and some other key are covering indexes 2) The storage engine has a clustered primary key, like InnoDB The bigger the disparity between the entire row size and the length of the non-primary covering key, the wider the gap in performance.
[26 Sep 2008 8:06]
Sveta Smirnova
Thank you for the feedback. In the example provided you have PRIMARY KEY which is unique index by definition and KEY(b,c) which is not unique. In the same time you ask about quantity of all rows. So it is natural what PRIMARY KEY is used. If change query to explain `select count(*) from bar where b=1;` index (b,c) is used: explain select count(*) from bar where b=1; id 1 select_type SIMPLE table bar type ref possible_keys b key b key_len 5 ref const rows 1 Extra Using where; Using index This is why I ask you to provide correct example.
[26 Sep 2008 14:51]
Zardosht Kasheff
Actually, it is not natural to use the primary key. The fact that the primary key is
unique and the key (b,c) is not unique does not matter. The only reason that the primary
key is being used is that it is of shorter length than the key (b,c).
Here is a way to verify what I say. Alter the table such that we have a key (c) and a
primary key of (a,b). If what you say is correct, then the primary should be used. It is
not. The key (c) is used because it has shorter length than the primary:
mysql> explain select count(*) from bar \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: bar
type: index
possible_keys: NULL
key: PRIMARY
key_len: 4
ref: NULL
rows: 10
Extra: Using index
1 row in set (0.00 sec)
mysql> alter table bar drop primary key, drop index b;
Query OK, 10 rows affected (0.01 sec)
Records: 10 Duplicates: 0 Warnings: 0
mysql> alter table bar add primary key(a,b), add index c(c);
Query OK, 10 rows affected (0.01 sec)
Records: 10 Duplicates: 0 Warnings: 0
mysql> explain select count(*) from bar \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: bar
type: index
possible_keys: NULL
key: c
key_len: 5
ref: NULL
rows: 10
Extra: Using index
1 row in set (0.00 sec)
[3 Oct 2008 10:10]
Valeriy Kravchuk
Verified just as described with 5.1.28. I think primary key of InnoDB (and any other engine with clustered primary key) should be considered differently for "index" access path. Entire row length should be taken into account.

Description: The function find_shortest_key at line 12598 in sql_select.cc, is used in make_join_readinfo to determine what covering index to use to satisfy a query. In doing so, it simply selects the index with the smallest key length. If the primary key is clustered, then this can lead to a very bad choice, because using the primary implies scanning through the entire table. What find_shortest_key should do is the following. If the primary key is a covering index and is clustered, like in MyISAM, then the behavior today should remain the same. If the primary key is clustered, like in InnoDB, then it should not consider using the primary key because then the storage engine will have to scan through much more data. How to repeat: Perform the following steps. This demonstrates the issue using InnoDB: mysql> create table bar (a int, b int, c int, d int, e int, f int) engine=InnoD B; Query OK, 0 rows affected (0.01 sec) mysql> alter table bar add primary key (a); Query OK, 0 rows affected (0.01 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> alter table bar add key (b,c); Query OK, 0 rows affected (0.01 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> insert into bar values (1,1,1,1,1,1),(2,2,2,2,2,2),(3,3,3,3,3,3),(4,4,4, 4,4,4),(5,5,5,5,5,5),(6,6,6,6,6,6),(7,7,7,7,7,7),(8,8,8,8,8,8),(9,9,9,9,9,9),(1 1,11,11,11,11,11); Query OK, 10 rows affected (0.00 sec) Records: 10 Duplicates: 0 Warnings: 0 mysql> explain select count(*) from bar; +----+-------------+-------+-------+---------------+---------+---------+------+- -----+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+------+- -----+-------------+ | 1 | SIMPLE | bar | index | NULL | PRIMARY | 4 | NULL | 10 | Using index | +----+-------------+-------+-------+---------------+---------+---------+------+- -----+-------------+ 1 row in set (0.00 sec) mysql> explain select count(*) from bar \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: bar type: index possible_keys: NULL key: PRIMARY key_len: 4 ref: NULL rows: 10 Extra: Using index 1 row in set (0.00 sec) Suggested fix: if handler->primary_key_is_clustered() returns true, then find_shortest_key, or the places that use find_shortest_key, should compare the key lengths of other covering indexes with the entire row size, not the length of the primary key. In the above example, the key (b,c) should be used instead of the primary key