Bug #36817 | Non optimal index choice, depending on index creation order | ||
---|---|---|---|
Submitted: | 20 May 2008 13:19 | Modified: | 5 Jul 2009 12:24 |
Reporter: | jocelyn fournier (Silver Quality Contributor) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 5.0.54a,5.1.19, 5.1.25, 5.0.60, 5.5 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | qc |
[20 May 2008 13:19]
jocelyn fournier
[17 Jun 2008 5:24]
Valeriy Kravchuk
Thank you for a bug report. Verified just as described with 5.1.25: C:\Program Files\MySQL\MySQL Server 5.0\bin>mysql -uroot -proot -P3310 test Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 2 Server version: 5.1.25-rc-community MySQL Community Server (GPL) Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> DROP TABLE IF EXISTS a; Query OK, 0 rows affected, 1 warning (0.00 sec) mysql> mysql> CREATE TABLE `a` ( -> `id1` int(10) unsigned NOT NULL auto_increment, -> `id2` tinyint(3) unsigned NOT NULL default '0', -> `id3` tinyint(3) unsigned NOT NULL default '0', -> `id4` int(10) unsigned NOT NULL default '0', -> `date` timestamp NOT NULL default CURRENT_TIMESTAMP, -> PRIMARY KEY (`id1`), -> KEY `id2` (`id2`,`id3`,`id4`,`date`), -> KEY `id2_2` (`id2`,`id3`,`date`) -> ) ENGINE=MyISAM DEFAULT CHARSET=latin1; Query OK, 0 rows affected (0.06 sec) mysql> mysql> INSERT INTO a (id2,id3,id4) VALUES -> (1,1,1),(1,1,1),(1,1,1),(1,1,1),(1,0,1),(1,2,1),(1,3,1); Query OK, 7 rows affected (0.06 sec) Records: 7 Duplicates: 0 Warnings: 0 mysql> EXPLAIN SELECT id1 FROM a WHERE id2=1 AND id3=1 ORDER BY date DESC LIMIT 0,4; +----+-------------+-------+------+---------------+------+---------+------------ -+------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------------ -+------+-----------------------------+ | 1 | SIMPLE | a | ref | id2,id2_2 | id2 | 2 | const,const | 3 | Using where; Using filesort | +----+-------------+-------+------+---------------+------+---------+------------ -+------+-----------------------------+ 1 row in set (0.05 sec) mysql> DROP TABLE IF EXISTS a; Query OK, 0 rows affected (0.00 sec) mysql> CREATE TABLE `a` ( -> `id1` int(10) unsigned NOT NULL auto_increment, -> `id2` tinyint(3) unsigned NOT NULL default '0', -> `id3` tinyint(3) unsigned NOT NULL default '0', -> `id4` int(10) unsigned NOT NULL default '0', -> `date` timestamp NOT NULL default CURRENT_TIMESTAMP, -> PRIMARY KEY (`id1`), -> KEY `id2` (`id2`,`id3`,`date`), -> KEY `id2_2` (`id2`,`id3`,`id4`,`date`) -> ) ENGINE=MyISAM DEFAULT CHARSET=latin1; Query OK, 0 rows affected (0.06 sec) mysql> mysql> INSERT INTO a (id2,id3,id4) VALUES -> (1,1,1),(1,1,1),(1,1,1),(1,1,1),(1,0,1),(1,2,1),(1,3,1); Query OK, 7 rows affected (0.00 sec) Records: 7 Duplicates: 0 Warnings: 0 mysql> EXPLAIN SELECT id1 FROM a WHERE id2=1 AND id3=1 ORDER BY date DESC LIMIT 0,4; +----+-------------+-------+------+---------------+------+---------+------------ -+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------------ -+------+-------------+ | 1 | SIMPLE | a | ref | id2,id2_2 | id2 | 2 | const,const | 3 | Using where | +----+-------------+-------+------+---------------+------+---------+------------ -+------+-------------+ 1 row in set (0.00 sec)
[17 Jun 2008 5:26]
Valeriy Kravchuk
Same problem verified with 5.0.60.
[23 Jun 2008 23:49]
Omer Barnir
triage: need e/r values (before setting P3 to determine target
[12 Dec 2008 14:46]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/61512 2744 Georgi Kodinov 2008-12-12 Bug #36817: Non optimal index choice, depending on index creation order When selecting among the available indexes for REF access the optimizer can take into account that certain indexes may require sorting for ORDER BY and some may not. This can be done by putting extra weight to the indexes that would require filesort. Note that this extra weight should not be added when comparing with the other access methods (e.g. quick access) because it's considered at a later stage anyway.
[5 Jul 2009 12:24]
jocelyn fournier
Hi, Actually that doesn't always concern file sorting / ORDER BY clause (which is quite annoying) : With 5.1.36 : Case 1: DROP TABLE IF EXISTS a; CREATE TABLE `a` ( `id1` int(10) unsigned NOT NULL auto_increment, `id2` tinyint(3) unsigned NOT NULL default '0', `id3` tinyint(3) unsigned NOT NULL default '0', `id4` int(10) unsigned NOT NULL default '0', `date` timestamp NOT NULL default CURRENT_TIMESTAMP, PRIMARY KEY (`id1`), KEY `id2` (`id2`,`id3`,`id4`,`date`), KEY `id2_2` (`id2`,`id3`,`date`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO a (id2,id3,id4) VALUES (1,1,1),(1,1,1),(1,1,1),(1,1,1),(1,0,1),(1,2,1),(1,3,1); EXPLAIN SELECT id1 FROM a WHERE id2=1 AND id3=1 AND id4=1; +----+-------------+-------+------+---------------+------+---------+-------------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+-------------------+------+-------+ | 1 | SIMPLE | a | ref | id2,id2_2 | id2 | 6 | const,const,const | 3 | | +----+-------------+-------+------+---------------+------+---------+-------------------+------+-------+ => id2 is chosen here, which is ok Now let's swap the index declaration order : Case 2: DROP TABLE IF EXISTS a; CREATE TABLE `a` ( `id1` int(10) unsigned NOT NULL auto_increment, `id2` tinyint(3) unsigned NOT NULL default '0', `id3` tinyint(3) unsigned NOT NULL default '0', `id4` int(10) unsigned NOT NULL default '0', `date` timestamp NOT NULL default CURRENT_TIMESTAMP, PRIMARY KEY (`id1`), KEY `id2` (`id2`,`id3`,`date`), KEY `id2_2` (`id2`,`id3`,`id4`,`date`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; INSERT INTO a (id2,id3,id4) VALUES (1,1,1),(1,1,1),(1,1,1),(1,1,1),(1,0,1),(1,2,1),(1,3,1); EXPLAIN SELECT id1 FROM a WHERE id2=1 AND id3=1 AND id4=1; +----+-------------+-------+------+---------------+------+---------+-------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+-------------+------+-------------+ | 1 | SIMPLE | a | ref | id2,id2_2 | id2 | 2 | const,const | 3 | Using where | +----+-------------+-------+------+---------------+------+---------+-------------+------+-------------+ Here id2 is selected, which is not the optimal index. Result with 5.0.75, the behaviour is swapped : Case 1 : EXPLAIN SELECT id1 FROM a WHERE id2=1 AND id3=1 AND id4=1; +----+-------------+-------+------+---------------+-------+---------+-------------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+-------+---------+-------------+------+-------------+ | 1 | SIMPLE | a | ref | id2,id2_2 | id2_2 | 2 | const,const | 3 | Using where | +----+-------------+-------+------+---------------+-------+---------+-------------+------+-------------+ => here id2_2 is used (the wrong one) Case 2 : EXPLAIN SELECT id1 FROM a WHERE id2=1 AND id3=1 AND id4=1; +----+-------------+-------+------+---------------+-------+---------+-------------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+-------+---------+-------------------+------+-------+ | 1 | SIMPLE | a | ref | id2,id2_2 | id2_2 | 6 | const,const,const | 3 | | +----+-------------+-------+------+---------------+-------+---------+-------------------+------+-------+ => Ok Any plan to fix this a little bit earlier than 6.0 ? To fix this issue, I have to heavily hack my application adding USE INDEX, which is not really acceptable for such "standard" queries. Thanks and regards,
[23 Dec 2010 14:34]
Valeriy Kravchuk
Bug #59131 was marked as a duplicate of this one.
[23 Dec 2010 14:49]
Yoshinori Matsunobu
I did some research on this issue and (wrongly) submitted bug#59131. Here are copy from it. create table a (id int auto_increment primary key, k1 int, k2 int, k3 int, value int, index i1 (k2, k1, k3), index i2 (k2, k3)) engine=innodb; Then insert lots of rows with low enough cardinality on k2 (I tested inserting 2.6 mil rows, cardinality on k2 was 1000): insert into a (k1, k2, k3, value)values (1,1,1,1), (2,2,2,2),(3,3,3,3),(4,4,4,4),(5,5,5,5),(6,6,6,6),(7,7,7,7),(8,8,8,8),(9,9,9,9),(10,10,10,10); insert into a (k1, k2, k3, value) select rand()*1000000, rand()*1000, rand()*1000000, value from a; insert into a (k1, k2, k3, value) select rand()*1000000, rand()*1000, rand()*1000000, value from a; insert into a (k1, k2, k3, value) select rand()*1000000, rand()*1000, rand()*1000000, value from a; .... insert into a (k1, k2, k3, value) select rand()*1000000, rand()*1000, rand()*1000000, value from a; Query OK, 1310720 rows affected (51.06 sec) Records: 1310720 Duplicates: 0 Warnings: 0 mysql> explain select * from a where k2=5 order by k3 limit 2; +----+-------------+-------+-------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | a | range | i1,i2 | i2 | 5 | NULL | 2647 | Using where | +----+-------------+-------+-------+---------------+------+---------+------+------+-------------+ 1 row in set (0.00 sec) When I checked with a debugger, the following lines where executed to check whether i2 should be used or not. sql/sql_select.cc test_if_cheaper_ordering(): (test_if_skip_sort_order() in 5.1) if (keys.is_set(nr) && (direction= test_if_order_by_key(order, table, nr, &used_key_parts))) ... /* We assume that each of the tested indexes is not correlated with ref_key. Thus, to select first N records we have to scan N/selectivity(ref_key) index entries. selectivity(ref_key) = #scanned_records/#table_records = table->quick_condition_rows/table_records. In any case we can't select more than #table_records. N/(table->quick_condition_rows/table_records) > table_records <=> N > table->quick_condition_rows. */ if (select_limit > table->quick_condition_rows) select_limit= table_records; else select_limit= (ha_rows) (select_limit * (double) table_records / table->quick_condition_rows); .... index_scan_time= select_limit/rec_per_key * min(rec_per_key, table->file->scan_time()); if ((ref_key < 0 && is_covering) || (ref_key < 0 && (group || table->force_index)) || index_scan_time < read_time) ---- table_records: close to the number of rows (2.6M) table->quick_condition_rows: estimated matched rows (2.6M / 1000 = 2.6K) select_limit: originally the same as LIMIT N (20), finally select_limit * table_records / table->quick_condition_rows (20 * 2.6M / 2.6K = 20K) index_scan_time: the same as the final select_limit value read_time: (probably) the same as table->quick_condition_rows (2.6K) read_time was the same as the estimated cost when using index i1. The MySQL optimizer decided that the cost of using i2 (index_scan_time: 20k) was much more expensive than the cost of using i1 (2.6k) so it chose i1. But apparently this is wrong decision. I am not sure why select_limit(index_scan_time) was computed by multiplying table_records. Since it was index scan and sorting could be skipped, select_limit should be at least less than table->quick_condition_rows, shouldn't it?