MySQL Bugs: #7080: Optimizer choose incorrect index

Bug #7080	Optimizer choose incorrect index
Submitted:	7 Dec 2004 15:23	Modified:	29 Jan 2005 6:37
Reporter:	Victoria Reznichenko	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	4.0	OS:	Linux (Linux)
Assigned to:		CPU Architecture:	Any
Tags:	cost model

Description:
MySQL choose incorrect index for the query.
EXPLAIN SELECTs and CREATE TABLE statement are mentioned in the support issue #4131

How to repeat:
See above

This is the output og the first query:
mysql> explain select distinct table1.col4,table1.col3,table1.col5,table1.col1
    -> from table1 where ((table1.col1 = 15992) and table1.col3 in(3951,3953,3956,3957,3958,3959,3960,3961,3962,3964,3965,3966,3967,3971,3974,3975,3977,3979,3981,3984,3987,3988,3990,3991,3992,3993,3994,3997,3999,4000,4001,4004,4005,4007,4009,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028)) and table1.col2 = 254 order by table1.col4;
+--------+-------+----------------------------------+---------+---------+------+------+----------------------------------------------+
| table  | type  | possible_keys                    | key     | key_len | ref  | rows | Extra                                        |
+--------+-------+----------------------------------+---------+---------+------+------+----------------------------------------------+
| table1 | range | PRIMARY,ind0,ind1,ind2,ind3,ind4 | PRIMARY |      10 | NULL |   54 | Using where; Using temporary; Using filesort |
+--------+-------+----------------------------------+---------+---------+------+------+----------------------------------------------+
1 row in set (0.01 sec)

PRIMARY key is used.

This query doesn't use PRIMARY, it uses ind3 key.

mysql> explain select distinct table1.col4,table1.col3,table1.col5,table1.col1
    -> from table1 where ((table1.col1 in (15992, 15991)) and table1.col3 in(3951,3953,3956,3957,3958,3959,3960,3961,3962,3964,3965,3966,3967,3971,3974,3975,3977,3979,3981,3984,3987,3988,3990,3991,3992,3993,3994,3997,3999,4000,4001,4004,4005,4007,4009,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028)) and table1.col2 = 254 order by table1.col4;
+--------+------+----------------------------------+------+---------+-------+------+----------------------------------------------+
| table  | type | possible_keys                    | key  | key_len | ref   | rows | Extra                                        |
+--------+------+----------------------------------+------+---------+-------+------+----------------------------------------------+
| table1 | ref  | PRIMARY,ind0,ind1,ind2,ind3,ind4 | ind3 |       2 | const |    1 | Using where; Using temporary; Using filesort |
+--------+------+----------------------------------+------+---------+-------+------+----------------------------------------------+
1 row in set (0.02 sec)

script to populate the table

Attachment: insert.pl (application/octet-stream, text), 395 bytes.

table1.sql has CREATE TABLE statement.
insert.pl - perl script to populate table with test data.

After having run the command 
ANALYZE table table1 
and getting statistics on indexes, 
the optimizer returned the following plan: 
 
mysql> explain select distinct table1.col4,table1.col3,table1.col5,table1.col1 
    -> from table1 where ((table1.col1 in (15992, 15991)) and table1.col3 
    -> 
in(3951,3953,3956,3957,3958,3959,3960,3961,3962,3964,3965,3966,3967,3971,3974, 
    -> 
3975,3977,3979,3981,3984,3987,3988,3990,3991,3992,3993,3994,3997,3999,4000,4001, 
    -> 
4004,4005,4007,4009,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021, 
    -> 4022,4023,4024,4025,4026,4027,4028)) and table1.col2 = 254 order by 
table1.col4; 
+--------+------+----------------------------------+------+---------+-------+------+-----------------------------+ 
| table  | type | possible_keys                    | key  | key_len | ref   | 
rows | Extra                       | 
+--------+------+----------------------------------+------+---------+-------+------+-----------------------------+ 
| table1 | ref  | PRIMARY,ind0,ind1,ind2,ind3,ind4 | ind2 |       2 | const |  
382 | Using where; Using filesort | 
+--------+------+----------------------------------+------+---------+-------+------+-----------------------------+ 
 
The execution by this plan took me 0.09 sec. In fact it's worse than the 
execution by the plan using the primary index. 
So the choice of the optimizer is not the best, but it happens only because 
it's very hard to take into account the impact of using OS input/output 
caches. Currently the optimizer just ignores them. The the future virsions we 
are going  to employ a more subtle cost model for the optimizer.

The reported choice of the optimizer is due to some inconsistencies of the cost model currently employed by it. This model makes him prefer access by index ind2 to multiple lookups by primary keys.
We are going to introduce the new cost model in version 5.1.