Bug #7080 Optimizer choose incorrect index
Submitted: 7 Dec 2004 15:23 Modified: 29 Jan 2005 6:37
Reporter: Victoria Reznichenko Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:4.0 OS:Linux (Linux)
Assigned to: CPU Architecture:Any
Tags: cost model
Triage: D3 (Medium)

[7 Dec 2004 15:23] Victoria Reznichenko
Description:
MySQL choose incorrect index for the query.
EXPLAIN SELECTs and CREATE TABLE statement are mentioned in the support issue #4131

How to repeat:
See above
[14 Dec 2004 17:15] Victoria Reznichenko
This is the output og the first query:
mysql> explain select distinct table1.col4,table1.col3,table1.col5,table1.col1
    -> from table1 where ((table1.col1 = 15992) and table1.col3 in(3951,3953,3956,3957,3958,3959,3960,3961,3962,3964,3965,3966,3967,3971,3974,3975,3977,3979,3981,3984,3987,3988,3990,3991,3992,3993,3994,3997,3999,4000,4001,4004,4005,4007,4009,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028)) and table1.col2 = 254 order by table1.col4;
+--------+-------+----------------------------------+---------+---------+------+------+----------------------------------------------+
| table  | type  | possible_keys                    | key     | key_len | ref  | rows | Extra                                        |
+--------+-------+----------------------------------+---------+---------+------+------+----------------------------------------------+
| table1 | range | PRIMARY,ind0,ind1,ind2,ind3,ind4 | PRIMARY |      10 | NULL |   54 | Using where; Using temporary; Using filesort |
+--------+-------+----------------------------------+---------+---------+------+------+----------------------------------------------+
1 row in set (0.01 sec)

PRIMARY key is used.

This query doesn't use PRIMARY, it uses ind3 key.

mysql> explain select distinct table1.col4,table1.col3,table1.col5,table1.col1
    -> from table1 where ((table1.col1 in (15992, 15991)) and table1.col3 in(3951,3953,3956,3957,3958,3959,3960,3961,3962,3964,3965,3966,3967,3971,3974,3975,3977,3979,3981,3984,3987,3988,3990,3991,3992,3993,3994,3997,3999,4000,4001,4004,4005,4007,4009,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021,4022,4023,4024,4025,4026,4027,4028)) and table1.col2 = 254 order by table1.col4;
+--------+------+----------------------------------+------+---------+-------+------+----------------------------------------------+
| table  | type | possible_keys                    | key  | key_len | ref   | rows | Extra                                        |
+--------+------+----------------------------------+------+---------+-------+------+----------------------------------------------+
| table1 | ref  | PRIMARY,ind0,ind1,ind2,ind3,ind4 | ind3 |       2 | const |    1 | Using where; Using temporary; Using filesort |
+--------+------+----------------------------------+------+---------+-------+------+----------------------------------------------+
1 row in set (0.02 sec)
[14 Dec 2004 17:22] Victoria Reznichenko
script to populate the table

Attachment: insert.pl (application/octet-stream, text), 395 bytes.

[14 Dec 2004 17:23] Victoria Reznichenko
table1.sql has CREATE TABLE statement.
insert.pl - perl script to populate table with test data.
[26 Dec 2004 8:53] Igor Babaev
After having run the command 
ANALYZE table table1 
and getting statistics on indexes, 
the optimizer returned the following plan: 
 
mysql> explain select distinct table1.col4,table1.col3,table1.col5,table1.col1 
    -> from table1 where ((table1.col1 in (15992, 15991)) and table1.col3 
    -> 
in(3951,3953,3956,3957,3958,3959,3960,3961,3962,3964,3965,3966,3967,3971,3974, 
    -> 
3975,3977,3979,3981,3984,3987,3988,3990,3991,3992,3993,3994,3997,3999,4000,4001, 
    -> 
4004,4005,4007,4009,4010,4011,4012,4013,4014,4015,4016,4017,4018,4019,4020,4021, 
    -> 4022,4023,4024,4025,4026,4027,4028)) and table1.col2 = 254 order by 
table1.col4; 
+--------+------+----------------------------------+------+---------+-------+------+-----------------------------+ 
| table  | type | possible_keys                    | key  | key_len | ref   | 
rows | Extra                       | 
+--------+------+----------------------------------+------+---------+-------+------+-----------------------------+ 
| table1 | ref  | PRIMARY,ind0,ind1,ind2,ind3,ind4 | ind2 |       2 | const |  
382 | Using where; Using filesort | 
+--------+------+----------------------------------+------+---------+-------+------+-----------------------------+ 
 
The execution by this plan took me 0.09 sec. In fact it's worse than the 
execution by the plan using the primary index. 
So the choice of the optimizer is not the best, but it happens only because 
it's very hard to take into account the impact of using OS input/output 
caches. Currently the optimizer just ignores them. The the future virsions we 
are going  to employ a more subtle cost model for the optimizer.
[29 Jan 2005 6:37] Igor Babaev
The reported choice of the optimizer is due to some inconsistencies of the cost model currently employed by it. This model makes him prefer access by index ind2 to multiple lookups by primary keys.
We are going to introduce the new cost model in version 5.1.