MySQL Bugs: #2022: Query Optimizer occaisionally uses wrong plan on the same query

Bug #2022	Query Optimizer occaisionally uses wrong plan on the same query
Submitted:	5 Dec 2003 9:00	Modified:	10 Dec 2003 18:26
Reporter:	David Chazin	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server	Severity:	S2 (Serious)
Version:	4.0.11 gamma	OS:	Windows (Windows 2000)
Assigned to:	Igor Babaev	CPU Architecture:	Any

Description:
I have the following tables:

create TABLE peptide_trypsin (
        peptide_id INTEGER(10) unsigned not null,
	molecular_weight INTEGER(10) unsigned not null,
	sequence TEXT not null,
	missed_cleavages SMALLINT unsigned not null)

alter TABLE peptide_trypsin ADD INDEX id_index (peptide_id), ADD INDEX mw_index (molecular_weight)

create TABLE pep_to_prot (
	peptide_id INTEGER(10) unsigned not null,
	protein_id INTEGER(10) unsigned not null,
	PRIMARY KEY(peptide_id, protein_id))

peptide_trypsin has about 13,000,000 rows. pep_to_prot has about 24,000,000 rows.

when I execute the following query:

select pep_to_prot.protein_id  from peptide_trypsin, pep_to_prot
where molecular_weight > X and molecular_weight < Y
and peptide_trypsin.peptide_id = pep_to_prot.peptide_id

The optimizer USUALLY chooses to use the mw_index on peptide_trypsin first and then join the results using the primary key on pep_to_prot. This is correct.

However, when the mw_index will return more than about 700 rows, it elects to do a full cartesian product on the 2 tables! using the primary key of pep_to_prot followed by the id_index on peptide_trypsin. Needless to say this is a bad thing.

How to repeat:
contact me and I can arrange to provide the data (although it is about 1GB)

There have been optimizer changes/bugfixes since 4.0.11 (which is from February).  Would you be able to test this against a current release and see if the problem persists?

I tested it with 4.1.1-alpha and it has the same behavior.

Verified in 4.0.17/Linux.  Populated peptide_trypsin with 16 million rows, with unique peptide_id (same value for molecular weight).  Populated pep_to_prot with 32 million rows, two rows for each peptide_id.  Other columns were populated with random data.

mysql> explain select pep_to_prot.protein_id  from peptide_trypsin, pep_to_prot where molecular_weight between 1 and 500 and peptide_trypsin.peptide_id = pep_to_prot.peptide_id;
+-----------------+-------+-------------------+----------+---------+----------------------------+--------+-------------+
| table           | type  | possible_keys     | key      | key_len | ref                        | rows   | Extra       |
+-----------------+-------+-------------------+----------+---------+----------------------------+--------+-------------+
| peptide_trypsin | range | id_index,mw_index | mw_index |       4 | NULL                       |    484 | Using where |
| pep_to_prot     | ref   | PRIMARY           | PRIMARY  |       4 | peptide_trypsin.peptide_id | 259999 | Using index |
+-----------------+-------+-------------------+----------+---------+----------------------------+--------+-------------+
2 rows in set (0.00 sec)
 
mysql> explain select pep_to_prot.protein_id  from peptide_trypsin, pep_to_prot where molecular_weight between 1 and 600 and peptide_trypsin.peptide_id = pep_to_prot.peptide_id;
+-----------------+-------+-------------------+----------+---------+------------------------+----------+-------------+
| table           | type  | possible_keys     | key      | key_len | ref                    | rows     | Extra       |
+-----------------+-------+-------------------+----------+---------+------------------------+----------+-------------+
| pep_to_prot     | index | PRIMARY           | PRIMARY  |       8 | NULL                   | 25999998 | Using index |
| peptide_trypsin | ref   | id_index,mw_index | id_index |       4 | pep_to_prot.peptide_id |        1 | Using where |
+-----------------+-------+-------------------+----------+---------+------------------------+----------+-------------+
2 rows in set (0.00 sec)

With my particular set of data, the behavior changes at: molecular_weight between 1 and 553.

Thank you.

It's not a bug. 
After rinning the analyze command collecting statistics on the index of the 
table pep_to_prot we have: 
 
mysql> explain 
    -> select pep_to_prot.protein_id  from peptide_trypsin, pep_to_prot 
    ->   where molecular_weight between 1 and 500 and 
    ->         peptide_trypsin.peptide_id = pep_to_prot.peptide_id; 
+----+-------------+-----------------+-------+-------------------+----------+---------+---------------------------------+------+-------------+ 
| id | select_type | table           | type  | possible_keys     | key      | 
key_len | ref                             | rows | Extra       | 
+----+-------------+-----------------+-------+-------------------+----------+---------+---------------------------------+------+-------------+ 
|  1 | SIMPLE      | peptide_trypsin | range | id_index,mw_index | mw_index |       
4 | NULL                            |  496 | Using where | 
|  1 | SIMPLE      | pep_to_prot     | ref   | PRIMARY           | PRIMARY  |       
4 | test.peptide_trypsin.peptide_id |    2 | Using index | 
+----+-------------+-----------------+-------+-------------------+----------+---------+---------------------------------+------+-------------+ 
2 rows in set (4.71 sec) 
 
and 
 
mysql> explain 
    -> select pep_to_prot.protein_id  from peptide_trypsin, pep_to_prot 
    ->   where molecular_weight between 1 and 600 and 
    ->     peptide_trypsin.peptide_id = pep_to_prot.peptide_id; 
+----+-------------+-----------------+-------+-------------------+----------+---------+---------------------------------+------+-------------+ 
| id | select_type | table           | type  | possible_keys     | key      | 
key_len | ref                             | rows | Extra       | 
+----+-------------+-----------------+-------+-------------------+----------+---------+---------------------------------+------+-------------+ 
|  1 | SIMPLE      | peptide_trypsin | range | id_index,mw_index | mw_index |       
4 | NULL                            |  595 | Using where | 
|  1 | SIMPLE      | pep_to_prot     | ref   | PRIMARY           | PRIMARY  |       
4 | test.peptide_trypsin.peptide_id |    2 | Using index | 
+----+-------------+-----------------+-------+-------------------+----------+---------+---------------------------------+------+-------------+ 
2 rows in set (0.06 sec) 
 
Please, always run the analyze command when you have strange execution plans 
for queries, especially for those executing against big tables.