| Bug #2022 | Query Optimizer occaisionally uses wrong plan on the same query | ||
|---|---|---|---|
| Submitted: | 5 Dec 2003 9:00 | Modified: | 10 Dec 2003 18:26 |
| Reporter: | David Chazin | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server | Severity: | S2 (Serious) |
| Version: | 4.0.11 gamma | OS: | Windows (Windows 2000) |
| Assigned to: | Igor Babaev | CPU Architecture: | Any |
[5 Dec 2003 10:17]
Dean Ellis
There have been optimizer changes/bugfixes since 4.0.11 (which is from February). Would you be able to test this against a current release and see if the problem persists?
[5 Dec 2003 11:59]
David Chazin
I tested it with 4.1.1-alpha and it has the same behavior.
[5 Dec 2003 14:30]
Dean Ellis
Verified in 4.0.17/Linux. Populated peptide_trypsin with 16 million rows, with unique peptide_id (same value for molecular weight). Populated pep_to_prot with 32 million rows, two rows for each peptide_id. Other columns were populated with random data. mysql> explain select pep_to_prot.protein_id from peptide_trypsin, pep_to_prot where molecular_weight between 1 and 500 and peptide_trypsin.peptide_id = pep_to_prot.peptide_id; +-----------------+-------+-------------------+----------+---------+----------------------------+--------+-------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-----------------+-------+-------------------+----------+---------+----------------------------+--------+-------------+ | peptide_trypsin | range | id_index,mw_index | mw_index | 4 | NULL | 484 | Using where | | pep_to_prot | ref | PRIMARY | PRIMARY | 4 | peptide_trypsin.peptide_id | 259999 | Using index | +-----------------+-------+-------------------+----------+---------+----------------------------+--------+-------------+ 2 rows in set (0.00 sec) mysql> explain select pep_to_prot.protein_id from peptide_trypsin, pep_to_prot where molecular_weight between 1 and 600 and peptide_trypsin.peptide_id = pep_to_prot.peptide_id; +-----------------+-------+-------------------+----------+---------+------------------------+----------+-------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-----------------+-------+-------------------+----------+---------+------------------------+----------+-------------+ | pep_to_prot | index | PRIMARY | PRIMARY | 8 | NULL | 25999998 | Using index | | peptide_trypsin | ref | id_index,mw_index | id_index | 4 | pep_to_prot.peptide_id | 1 | Using where | +-----------------+-------+-------------------+----------+---------+------------------------+----------+-------------+ 2 rows in set (0.00 sec) With my particular set of data, the behavior changes at: molecular_weight between 1 and 553. Thank you.
[10 Dec 2003 18:26]
Igor Babaev
It's not a bug.
After rinning the analyze command collecting statistics on the index of the
table pep_to_prot we have:
mysql> explain
-> select pep_to_prot.protein_id from peptide_trypsin, pep_to_prot
-> where molecular_weight between 1 and 500 and
-> peptide_trypsin.peptide_id = pep_to_prot.peptide_id;
+----+-------------+-----------------+-------+-------------------+----------+---------+---------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+-------------------+----------+---------+---------------------------------+------+-------------+
| 1 | SIMPLE | peptide_trypsin | range | id_index,mw_index | mw_index |
4 | NULL | 496 | Using where |
| 1 | SIMPLE | pep_to_prot | ref | PRIMARY | PRIMARY |
4 | test.peptide_trypsin.peptide_id | 2 | Using index |
+----+-------------+-----------------+-------+-------------------+----------+---------+---------------------------------+------+-------------+
2 rows in set (4.71 sec)
and
mysql> explain
-> select pep_to_prot.protein_id from peptide_trypsin, pep_to_prot
-> where molecular_weight between 1 and 600 and
-> peptide_trypsin.peptide_id = pep_to_prot.peptide_id;
+----+-------------+-----------------+-------+-------------------+----------+---------+---------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+-------------------+----------+---------+---------------------------------+------+-------------+
| 1 | SIMPLE | peptide_trypsin | range | id_index,mw_index | mw_index |
4 | NULL | 595 | Using where |
| 1 | SIMPLE | pep_to_prot | ref | PRIMARY | PRIMARY |
4 | test.peptide_trypsin.peptide_id | 2 | Using index |
+----+-------------+-----------------+-------+-------------------+----------+---------+---------------------------------+------+-------------+
2 rows in set (0.06 sec)
Please, always run the analyze command when you have strange execution plans
for queries, especially for those executing against big tables.

Description: I have the following tables: create TABLE peptide_trypsin ( peptide_id INTEGER(10) unsigned not null, molecular_weight INTEGER(10) unsigned not null, sequence TEXT not null, missed_cleavages SMALLINT unsigned not null) alter TABLE peptide_trypsin ADD INDEX id_index (peptide_id), ADD INDEX mw_index (molecular_weight) create TABLE pep_to_prot ( peptide_id INTEGER(10) unsigned not null, protein_id INTEGER(10) unsigned not null, PRIMARY KEY(peptide_id, protein_id)) peptide_trypsin has about 13,000,000 rows. pep_to_prot has about 24,000,000 rows. when I execute the following query: select pep_to_prot.protein_id from peptide_trypsin, pep_to_prot where molecular_weight > X and molecular_weight < Y and peptide_trypsin.peptide_id = pep_to_prot.peptide_id The optimizer USUALLY chooses to use the mw_index on peptide_trypsin first and then join the results using the primary key on pep_to_prot. This is correct. However, when the mw_index will return more than about 700 rows, it elects to do a full cartesian product on the 2 tables! using the primary key of pep_to_prot followed by the id_index on peptide_trypsin. Needless to say this is a bad thing. How to repeat: contact me and I can arrange to provide the data (although it is about 1GB)