| Bug #1232 | Optimizer chooses wrong plan on simple join | ||
|---|---|---|---|
| Submitted: | 9 Sep 2003 3:35 | Modified: | 13 Dec 2003 13:51 |
| Reporter: | Jeremy Cole (Basic Quality Contributor) (OCA) | Email Updates: | |
| Status: | Can't repeat | Impact on me: | |
| Category: | MySQL Server | Severity: | S3 (Non-critical) |
| Version: | 4.1.1 (20030830) | OS: | |
| Assigned to: | Igor Babaev | CPU Architecture: | Any |
[17 Sep 2003 0:29]
Jeremy Cole
You can download a mysqldump of the 'mail' database at: http://jcole.us/files/mail.sql.gz
[13 Dec 2003 13:51]
Igor Babaev
With the latest version of 4.1 (as of 2003.12.13) I could not reproduce it
with the database Jeremy provided me with.
I had to modify the type descriptors of columns address and comment of the
address table to
COLLATE latin1_general_cs NOT NULL
to make the fulltext index idx_ft_all(phrase, address, comment) valid.
After this I had:
mysql> explain
-> select sql_no_cache mail.mail_id, mail.folder_id
-> from mail left join mxa on mail.mail_id=mxa.mail_id and
mxa.type='from'
-> left join address on mxa.address_id=address.address_id
-> order by mail.h_date desc
-> limit 5;
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+
| 1 | SIMPLE | mail | index | NULL | h_date | 8 |
NULL | 30800 | |
| 1 | SIMPLE | mxa | ref | mail_id,type | mail_id | 6 |
mail.mail.mail_id,const | 1 | |
| 1 | SIMPLE | address | eq_ref | PRIMARY | PRIMARY | 4 |
mail.mxa.address_id | 1 | Using index |
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+
3 rows in set (0.00 sec)
mysql> explain
-> select sql_no_cache mail.mail_id, folder.f_name
-> from mail left join folder on mail.folder_id=folder.folder_id
-> left join mxa on mail.mail_id=mxa.mail_id and
mxa.type='from'
-> left join address on mxa.address_id=address.address_id
-> order by mail.h_date desc
-> limit 5;
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len |
ref | rows | Extra |
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+
| 1 | SIMPLE | mail | index | NULL | h_date | 8 |
NULL | 30800 | |
| 1 | SIMPLE | folder | eq_ref | PRIMARY | PRIMARY | 4 |
mail.mail.folder_id | 1 | |
| 1 | SIMPLE | mxa | ref | mail_id,type | mail_id | 6 |
mail.mail.mail_id,const | 1 | |
| 1 | SIMPLE | address | eq_ref | PRIMARY | PRIMARY | 4 |
mail.mxa.address_id | 1 | Using index |
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+
4 rows in set (0.00 sec)
mysql> select sql_no_cache mail.mail_id, mail.folder_id
-> from mail left join mxa on mail.mail_id=mxa.mail_id and
mxa.type='from'
-> left join address on mxa.address_id=address.address_id
-> order by mail.h_date desc
-> limit 5;
+---------+-----------+
| mail_id | folder_id |
+---------+-----------+
| 739 | 1 |
| 738 | 1 |
| 737 | 1 |
| 736 | 1 |
| 735 | 1 |
+---------+-----------+
5 rows in set (0.00 sec)
mysql> select sql_no_cache mail.mail_id, folder.f_name
-> from mail left join folder on mail.folder_id=folder.folder_id
-> left join mxa on mail.mail_id=mxa.mail_id and
mxa.type='from'
-> left join address on mxa.address_id=address.address_id
-> order by mail.h_date desc
-> limit 5;
+---------+--------+
| mail_id | f_name |
+---------+--------+
| 739 | cllug |
| 738 | cllug |
| 737 | cllug |
| 736 | cllug |
| 735 | cllug |
+---------+--------+
5 rows in set (0.01 sec)

Description: The optimizer chooses a wrong execution plan after adding one more table to a three table join. Essentially, the tables store email. My query tries to find the most recent emails in the tables. I have four tables, folder, mail, mxa, and address. They are as follows: folder: folder_id f_name etc. mail: mail_id folder_id h_subject etc. mxa: (mail x address) mail_id address_id type (from, to, cc) address: address_id phrase address etc. It's a simple query to select the "most recent" email regardless of folder: >>>>> mysql> select sql_no_cache -> mail.mail_id, -> mail.folder_id -> from mail -> left join mxa on mail.mail_id=mxa.mail_id and mxa.type='from' -> left join address on mxa.address_id=address.address_id -> order by mail.h_date desc -> limit 5; +---------+-----------+ | mail_id | folder_id | +---------+-----------+ | 47001 | 1 | | 46999 | 1 | | 46965 | 1 | | 46994 | 1 | | 46991 | 1 | +---------+-----------+ 5 rows in set (0.01 sec) <<<<< No time at all, and the explain comes out sane: +----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+ | 1 | SIMPLE | mail | index | NULL | h_date | 8 | NULL | 47001 | | | 1 | SIMPLE | mxa | ref | mail_id,type | mail_id | 6 | mail.mail.mail_id,const | 1 | | | 1 | SIMPLE | address | eq_ref | PRIMARY | PRIMARY | 4 | mail.mxa.address_id | 1 | Using index | +----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+ How to repeat: Adding one more table to the previous join... to get instead, folder names... >>>>> mysql> select sql_no_cache -> mail.mail_id, -> folder.f_name -> from mail -> left join folder on mail.folder_id=folder.folder_id -> left join mxa on mail.mail_id=mxa.mail_id and mxa.type='from' -> left join address on mxa.address_id=address.address_id -> order by mail.h_date desc -> limit 5; +---------+---------------+ | mail_id | f_name | +---------+---------------+ | 47001 | mysql-general | | 46999 | mysql-general | | 46965 | mysql-general | | 46994 | mysql-general | | 46991 | mysql-general | +---------+---------------+ 5 rows in set (3.78 sec) <<<<< That's very bad! And the explain doesn't give much of a clue why... do note that the 'folder' table is only 1 row! Type "system" is possible... +----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+---------------------------------+ | 1 | SIMPLE | mail | ALL | NULL | NULL | NULL | NULL | 47001 | Using temporary; Using filesort | | 1 | SIMPLE | folder | ALL | PRIMARY | NULL | NULL | NULL | 1 | | | 1 | SIMPLE | mxa | ref | mail_id,type | mail_id | 6 | mail.mail.mail_id,const | 1 | | | 1 | SIMPLE | address | eq_ref | PRIMARY | PRIMARY | 4 | mail.mxa.address_id | 1 | Using index | +----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+---------------------------------+ So you see it's chosen not to follow the index in DESC order and instead to do a filesort with a temporary table... Suggested fix: The optimiser should be able to handle this simple case!