Bug #1232 Optimizer chooses wrong plan on simple join
Submitted: 9 Sep 2003 3:35 Modified: 13 Dec 2003 13:51
Reporter: Jeremy Cole (Basic Quality Contributor) (OCA) Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.1.1 (20030830) OS:
Assigned to: Igor Babaev CPU Architecture:Any

[9 Sep 2003 3:35] Jeremy Cole
Description:
The optimizer chooses a wrong execution plan after adding one more table to a three table join.

Essentially, the tables store email.  My query tries to find the most recent emails in the tables.  I have four tables, folder, mail, mxa, and address.  They are as follows:

folder:
  folder_id
  f_name
  etc.

mail:
  mail_id
  folder_id
  h_subject
  etc.

mxa: (mail x address)
  mail_id
  address_id
  type (from, to, cc)

address:
  address_id
  phrase
  address
  etc.

It's a simple query to select the "most recent" email regardless of folder:

>>>>>
mysql> select sql_no_cache 
    ->   mail.mail_id, 
    ->   mail.folder_id
    -> from mail
    -> left join mxa on mail.mail_id=mxa.mail_id and mxa.type='from' 
    -> left join address on mxa.address_id=address.address_id 
    -> order by mail.h_date desc 
    -> limit 5;
+---------+-----------+
| mail_id | folder_id |
+---------+-----------+
|   47001 |         1 |
|   46999 |         1 |
|   46965 |         1 |
|   46994 |         1 |
|   46991 |         1 |
+---------+-----------+
5 rows in set (0.01 sec)
<<<<<

No time at all, and the explain comes out sane:

+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+
| id | select_type | table   | type   | possible_keys | key     | key_len | ref                     | rows  | Extra       |
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+
|  1 | SIMPLE      | mail    | index  | NULL          | h_date  |       8 | NULL                    | 47001 |             |
|  1 | SIMPLE      | mxa     | ref    | mail_id,type  | mail_id |       6 | mail.mail.mail_id,const |     1 |             |
|  1 | SIMPLE      | address | eq_ref | PRIMARY       | PRIMARY |       4 | mail.mxa.address_id     |     1 | Using index |
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+

How to repeat:
Adding one more table to the previous join... to get instead, folder names...

>>>>>
mysql> select sql_no_cache 
    ->   mail.mail_id, 
    ->   folder.f_name
    -> from mail
    -> left join folder on mail.folder_id=folder.folder_id
    -> left join mxa on mail.mail_id=mxa.mail_id and mxa.type='from' 
    -> left join address on mxa.address_id=address.address_id 
    -> order by mail.h_date desc 
    -> limit 5;
+---------+---------------+
| mail_id | f_name        |
+---------+---------------+
|   47001 | mysql-general |
|   46999 | mysql-general |
|   46965 | mysql-general |
|   46994 | mysql-general |
|   46991 | mysql-general |
+---------+---------------+
5 rows in set (3.78 sec)
<<<<<

That's very bad!  And the explain doesn't give much of a clue why... do note that the 'folder' table is only 1 row!  Type "system" is possible...

+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+---------------------------------+
| id | select_type | table   | type   | possible_keys | key     | key_len | ref                     | rows  | Extra                           |
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+---------------------------------+
|  1 | SIMPLE      | mail    | ALL    | NULL          | NULL    |    NULL | NULL                    | 47001 | Using temporary; Using filesort |
|  1 | SIMPLE      | folder  | ALL    | PRIMARY       | NULL    |    NULL | NULL                    |     1 |                                 |
|  1 | SIMPLE      | mxa     | ref    | mail_id,type  | mail_id |       6 | mail.mail.mail_id,const |     1 |                                 |
|  1 | SIMPLE      | address | eq_ref | PRIMARY       | PRIMARY |       4 | mail.mxa.address_id     |     1 | Using index                     |
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+---------------------------------+

So you see it's chosen not to follow the index in DESC order and instead to do a filesort with a temporary table...

Suggested fix:
The optimiser should be able to handle this simple case!
[17 Sep 2003 0:29] Jeremy Cole
You can download a mysqldump of the 'mail' database at:

http://jcole.us/files/mail.sql.gz
[13 Dec 2003 13:51] Igor Babaev
With the latest version of 4.1 (as of 2003.12.13) I could not reproduce it  
with the database Jeremy provided me with. 
I had to modify the type descriptors of columns address and comment of the 
address table to 
COLLATE latin1_general_cs NOT NULL 
to make the fulltext index idx_ft_all(phrase, address, comment) valid. 
After this I had: 
 
mysql> explain 
    -> select sql_no_cache  mail.mail_id,  mail.folder_id 
    ->   from mail left join mxa on mail.mail_id=mxa.mail_id and 
mxa.type='from' 
    ->             left join address on mxa.address_id=address.address_id 
    ->     order by mail.h_date desc 
    ->       limit 5; 
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+ 
| id | select_type | table   | type   | possible_keys | key     | key_len | 
ref                     | rows  | Extra       | 
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+ 
|  1 | SIMPLE      | mail    | index  | NULL          | h_date  |       8 | 
NULL                    | 30800 |             | 
|  1 | SIMPLE      | mxa     | ref    | mail_id,type  | mail_id |       6 | 
mail.mail.mail_id,const |     1 |             | 
|  1 | SIMPLE      | address | eq_ref | PRIMARY       | PRIMARY |       4 | 
mail.mxa.address_id     |     1 | Using index | 
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+ 
3 rows in set (0.00 sec) 
 
mysql> explain 
    -> select sql_no_cache mail.mail_id, folder.f_name 
    ->   from mail left join folder on mail.folder_id=folder.folder_id 
    ->             left join mxa on mail.mail_id=mxa.mail_id and 
mxa.type='from' 
    ->             left join address on mxa.address_id=address.address_id 
    ->     order by mail.h_date desc 
    ->       limit 5; 
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+ 
| id | select_type | table   | type   | possible_keys | key     | key_len | 
ref                     | rows  | Extra       | 
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+ 
|  1 | SIMPLE      | mail    | index  | NULL          | h_date  |       8 | 
NULL                    | 30800 |             | 
|  1 | SIMPLE      | folder  | eq_ref | PRIMARY       | PRIMARY |       4 | 
mail.mail.folder_id     |     1 |             | 
|  1 | SIMPLE      | mxa     | ref    | mail_id,type  | mail_id |       6 | 
mail.mail.mail_id,const |     1 |             | 
|  1 | SIMPLE      | address | eq_ref | PRIMARY       | PRIMARY |       4 | 
mail.mxa.address_id     |     1 | Using index | 
+----+-------------+---------+--------+---------------+---------+---------+-------------------------+-------+-------------+ 
4 rows in set (0.00 sec) 
 
mysql> select sql_no_cache  mail.mail_id,  mail.folder_id 
    ->   from mail left join mxa on mail.mail_id=mxa.mail_id and 
mxa.type='from' 
    ->             left join address on mxa.address_id=address.address_id 
    ->     order by mail.h_date desc 
    ->       limit 5; 
+---------+-----------+ 
| mail_id | folder_id | 
+---------+-----------+ 
|     739 |         1 | 
|     738 |         1 | 
|     737 |         1 | 
|     736 |         1 | 
|     735 |         1 | 
+---------+-----------+ 
5 rows in set (0.00 sec) 
 
mysql> select sql_no_cache mail.mail_id, folder.f_name 
    ->   from mail left join folder on mail.folder_id=folder.folder_id 
    ->             left join mxa on mail.mail_id=mxa.mail_id and 
mxa.type='from' 
    ->             left join address on mxa.address_id=address.address_id 
    ->     order by mail.h_date desc 
    ->       limit 5; 
+---------+--------+ 
| mail_id | f_name | 
+---------+--------+ 
|     739 | cllug  | 
|     738 | cllug  | 
|     737 | cllug  | 
|     736 | cllug  | 
|     735 | cllug  | 
+---------+--------+ 
5 rows in set (0.01 sec)