Bug #20420 optimizer reports wrong keys on left join with IN
Submitted: 13 Jun 2006 9:43 Modified: 7 Mar 2007 18:12
Reporter: [ name withheld ] Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.23-BK, 5.0.22, 5.1. OS:Linux (Linux)
Assigned to: Georgi Kodinov CPU Architecture:Any
Tags: Q1

[13 Jun 2006 9:43] [ name withheld ]
Description:
When left joining 2 tables, and using a column from each one in a different table's join , the 3rd table will not use the index (anyway that's what the EXPLAIN shows).

It's hard to explain in words so I created sample tables.

How to repeat:
CREATE TABLE `admin` (
  `admin_id` int(11) unsigned NOT NULL,
  `email` varchar(50) NOT NULL,
  `type_id` int(11) unsigned NOT NULL,
  PRIMARY KEY  (`admin_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO `admin` VALUES (1,'noone@nowhere.com',1),(2,'user@server.com',1);

CREATE TABLE `client` (
  `client_id` int(11) unsigned NOT NULL,
  `fname` varchar(50) NOT NULL,
  `type_id` int(10) unsigned NOT NULL,
  PRIMARY KEY  (`client_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO `client` VALUES (3,'john',2),(4,'elvis',2);

CREATE TABLE `user` (
  `user_id` int(11) unsigned NOT NULL auto_increment,
  `username` varchar(50) NOT NULL,
  `password` varchar(13) NOT NULL,
  PRIMARY KEY  (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO `user` VALUES (1,'admin1','1234'),(2,'admin2','5678'),(3,'client1','1234'),(4,'client2','56789');

CREATE TABLE `user_type` (
  `type_id` int(11) unsigned NOT NULL auto_increment,
  `type_name` varchar(50) NOT NULL,
  PRIMARY KEY  (`type_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO `user_type` VALUES (1,'master'),(2,'client');

The select query:

SELECT * FROM `user` AS `us` LEFT JOIN `admin` AS `ad` ON `us`.`user_id`=`ad`.`admin_id` LEFT JOIN `client` AS `cl` ON `us`.`user_id`=`cl`.`client_id` JOIN `user_type` AS `ut` WHERE `ut`.`type_id` IN (`ad`.`type_id`, `cl`.`type_id`);

Now if you do an EXPLAIN on this query you will see that table `ut` is not using any key in the query, although `type_id` is the primary key.

Is this a bug, or is it supposed to be this way? Is it not supposed to use the index on `ut`.`type_id` anyway?

My real life case is with much bigger tables, and it doesn't seem to effect the speed of the query if I change the IN part to a normal equality on just one of the tables, which suggests to me that it does use the indexes.
[13 Jun 2006 11:50] Valeriy Kravchuk
Thank you for a problem report. I think, it is a bug. Look:

mysql> explain SELECT * FROM `user` AS `us` LEFT JOIN `admin` AS `ad` ON `us`.`user_id`=`ad`.`admin_id` LEFT JOIN `client` AS `cl` ON `us`.`user_id`=`cl`.`client_id` JOIN `user_type` AS `ut` WHERE `ut`.`type_id` IN (`ad`.`type_id`, `cl`.`type_id`);
+----+-------------+-------+--------+---------------+---------+---------+-----------------+------+-------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref          | rows | Extra       |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+------+-------------+
|  1 | SIMPLE      | us    | ALL    | NULL          | NULL    | NULL    | NULL          |    4 |             |
|  1 | SIMPLE      | ad    | eq_ref | PRIMARY       | PRIMARY | 4       | test.us.user_id |    1 |             |
|  1 | SIMPLE      | cl    | eq_ref | PRIMARY       | PRIMARY | 4       | test.us.user_id |    1 |             |
|  1 | SIMPLE      | ut    | ALL    | NULL          | NULL    | NULL    | NULL          |    2 | Using where |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+------+-------------+
4 rows in set (0.00 sec)

So, index was not even considered. While:

mysql> explain SELECT * FROM `user` AS `us` LEFT JOIN `admin` AS `ad` ON `us`.`user_id`=`ad`.`admin_id` LEFT JOIN `client` AS `cl` ON `us`.`user_id`=`cl`.`client_id` JOIN `user_type` AS `ut` WHERE `ut`.`type_id` = `ad`.`type_id` OR `ut`.`type_id` = `cl`.`type_id`;
+----+-------------+-------+--------+---------------+---------+---------+-----------------+------+------------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref          | rows | Extra                                          |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+------+------------------------------------------------+
|  1 | SIMPLE      | us    | ALL    | NULL          | NULL    | NULL    | NULL          |    4 |                                                |
|  1 | SIMPLE      | ad    | eq_ref | PRIMARY       | PRIMARY | 4       | test.us.user_id |    1 |                                                |
|  1 | SIMPLE      | cl    | eq_ref | PRIMARY       | PRIMARY | 4       | test.us.user_id |    1 |                                                |
|  1 | SIMPLE      | ut    | ALL    | PRIMARY       | NULL    | NULL    | NULL          |    2 | Range checked for each record (index map: 0x1) |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+------+------------------------------------------------+
4 rows in set (0.00 sec)

And:

mysql> explain SELECT * FROM `user` AS `us` LEFT JOIN `admin` AS `ad` ON `us`.`user_id`=`ad`.`admin_id` LEFT JOIN `client` AS `cl` ON `us`.`user_id`=`cl`.`client_id` JOIN `user_type` AS `ut` ON (`ut`.`type_id` = `ad`.`type_id` OR `ut`.`type_id` = `cl`.`type_id`);
+----+-------------+-------+--------+---------------+---------+---------+-----------------+------+------------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref          | rows | Extra                                          |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+------+------------------------------------------------+
|  1 | SIMPLE      | us    | ALL    | NULL          | NULL    | NULL    | NULL          |    4 |                                                |
|  1 | SIMPLE      | ad    | eq_ref | PRIMARY       | PRIMARY | 4       | test.us.user_id |    1 |                                                |
|  1 | SIMPLE      | cl    | eq_ref | PRIMARY       | PRIMARY | 4       | test.us.user_id |    1 |                                                |
|  1 | SIMPLE      | ut    | ALL    | PRIMARY       | NULL    | NULL    | NULL          |    2 | Range checked for each record (index map: 0x1) |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+------+------------------------------------------------+
4 rows in set (0.00 sec)

In both cases above PRIMARY key is, at least, considred (or even used) by the oprimizer (who knows what that "Range checked for each record" means).

And, surely, with single "=" conition index is used:

mysql> explain SELECT * FROM `user` AS `us` LEFT JOIN `admin` AS `ad` ON `us`.`user_id`=`ad`.`admin_id` LEFT JOIN `client` AS `cl` ON `us`.`user_id`=`cl`.`client_id` JOIN `user_type` AS `ut` WHERE `ut`.`type_id` = `ad`.`type_id`;
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+-------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref           | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+-------+
|  1 | SIMPLE      | ad    | ALL    | PRIMARY       | NULL    | NULL    | NULL           |    2 |       |
|  1 | SIMPLE      | us    | eq_ref | PRIMARY       | PRIMARY | 4       | test.ad.admin_id |    1 |       |
|  1 | SIMPLE      | cl    | eq_ref | PRIMARY       | PRIMARY | 4       | test.us.user_id  |    1 |       |
|  1 | SIMPLE      | ut    | eq_ref | PRIMARY       | PRIMARY | 4       | test.ad.type_id  |    1 |       |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+-------+
4 rows in set (0.00 sec)

I think, it is optimizer bug/feature request. Index should be used in such cases.
[10 Jan 2007 10:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/17838

ChangeSet@1.2367, 2007-01-10 12:22:53+02:00, gkodinov@macbook.gmz +3 -0
  BUG#20420: optimizer reports wrong keys on left join with IN
   The optimizer needs to evaluate whether predicates are better
   evaluated using an index. IN is one such predicate.
   To qualify an IN predicate must involve a field of the index
   on the left and constant arguments on the right.
   However whether an expression is a constant can be determined only
   by knowing the preceding tables in the join order. 
   Assuming that only IN predicates with expressions on the right that
   are constant for the whole query qualify limits the scope of 
   possible optimizations of the IN predicate (more specifically it
   doesn't allow the "Range checked for each record" optimization for
   such an IN predicate.
   Fixed by not pre-determining the optimizability of the IN predicate
   based on the whole-query-constants only.
[15 Jan 2007 17:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/18132

ChangeSet@1.2367, 2007-01-15 19:15:52+02:00, gkodinov@macbook.gmz +3 -0
  BUG#20420: optimizer reports wrong keys on left join with IN
   The optimizer needs to evaluate whether predicates are better
   evaluated using an index. IN is one such predicate.
   To qualify an IN predicate must involve a field of the index
   on the left and constant arguments on the right.
   However whether an expression is a constant can be determined only
   by knowing the preceding tables in the join order. 
   Assuming that only IN predicates with expressions on the right that
   are constant for the whole query qualify limits the scope of 
   possible optimizations of the IN predicate (more specifically it
   doesn't allow the "Range checked for each record" optimization for
   such an IN predicate.
   Fixed by not pre-determining the optimizability of the IN predicate
   in the case when all right IN operands are not SQL constant expressions
[24 Jan 2007 15:20] Georgi Kodinov
Done for 5.0.
Puting back to verified as the fix needs extension for 5.1
[25 Jan 2007 14:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/18777

ChangeSet@1.2396, 2007-01-25 16:52:43+02:00, gkodinov@macbook.gmz +5 -0
  BUG#20420: optimizer reports wrong keys on left join with IN
   When checking if an IN predicate can be evaluated using a key
   the optimizer makes sure that all the arguments of IN are of
   the same result type. To assure that it check whether 
   Item_func_in::array is filled in. 
   However Item_func_in::array is set if the types are
   the same AND all the arguments are compile time constants.
   Fixed by introducing Item_func_in::arg_types_compatible
   flag to allow correct checking of the desired condition.
[25 Jan 2007 15:06] Georgi Kodinov
Assigning Sergey as a reviewer for the 5.1 fix.
[16 Feb 2007 13:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19997

ChangeSet@1.2396, 2007-02-16 15:51:50+02:00, gkodinov@macbook.gmz +5 -0
  BUG#20420: optimizer reports wrong keys on left join with IN
   When checking if an IN predicate can be evaluated using a key
   the optimizer makes sure that all the arguments of IN are of
   the same result type. To assure that it check whether 
   Item_func_in::array is filled in. 
   However Item_func_in::array is set if the types are
   the same AND all the arguments are compile time constants.
   Fixed by introducing Item_func_in::arg_types_compatible
   flag to allow correct checking of the desired condition.
[28 Feb 2007 17:44] Sergei Glukhov
Fixed in 5.0.38, 5.1.17-beta
[7 Mar 2007 18:12] Paul DuBois
Noted in 5.0.38, 5.1.17 changelogs.

The conditions checked by the optimizer to allow use of indexes in IN
predicate calculations were unnecessarily tight and were relaxed.