Bug #52170 innodb query optimizer fails (null possible_keys) with multicolumn key
Submitted: 18 Mar 2010 9:57 Modified: 22 Mar 2010 10:43
Reporter: Ethan Joffe Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.1.44, mysql-next-mt OS:Linux (fc12)
Assigned to: CPU Architecture:Any

[18 Mar 2010 9:57] Ethan Joffe
Description:

If you pass multiple value sets in a select, the optimizer does not use a matching multicolumn key.

Probably easiest just to show... create table as shown in how to repeat

here it is working if you just pass one set...

mysql> explain select * from testdata where (ID, ID2) in ((1,1));
+----+-------------+----------+-------+---------------+------+---------+-------------+------+-------+
| id | select_type | table    | type  | possible_keys | key  | key_len | ref         | rows | Extra |
+----+-------------+----------+-------+---------------+------+---------+-------------+------+-------+
|  1 | SIMPLE      | testdata | const | ID            | ID   | 8       | const,const |    1 |       |
+----+-------------+----------+-------+---------------+------+---------+-------------+------+-------+
1 row in set (0.01 sec)

Here it is working if you pass 2 sets separately...

mysql> explain select * from testdata where (ID, ID2)=(1,1) or (ID, ID2)=(3,3);
+----+-------------+----------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------+-------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | testdata | range | ID            | ID   | 8       | NULL |    2 | Using where |
+----+-------------+----------+-------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

and here it is failing if you pass the same 2 sets in a list...

mysql> explain select * from testdata where (ID, ID2) in ((1,1), (3,3));
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | testdata | ALL  | NULL          | NULL | NULL    | NULL |    4 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

How to repeat:

CREATE TABLE testdata (
    ID          int unsigned NOT NULL,
    ID2         int unsigned NOT NULL,
    val         int unsigned NOT NULL,
    UNIQUE (ID, ID2)
) TYPE = InnoDB;
insert into testdata (ID, ID2, val) values (1,1,1), (2,2,2), (3,3,3), (4,4,4);
explain select * from testdata where (ID, ID2) in ((1,1));
explain select * from testdata where (ID, ID2)=(1,1) or (ID, ID2)=(3,3);
explain select * from testdata where (ID, ID2) in ((1,1), (3,3));
[18 Mar 2010 11:55] Valeriy Kravchuk
Thank you for the problem report. Verified just as described:

openxs@ubuntu:/home2/openxs/dbs/next-mr$ bin/mysql --no-defaults -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.6.99-m4-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE testdata (
    ->     ID          int unsigned NOT NULL,
    ->     ID2         int unsigned NOT NULL,
    ->     val         int unsigned NOT NULL,
    ->     UNIQUE (ID, ID2)
    -> ) engine=InnoDB;
Query OK, 0 rows affected (1.04 sec)

mysql> insert into testdata (ID, ID2, val) values (1,1,1), (2,2,2), (3,3,3), (4,4,4);
Query OK, 4 rows affected (0.06 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> explain select * from testdata where (ID, ID2) in ((1,1));
+----+-------------+----------+-------+---------------+------+---------+-------------+------+-------+
| id | select_type | table    | type  | possible_keys | key  | key_len | ref         | rows | Extra |
+----+-------------+----------+-------+---------------+------+---------+-------------+------+-------+
|  1 | SIMPLE      | testdata | const | ID            | ID   | 8       | const,const |    1 |       |
+----+-------------+----------+-------+---------------+------+---------+-------------+------+-------+
1 row in set (0.13 sec)

mysql> explain select * from testdata where (ID, ID2)=(1,1) or (ID, ID2)=(3,3);
+----+-------------+----------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table    | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------+-------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | testdata | range | ID            | ID   | 8       | NULL |    2 | Using where |
+----+-------------+----------+-------+---------------+------+---------+------+------+-------------+
1 row in set (0.18 sec)

mysql> explain select * from testdata where (ID, ID2) in ((1,1), (3,3));
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | testdata | ALL  | NULL          | NULL | NULL    | NULL |    4 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.03 sec)
[22 Mar 2010 10:43] Ethan Joffe
I discovered the unfortunate additional fact that using the "OR" form of the query is not always a performance workaround for large datasets. Apparently the difference lies in type=range vs type=index. Sometimes, passing in the sets as a list using IN generates type=index, but as an OR'd list it shows up as type=range. Range takes orders of magnitudes longer when passing in a large list indexed against a table with a few million rows. Here is the output of the real world EXPLAIN, which I did not reduce to a simple example so I will leave the setup as an exercise for the reader...

Here is using IN with the query running fast:
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	subidFeed	range	subidID	subidID	8	NULL	102640	Using where; Using index

Here is using OR with the query running slow:
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	subidFeed	range	subidID	subidID	8	NULL	102640	Using where; Using index

So apparently, sometimes the optimizer gets it right, although I find it annoying that in the first case it shows possible_keys as NULL even though it uses key subidID .