Bug #49276 Optimization for IN (on InnoDB tables) depends on number of set elements
Submitted: 1 Dec 2009 17:03 Modified: 1 Dec 2009 17:25
Reporter: Kukushkina Ganna Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.0.85-community-log, 5.0.89, 5.1.42 OS:Linux
Assigned to: CPU Architecture:Any

[1 Dec 2009 17:03] Kukushkina Ganna
Description:
It seems that when IN is used on InnoDb tables (works fine on MyISAM tables) to select a set of indexed values, index is used only when number of elements in IN() does not exceed certain number (differs depending on number of rows in the table):

CREATE TABLE testInOptimisation(col1 INTEGER PRIMARY KEY)ENGINE = InnoDB;

INSERT INTO testInOptimisation VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

mysql> EXPLAIN SELECT * FROM testInOptimisation WHERE col1 IN(1,2)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: testInOptimisation
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 2
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM testInOptimisation WHERE col1 IN(1,2,3)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: testInOptimisation
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 10
        Extra: Using where; Using index
1 row in set (0.00 sec)

INSERT INTO testInOptimisation VALUES (11),(12),(13),(14);
mysql> EXPLAIN SELECT * FROM testInOptimisation WHERE col1 IN(1,2,3)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: testInOptimisation
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 3
        Extra: Using where; Using index

Not sure if this can be configured or it relates to documented restrictions, tried to look for both, but didn't find a solution/answer

How to repeat:
CREATE TABLE testInOptimisation(col1 INTEGER PRIMARY KEY)ENGINE = InnoDB;
INSERT INTO testInOptimisation VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
EXPLAIN SELECT * FROM testInOptimisation WHERE col1 IN(1,2);
EXPLAIN SELECT * FROM testInOptimisation WHERE col1 IN(1,2,3);
INSERT INTO testInOptimisation VALUES (11),(12),(13),(14);
mysql> EXPLAIN SELECT * FROM testInOptimisation WHERE col1 IN(1,2,3);
[1 Dec 2009 17:25] Valeriy Kravchuk
I can confirm this on 5.0.89 and 5.1.42 also, but for me same happens with MyISAM table as well:

mysql> alter table `testInOptimisation` engine=MyISAM;
Query OK, 14 rows affected (0.00 sec)
Records: 14  Duplicates: 0  Warnings: 0

mysql> EXPLAIN SELECT * FROM testInOptimisation WHERE col1 IN(1,2);
+----+-------------+--------------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table              | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+--------------------+-------+---------------+---------+---------+------+------+--------------------------+
|  1 | SIMPLE      | testInOptimisation | index | PRIMARY       | PRIMARY | 4       | NULL |   14 | Using where; Using index | 
+----+-------------+--------------------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM testInOptimisation WHERE col1 IN(1,2,3,4);
+----+-------------+--------------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table              | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+--------------------+-------+---------------+---------+---------+------+------+--------------------------+
|  1 | SIMPLE      | testInOptimisation | range | PRIMARY       | PRIMARY | 4       | NULL |    4 | Using where; Using index | 
+----+-------------+--------------------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.01 sec)

mysql> select version();
+--------------+
| version()    |
+--------------+
| 5.0.89-debug | 
+--------------+
1 row in set (0.00 sec)

So, we have either "index" (full scan of index) or "range" (scan of index range), depending on number of elements in the table and elements in the IN list.

mysql> show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 2     | 
| Handler_read_key      | 11    | 
| Handler_read_next     | 14    | 
| Handler_read_prev     | 0     | 
| Handler_read_rnd      | 0     | 
| Handler_read_rnd_next | 66    | 
+-----------------------+-------+
6 rows in set (0.00 sec)

mysql> SELECT * FROM testInOptimisation WHERE col1 IN(1,2,3,4);
+------+
| col1 |
+------+
|    1 | 
|    2 | 
|    3 | 
|    4 | 
+------+
4 rows in set (0.01 sec)

mysql> show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 2     | 
| Handler_read_key      | 15    | 
| Handler_read_next     | 14    | 
| Handler_read_prev     | 0     | 
| Handler_read_rnd      | 0     | 
| Handler_read_rnd_next | 73    | 
+-----------------------+-------+
6 rows in set (0.00 sec)

mysql> SELECT * FROM testInOptimisation WHERE col1 IN(1,2);
+------+
| col1 |
+------+
|    1 | 
|    2 | 
+------+
2 rows in set (0.00 sec)

mysql> show session status like 'Handler_read%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 3     | 
| Handler_read_key      | 15    | 
| Handler_read_next     | 28    | 
| Handler_read_prev     | 0     | 
| Handler_read_rnd      | 0     | 
| Handler_read_rnd_next | 80    | 
+-----------------------+-------+
6 rows in set (0.00 sec)

So, server performed 4 key reads to get 4 rows of result (with IN(1,2,3,4)), but 1+14 key reads to get only 2 rows of result (with IN(1,2)). I'd say this is a bug in optimizer.