Bug #4638 WHERE field IN(x) optimizes different than WHERE field = x
Submitted: 20 Jul 2004 1:17 Modified: 4 Aug 2004 16:55
Reporter: Peter Brodersen (Candidate Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.0.20 OS:Linux (Linux, 2.4.x)
Assigned to: Dean Ellis CPU Architecture:Any

[20 Jul 2004 1:17] Peter Brodersen
Description:
(same result in 5.0.0-alpha under WinXP)

A query using
    IN([some constant])
seem to be optimized different than
    = [some constant]
.. where only one value in IN() is used.

An explain shows that different join types are used, which also affects other tables.

How to repeat:
Only change in SELECT-queries are that "id = 1" is replaced with "id IN(1)".

1. create table bar (id int not null primary key, person_id_begin int, person_id_end int);
2. insert into bar values (1,1,2),(2,3,5),(3,7,10);
3a. explain select * from bar where id = 1;
3b. explain select * from bar where id IN(1);
(different type and const in explain, although same number of rows)

Another example using a join shows further differences:

4. create table foo (person_id int not null, task_id int not null, key(person_id));
5. insert into foo values (1,1),(1,2),(1,5),(1,9),(2,1),(2,3),(2,8),(3,2),(3,3),(5,3),(5,8),(5,10),(6,2),(6,9),(7,12),(8,1),(8,2),(9,5),(10,9),(11,1),(12,3),(15,8),(15,2),(16,3),(17,3),(20,8),(30,10),(31,2),(32,9),(37,12);
6a. explain select * from foo, bar where bar.id = 1 and foo.person_id between bar.person_id_begin AND bar.person_id_end;
6b. explain select * from foo, bar where bar.id IN(1) and foo.person_id between bar.person_id_begin AND bar.person_id_end;

6a:
+-------+-------+---------------+-----------+---------+-------+------+-------------+
| table | type  | possible_keys | key       | key_len | ref   | rows | Extra       |
+-------+-------+---------------+-----------+---------+-------+------+-------------+
| bar   | const | PRIMARY       | PRIMARY   |       4 | const |    1 |             |
| foo   | range | person_id     | person_id |       4 | NULL  |    7 | Using where |
+-------+-------+---------------+-----------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

6b:
+-------+-------+---------------+---------+---------+------+------+----------------------------------------------+
| table | type  | possible_keys | key     | key_len | ref  | rows | Extra                                        |
+-------+-------+---------------+---------+---------+------+------+----------------------------------------------+
| bar   | range | PRIMARY       | PRIMARY |       4 | NULL |    1 | Using where                                  |
| foo   | ALL   | person_id     | NULL    |    NULL | NULL |   30 | Range checked for each record (index map: 1) |
+-------+-------+---------------+---------+---------+------+------+----------------------------------------------+
2 rows in set (0.02 sec)

Notice that more rows seem to be checked. Same result in both cases in 5.0.0-alpha.

Suggested fix:
"... IN(n)" should be regarded the same way as "... = n"
[4 Aug 2004 16:55] Dean Ellis
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html