MySQL Bugs: #4638: WHERE field IN(x) optimizes different than WHERE field = x

Bug #4638	WHERE field IN(x) optimizes different than WHERE field = x
Submitted:	20 Jul 2004 1:17	Modified:	4 Aug 2004 16:55
Reporter:	Peter Brodersen (Candidate Quality Contributor)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	4.0.20	OS:	Linux (Linux, 2.4.x)
Assigned to:	Dean Ellis	CPU Architecture:	Any

Description:
(same result in 5.0.0-alpha under WinXP)

A query using
    IN([some constant])
seem to be optimized different than
    = [some constant]
.. where only one value in IN() is used.

An explain shows that different join types are used, which also affects other tables.

How to repeat:
Only change in SELECT-queries are that "id = 1" is replaced with "id IN(1)".

1. create table bar (id int not null primary key, person_id_begin int, person_id_end int);
2. insert into bar values (1,1,2),(2,3,5),(3,7,10);
3a. explain select * from bar where id = 1;
3b. explain select * from bar where id IN(1);
(different type and const in explain, although same number of rows)

Another example using a join shows further differences:

4. create table foo (person_id int not null, task_id int not null, key(person_id));
5. insert into foo values (1,1),(1,2),(1,5),(1,9),(2,1),(2,3),(2,8),(3,2),(3,3),(5,3),(5,8),(5,10),(6,2),(6,9),(7,12),(8,1),(8,2),(9,5),(10,9),(11,1),(12,3),(15,8),(15,2),(16,3),(17,3),(20,8),(30,10),(31,2),(32,9),(37,12);
6a. explain select * from foo, bar where bar.id = 1 and foo.person_id between bar.person_id_begin AND bar.person_id_end;
6b. explain select * from foo, bar where bar.id IN(1) and foo.person_id between bar.person_id_begin AND bar.person_id_end;

6a:
+-------+-------+---------------+-----------+---------+-------+------+-------------+
| table | type  | possible_keys | key       | key_len | ref   | rows | Extra       |
+-------+-------+---------------+-----------+---------+-------+------+-------------+
| bar   | const | PRIMARY       | PRIMARY   |       4 | const |    1 |             |
| foo   | range | person_id     | person_id |       4 | NULL  |    7 | Using where |
+-------+-------+---------------+-----------+---------+-------+------+-------------+
2 rows in set (0.00 sec)

6b:
+-------+-------+---------------+---------+---------+------+------+----------------------------------------------+
| table | type  | possible_keys | key     | key_len | ref  | rows | Extra                                        |
+-------+-------+---------------+---------+---------+------+------+----------------------------------------------+
| bar   | range | PRIMARY       | PRIMARY |       4 | NULL |    1 | Using where                                  |
| foo   | ALL   | person_id     | NULL    |    NULL | NULL |   30 | Range checked for each record (index map: 1) |
+-------+-------+---------------+---------+---------+------+------+----------------------------------------------+
2 rows in set (0.02 sec)

Notice that more rows seem to be checked. Same result in both cases in 5.0.0-alpha.

Suggested fix:
"... IN(n)" should be regarded the same way as "... = n"

Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html