Bug #1724 WHERE ... IN() optimizer behaviour has changed since 4.0.14
Submitted: 31 Oct 2003 5:23 Modified: 3 Nov 2003 13:51
Reporter: Sergey Kostyliov Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.0.16 OS:Linux (Linux Gentoo 1.4)
Assigned to: Konstantin Osipov CPU Architecture:Any

[31 Oct 2003 5:23] Sergey Kostyliov
Description:
Default optimizer behavior has changed in 4.0.16 (since 4.0.14) 
for this simple question: 
 
select state_id, orderdata_id from order_delivery where 
orderdata_id in 
(3193340,3193343,3193346,3193349,3193352,3193355) 
and is_deleted=0 order by xtime desc; 
 
Execution time has grown up from 0.04s to 47.70. 
 

How to repeat:
order_delivery.sql.gz is attached. 
 
CREATE TABLE `order_delivery` ( 
  `orderdata_id` int(11) NOT NULL default '0', 
  `state_id` int(11) NOT NULL default '0', 
  `xtime` datetime NOT NULL default '0000-00-00 00:00:00', 
  `admin_user_id` int(11) NOT NULL default '0', 
  `note` text NOT NULL, 
  `is_deleted` int(1) NOT NULL default '0', 
  KEY `orderdata_id` (`orderdata_id`), 
  KEY `is_deleted` (`is_deleted`), 
  KEY `xtime` (`xtime`) 
) TYPE=InnoDB 
 
4.0.14 
======= 
mysql> EXPLAIN select state_id, orderdata_id from order_delivery where 
    -> orderdata_id in 
    -> (3193340,3193343,3193346,3193349,3193352,3193355) 
    -> and is_deleted=0 order by xtime desc; 
+----------------+-------+-------------------------+--------------+---------+------+------+-----------------------------+ 
| table          | type  | possible_keys           | key          | key_len | 
ref  | rows | Extra                       | 
+----------------+-------+-------------------------+--------------+---------+------+------+-----------------------------+ 
| order_delivery | range | orderdata_id,is_deleted | orderdata_id |       4 | 
NULL |    5 | Using where; Using filesort | 
+----------------+-------+-------------------------+--------------+---------+------+------+-----------------------------+ 
1 row in set (0.00 sec) 
 
mysql> select state_id, orderdata_id from order_delivery where 
    -> orderdata_id in 
    -> (3193340,3193343,3193346,3193349,3193352,3193355) 
    -> and is_deleted=0 order by xtime desc; 
Empty set (0.04 sec) 
 
4.0.16 
======= 
mysql> EXPLAIN select state_id, orderdata_id from order_delivery where 
    -> orderdata_id in 
    -> (3193340,3193343,3193346,3193349,3193352,3193355) 
    -> and is_deleted=0 order by xtime desc; 
+----------------+------+-------------------------+------------+---------+-------+--------+-----------------------------+ 
| table          | type | possible_keys           | key        | key_len | ref   
| rows   | Extra                       | 
+----------------+------+-------------------------+------------+---------+-------+--------+-----------------------------+ 
| order_delivery | ref  | orderdata_id,is_deleted | is_deleted |       4 | 
const | 474894 | Using where; Using filesort | 
+----------------+------+-------------------------+------------+---------+-------+--------+-----------------------------+ 
1 row in set (0.01 sec) 
 
mysql> select state_id, orderdata_id from order_delivery where 
    -> orderdata_id in 
    -> (3193340,3193343,3193346,3193349,3193352,3193355) 
    -> and is_deleted=0 order by xtime desc; 
Empty set (47.40 sec) 

Suggested fix:
USE INDEX(orderdata_id) 
 
select state_id, orderdata_id from order_delivery use index(orderdata_id) 
where orderdata_id in 
(3193340,3193343,3193346,3193349,3193352,3193355) 
and is_deleted=0 order by xtime desc; 
 
4.0.16 
======= 
mysql> EXPLAIN select state_id, orderdata_id from order_delivery use 
index(orderdata_id) 
    -> where orderdata_id in 
    -> (3193340,3193343,3193346,3193349,3193352,3193355) 
    -> and is_deleted=0 order by xtime desc; 
+----------------+-------+---------------+--------------+---------+------+------+-----------------------------+ 
| table          | type  | possible_keys | key          | key_len | ref  | 
rows | Extra                       | 
+----------------+-------+---------------+--------------+---------+------+------+-----------------------------+ 
| order_delivery | range | orderdata_id  | orderdata_id |       4 | NULL |    
5 | Using where; Using filesort | 
+----------------+-------+---------------+--------------+---------+------+------+-----------------------------+ 
1 row in set (0.04 sec) 
 
mysql> select state_id, orderdata_id from order_delivery use 
index(orderdata_id) 
    -> where orderdata_id in 
    -> (3193340,3193343,3193346,3193349,3193352,3193355) 
    -> and is_deleted=0 order by xtime desc; 
Empty set (0.03 sec)
[31 Oct 2003 5:34] Sergey Kostyliov
Sad, but the size of gziped sql dump is about 8Mb so it can not be attached 
here due to 200Kb limit. 
But you can download it via http: 
http://sysadminday.org.ru/order_delivery.sql.gz
[31 Oct 2003 6:04] Sergey Kostyliov
I had also seen huge perfomance degradation in some a bit more complicated 
queries with MyISAM tables (without force/use hints) in 4.0.16, which I have 
never seen in 4.0.14. So I don't think it's related to InnoDB.
[31 Oct 2003 9:59] Peter Zaitsev
Thank your for Excellent bug report. I was able to repeat it both with MyISAM and Innodb tables. It is quite unclear to me what is happening as the estimated amount of rows for "range" is very small. 

mysql> explain select state_id, orderdata_id from order_delivery where orderdata_id in (3193340,3193343,3193346,3193349,3193352,3193355) and is_deleted=0 order by xtime desc;
+----------------+------+-------------------------+------------+---------+-------+--------+-----------------------------+
| table          | type | possible_keys           | key        | key_len | ref   | rows   | Extra                       |
+----------------+------+-------------------------+------------+---------+-------+--------+-----------------------------+
| order_delivery | ref  | orderdata_id,is_deleted | is_deleted |       4 | const | 469254 | Using where; Using filesort |
+----------------+------+-------------------------+------------+---------+-------+--------+-----------------------------+
1 row in set (0.17 sec)

mysql> explain select count(*) from order_delivery where is_deleted=0;
+----------------+------+---------------+------------+---------+-------+--------+--------------------------+
| table          | type | possible_keys | key        | key_len | ref   | rows   | Extra                    |
+----------------+------+---------------+------------+---------+-------+--------+--------------------------+
| order_delivery | ref  | is_deleted    | is_deleted |       4 | const | 469254 | Using where; Using index |
+----------------+------+---------------+------------+---------+-------+--------+--------------------------+
1 row in set (0.01 sec)

mysql> explain select count(*) from order_delivery where orderdata_id in (3193340,3193343,3193346,3193349,3193352,3193355);
+----------------+-------+---------------+--------------+---------+------+------+--------------------------+
| table          | type  | possible_keys | key          | key_len | ref  | rows | Extra                    |
+----------------+-------+---------------+--------------+---------+------+------+--------------------------+
| order_delivery | range | orderdata_id  | orderdata_id |       4 | NULL |    6 | Using where; Using index |
+----------------+-------+---------------+--------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

There is more bugs one can expose with the same table:

alter table order_delivery add key (is_deleted,orderdata_id);

mysql> explain select state_id, orderdata_id from order_delivery where orderdata_id in (3193340,3193343,3193346,3193349,3193352,3193355) and is_deleted=0 order by xtime desc;
+----------------+------+--------------------------------------+--------------+---------+-------+--------+-----------------------------+
| table          | type | possible_keys                        | key          | key_len | ref   | rows   | Extra                       |
+----------------+------+--------------------------------------+--------------+---------+-------+--------+-----------------------------+
| order_delivery | ref  | orderdata_id,is_deleted,is_deleted_2 | is_deleted_2 |       4 | const | 498206 | Using where; Using filesort |
+----------------+------+--------------------------------------+--------------+---------+-------+--------+-----------------------------+
1 row in set (0.03 sec)

As you see optimizer selects the proper 2 column key in this case but it does not uses the second keypart for some reason. 

You also mentioned other queries you have, which have been slowed down since MySQL 4.0.x -  Please submit them the same (or similar) way as well. We're quite eager to fix these regression issues.

If you do not want to have your data be available to the public you can upload it to ftp://support.mysql.com/pub/mysql/secret where only we will be able to access it.
[31 Oct 2003 10:17] Peter Zaitsev
I tried to work out smaller example and here it goes:

CREATE TABLE range (
  a int(11) default NULL,
  b int(11) default NULL,
  KEY a (a),
  KEY b (b)
) TYPE=MyISAM;

--
-- Dumping data for table 'range'
--

INSERT INTO range VALUES (1,1),(2,1),(3,1),(4,1),(5,1),(6,1),(7,1),(8,1),(9,1),(10,2),(10,2),(13,2),(14,2),(15,2),(16,2),(17,3),(17,3),(16,3),(17,3),(19,3),(20,3),(21,4),(22,5),(23,5),(24,5),(25,5),(26,5),(30,5),(31,5),(32,5),(33,5),(33,5),(33,5),(33,5),(33,5),(34,5),(35,5);

mysql> explain select * from range where a in(1,2) and b=5;
+-------+------+---------------+------+---------+-------+------+-------------+
| table | type | possible_keys | key  | key_len | ref   | rows | Extra       |
+-------+------+---------------+------+---------+-------+------+-------------+
| range | ref  | a,b           | b    |       5 | const |   15 | Using where |
+-------+------+---------------+------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select * from range where a in(1,2) ;
+-------+-------+---------------+------+---------+------+------+-------------+
| table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
+-------+-------+---------------+------+---------+------+------+-------------+
| range | range | a             | a    |       5 | NULL |    2 | Using where |
+-------+-------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

Combined key is also not used in this case:

mysql> alter table range add key(a,b);
Query OK, 37 rows affected (0.01 sec)
Records: 37  Duplicates: 0  Warnings: 0

mysql> explain select * from range where a in(1,2) and b=5;
+-------+------+---------------+------+---------+-------+------+-------------+
| table | type | possible_keys | key  | key_len | ref   | rows | Extra       |
+-------+------+---------------+------+---------+-------+------+-------------+
| range | ref  | a,b,a_2       | b    |       5 | const |   15 | Using where |
+-------+------+---------------+------+---------+-------+------+-------------+
1 row in set (0.00 sec)
[31 Oct 2003 15:55] Konstantin Osipov
bug is caused by changes in FULL SCAN costs, made in 4.0.16
The bug has nothing to do with IN clause.
[3 Nov 2003 13:51] Konstantin Osipov
Fixed in 4.0.17, bk commit - 4.0 tree (konstantin:1.1611)
Please verify whether other, more complicated queries you mentioned 
in your report also run now as fast as in 4.0.14.
Thank you for your feedback!
[4 Nov 2003 2:33] Sergey Kostyliov
I am sorry for the delay. Unfortunately I've lost almost all 
queries I mentioned, due to logs rotation :(. The two I am 
able to remember is fixed by: "bk commit - 4.0 tree (konstantin:1.1611)". 
I've just installed 4.0.16 + your patch and it seems all problems 
with slow queries has gone. I will take a harder look in the next 
couple of days.  
Thanks to both of you!