Bug #28871 | restriction using where in (...) is faster with big in set | ||
---|---|---|---|
Submitted: | 4 Jun 2007 11:30 | Modified: | 29 Jun 2007 20:52 |
Reporter: | Oli Sennhauser | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.0.42, 5.1.19 | OS: | Any |
Assigned to: | Igor Babaev | CPU Architecture: | Any |
Tags: | bfsm_2007_06_21, bfsm_2007_06_28 |
[4 Jun 2007 11:30]
Oli Sennhauser
[4 Jun 2007 13:48]
Mark Leith
Verified as described: mysql> SELECT SQL_NO_CACHE items.* FROM items USE INDEX (pubdate) WHERE items.feed_id IN ('5403','5267','56','5216','8146','6958','9424','6766','8836','6984','10951',' 7597','8199','10550','8716','8008','8143','6577','8016','5338','6369','6953','5351','5454','8000','5334','7579','7727','449','5367','10737','7728','8009','10895', '92','6792','181','9904','7942','7931','5373','1844','7947','225','9903','5129', '8019','8020','7936','7938','10916','10847','5116','5117','8204','5397','5104','7999','156','8911' ,'5365','7671','5470','7672','30','39','8152','8150','1458','3504','5250','6773','2813','2815','5252','8686','8110','6661','7602','10749','5253','5254','5255','26','9803', '6575','5340','6790','7420') ORDER BY items.published LIMIT 0, 20; ... 20 rows in set (0.09 sec) ... 20 rows in set (0.09 sec) ... 20 rows in set (0.10 sec) ... 20 rows in set (0.09 sec) ... 20 rows in set (0.09 sec) mysql> SELECT SQL_NO_CACHE items.* FROM items USE INDEX (pubdate) WHERE items.feed_id IN ('5403','5267','56','5216','8146','6958','9424') ORDER BY items.published LIMI T 0, 20; ... 20 rows in set (0.12 sec) ... 20 rows in set (0.11 sec) ... 20 rows in set (0.12 sec) ... 20 rows in set (0.12 sec) ... 20 rows in set (0.12 sec) Checking 5.0 next
[4 Jun 2007 13:59]
Mark Leith
5.0 results: mysql> SELECT SQL_NO_CACHE items.* FROM items USE INDEX (pubdate) WHERE items.feed_id IN ('5403','5267','56','5216','8146','6958','9424','6766','8836','6984','10951',' 7597','8199','10550','8716','8008','8143','6577','8016','5338','6369','6953','5351','5454','8000','5334','7579','7727','449','5367','10737','7728','8009','10895', '92','67 92','181','9904','7942','7931','5373','1844','7947','225','9903','5129', '8019','8020','7936','7938','10916','10847','5116','5117','8204','5397','5104','7999','156','8911' ,'5365','7671','5470','7672','30','39','8152','8150','1458','3504','5250','6773','2813','2815','5252','8686','8110','6661','7602','10749','5253','5254','5255','26','9803', '6575','5340','6790','7420') ORDER BY items.published LIMIT 0, 20; ... 20 rows in set (0.09 sec) ... 20 rows in set (0.09 sec) ... 20 rows in set (0.16 sec) ... 20 rows in set (0.10 sec) ... 20 rows in set (0.09 sec) ... 20 rows in set (0.09 sec) ... 20 rows in set (0.09 sec) ... 20 rows in set (0.09 sec) ... 20 rows in set (0.09 sec) mysql> SELECT SQL_NO_CACHE items.* FROM items USE INDEX (pubdate) WHERE items.feed_id IN ('5403','5267','56','5216','8146','6958','9424') ORDER BY items.published LIMI T 0, 20; ... 20 rows in set (0.12 sec) ... 20 rows in set (0.12 sec) ... 20 rows in set (0.12 sec) ... 20 rows in set (0.12 sec) ... 20 rows in set (0.12 sec) ... 20 rows in set (0.12 sec) ... 20 rows in set (0.13 sec) ... 20 rows in set (0.12 sec) So this affects 5.0 as well.
[4 Jun 2007 16:44]
Oli Sennhauser
We found something (thanks to HarrisonF!) Issuing the fast query (the one with many values in the IN (...)) produces around: Handler_read_prev = 60 Issuing the slow query (the one with little (5) values in the IN (...)) produces around: Handler_read_prev = 86000; For the same query but different amount of rows in the IN (...)!
[29 Jun 2007 20:52]
Timothy Smith
Hi. I'm setting this to "Not a bug"; please correct me if you feel my analysis is incorrect. MySQL is using an index scan, because you've forced it to read from the pubdate index. It's scanning the table, looking for the first 20 rows which match the IN condition. Once it finds the 20th matching row, it's done and can quit. For each row (assuming random distribution), it's twice as likely that it will match IN(<list of 100 constants>) than IN(<list of 50 constants>). As you increase the size of the IN list, you decrease the number of rows you have to look at before you get your 20 matches. It's exactly analogous to a query like: select title, pubdate from articles where is_published = 1 order by pubdate desc limit 20; Assume most articles are published, and a few are not. It'll find those 20 articles very quickly. If you change it to is_published = 0, it'll have to examine a lot more rows before it can find 20 unpublished articles. I don't see a bug here. The "using where" in EXPLAIN, along with the "index" join type, indicates that MySQL is doing the logical thing here. Regards, Timothy