Bug #54599 | bug in query planner on queries with "ORDER BY" and "LIMIT BY" clause | ||
---|---|---|---|
Submitted: | 17 Jun 2010 19:42 | Modified: | 24 Jul 2012 14:54 |
Reporter: | Zardosht Kasheff (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 5.1.46, 5.1, 5.6.99 bzr | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | limit by, order by |
[17 Jun 2010 19:42]
Zardosht Kasheff
[17 Jun 2010 20:52]
Sveta Smirnova
Thank you for the report. Please provide real queries and create table info: with test data in my environment key(a,b,c) is always used.
[18 Jun 2010 5:47]
Sveta Smirnova
Thank you for the feedback. I understand optimizer does optimization wrongly for customer data. But we can not verify and fix the bug until we are able to repeat the problem. But is not repeatable if I use following code: create table t1 (a int, b int, c int, d int, key(a,b,c), key(c,b,d,a)); --disable_query_log let $i = 1000; while ($i) { --eval insert into t1 values($i,$i,$i,$i) dec $i; } --enable_query_log --vertical_results explain select c, count(*) num_cnt from t1 where a > 1000 and b is not NULL group by c order by num_cnt; explain select c, count(*) num_cnt from t1 where a > 1000 and b is not NULL group by c order by num_cnt limit 10; Please change real column names to a,b,c, etc., table name to t1 and send us table structure and real query with such replacement which hides private data, but does not hide structure.
[18 Jun 2010 14:43]
Zardosht Kasheff
How to reproduce: CREATE TABLE `t` ( `a` int(11) DEFAULT NULL, `b` int(11) DEFAULT NULL, `c` int(11) DEFAULT NULL, KEY `a` (`a`,`b`,`c`), KEY `c` (`c`,`b`,`a`) ) insert elements (1,1,1),(1,1,2),(1,1,3)...(1,1,300),(1,2,1),(1,2,2)...(300,300,300) So there should be 300*300*300 elements. Then do the following queries: mysql> explain select c,count(*) numimp from t where a > 250 group by c order by numimp limit 10; +----+-------------+-------+-------+---------------+------+---------+------+---- --+-----------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | row s | Extra | +----+-------------+-------+-------+---------------+------+---------+------+---- --+-----------------------------------------------------------+ | 1 | SIMPLE | t | index | a | c | 15 | NULL | 7 2 | Using where; Using index; Using temporary; Using filesort | +----+-------------+-------+-------+---------------+------+---------+------+---- --+-----------------------------------------------------------+ 1 row in set (0.00 sec) mysql> explain select c,count(*) numimp from t where a > 250 group by c order by numimp; +----+-------------+-------+-------+---------------+------+---------+------+---- -----+-----------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | row s | Extra | +----+-------------+-------+-------+---------------+------+---------+------+---- -----+-----------------------------------------------------------+ | 1 | SIMPLE | t | range | a | a | 5 | NULL | 336 4000 | Using where; Using index; Using temporary; Using filesort | +----+-------------+-------+-------+---------------+------+---------+------+---- -----+-----------------------------------------------------------+ 1 row in set (0.00 sec) Both queries should use 'a'. Only the one without the "LIMIT BY" does.
[18 Jun 2010 19:17]
Sveta Smirnova
Thank you for the feedback. In the initial description you wrote: "When I run the query, MySQL properly selects the key (a,b,c). However, when I add a "limit 10" to the end, the key (c,b,d,a) is used. Using (c,b,d,a) is not good because the entire index will need to be read to find the 10 values of 'c' that show up the most, whereas if (a,b,c) is used, only a range is done (where a > 1000)." While output from last comment shows only 72 rows examined if limit used and 3364000 if no limit used. Additionally I see no performance regression with test data again: select now(); now() 2010-06-18 22:10:28 select c, count(*) num_cnt from t1 where a > 250 group by c order by num_cnt; <results> select now(); now() 2010-06-18 22:10:40 select c, count(*) num_cnt from t1 where a > 250 group by c order by num_cnt limit 10; <results> select now(); now() 2010-06-18 22:10:46 Why do you consider this behavior buggy?
[18 Jun 2010 19:41]
Zardosht Kasheff
Don't trust the rows value in the explain statement. It is wrong. If it really did only examine 72 rows in the "limit by" case, it would not have finished in 6 seconds. It is doing a full index scan. The problem with the query plan is that using the key 'c' requires a full index scan, whereas using 'a' requires a partial index scan. Using 'c' requires a full index scan because of the "order by" clause. If the "order by" clause did not exist, then the query plan would be fine. Instead of having "where a > 250", try "where a > 285". The problem still exists.
[18 Jun 2010 22:56]
Sveta Smirnova
Thank you for the feedback. Verified as described: after limiting area for search query with LIMIT works slower than query without LIMIT. Test case for MTR: create table t1 (a int, b int, c int, key(a,b,c), key(c,b,a)); --disable_query_log let $i = 1000; while ($i) { let $j = 1000; while ($j) { --eval insert into t1 values($i,$j,$i) dec $j; } dec $i; } --enable_query_log --vertical_results explain select c, count(*) num_cnt from t1 where a > 885 group by c order by num_cnt; explain select c, count(*) num_cnt from t1 where a > 885 group by c order by num_cnt limit 10; select now(); --disable_result_log select c, count(*) num_cnt from t1 where a > 885 group by c order by num_cnt; --enable_result_log select now(); --disable_result_log select c, count(*) num_cnt from t1 where a > 885 group by c order by num_cnt limit 10; --enable_result_log select now();
[22 Jun 2010 17:48]
Omer Barnir
triage: setting to CHECKED (w3 use ignore index)
[24 Jul 2012 14:54]
Paul DuBois
Noted in 5.1.66, 5.5.28, 5.6.7, 5.7.0 changelogs. Adding a LIMIT clause to a query containing GROUP BY and ORDER BY could cause the optimizer to choose an incorrect index for processing the query, and return more rows than required.