Bug #65957 Poor correlated subquery performance with order by and limit
Submitted: 20 Jul 2012 7:00 Modified: 21 Jun 2013 19:41
Reporter: Honza Horak (OCA) Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.1.62, 5.5.27, 5.6.6 OS:Linux
Assigned to: CPU Architecture:Any
Tags: regression

[20 Jul 2012 7:00] Honza Horak
Description:
This issue was initially reported at https://bugzilla.redhat.com/show_bug.cgi?id=830326. Reporter originally believed he was experiencing the issue as described in bug #64557 (http://bugs.mysql.com/bug.php?id=64557). That bug states to be fixed in 5.6.5.

The testing data dump is available at Red Hat Bugzilla bug report mentioned above and a query against the dump that demonstrates the problem is the following:

SELECT foo.foo_id, foo.data, bar.bar_id, bar.timestamp FROM foo JOIN bar on foo.foo_id = bar.foo_id WHERE ( bar.foo_id = ( SELECT bar.foo_id FROM bar WHERE bar.foo_id = foo.foo_id ORDER BY bar.timestamp DESC LIMIT 1 ) ) LIMIT 1000;

We tested it in 5.0.95, 5.5.24 and 5.6.5 and got the following query execution times:

----------------------------
| mysql-5.0.95   | ~0.02 s |
| mysql-5.5.24   | ~4.00 s |
| mysql-5.6.6-m9 | ~2.00 s |
----------------------------

So the performance issue doesn't seem to be actually fixed even in mysql-5.6.6 source code, which means reporter probably hit another issue than bug #64557.

Going a bit deeper, EXPLAIN shows the same result in mysql-5.5.24 and in mysql-5.6.6:

+----+--------------------+-------+-------+---------------+
| id | select_type        | table | type  | possible_keys |
+----+--------------------+-------+-------+---------------+
|  1 | PRIMARY            | foo   | ALL   | PRIMARY       |
|  1 | PRIMARY            | bar   | ref   | foo_id        |
|  2 | DEPENDENT SUBQUERY | bar   | index | foo_id        |
+----+--------------------+-------+-------+---------------+
-----------+---------+------+------+-------------+
 key       | key_len | ref  | rows | Extra       |
-----------+---------+------+------+-------------+
 NULL      | NULL    | NULL |  885 | Using where |
 foo_id    | 4       | func |   49 | Using where |
 timestamp | 4       | NULL |    1 | Using where |
-----------+---------+------+------+-------------+

...meanwhile mysql-5.0.95 shows a bit different EXPLAIN:

+----+--------------------+-------+------+---------------+--------+
| id | select_type        | table | type | possible_keys | key    |
+----+--------------------+-------+------+---------------+--------+
|  1 | PRIMARY            | foo   | ALL  | PRIMARY       | NULL   |
|  1 | PRIMARY            | bar   | ref  | foo_id        | foo_id |
|  2 | DEPENDENT SUBQUERY | bar   | ref  | foo_id        | foo_id |
+----+--------------------+-------+------+---------------+--------+
---------+----------------------+------+-----------------------------+
 key_len | ref                  | rows | Extra                       |
---------+----------------------+------+-----------------------------+
 NULL    | NULL                 | 1099 | Using where                 | 
 4       | func                 |   50 | Using where                 | 
 4       | bug830326.foo.foo_id |   50 | Using where; Using filesort | 
---------+----------------------+------+-----------------------------+

That means that the use of timestamp index significantly slows down the query, which is what one wouldn't expect.

Using "IGNORE INDEX (timestamp)" for bar table in the subquery we got the same fast results as in mysql-5.0.95, so it could be a workaround for this issue.

How to repeat:
Use the testing data dump available at https://bugzilla.redhat.com/show_bug.cgi?id=830326 and execute query:

SELECT foo.foo_id, foo.data, bar.bar_id, bar.timestamp FROM foo JOIN bar on foo.foo_id = bar.foo_id WHERE ( bar.foo_id = ( SELECT bar.foo_id FROM bar WHERE bar.foo_id = foo.foo_id ORDER BY bar.timestamp DESC LIMIT 1 ) ) LIMIT 1000;

Suggested fix:
Optimizer should probably use an execution plan used in mysql-5.0.95.
[20 Jul 2012 8:54] Valeriy Kravchuk
Test data to load

Attachment: test.sql.gz (application/gzip, text), 200.10 KiB.

[20 Jul 2012 9:00] Valeriy Kravchuk
Related to that other bug or not, this is a real problem. This is what we see for the plan used by default:

mysql> show session status like 'Handler%';
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_commit             | 1       |
| Handler_delete             | 0       |
| Handler_discover           | 0       |
| Handler_prepare            | 0       |
| Handler_read_first         | 1       |
| Handler_read_key           | 31      |
| Handler_read_last          | 20      |
| Handler_read_next          | 999     |
| Handler_read_prev          | 1989000 |
| Handler_read_rnd           | 0       |
| Handler_read_rnd_next      | 10      |
| Handler_rollback           | 0       |
| Handler_savepoint          | 0       |
| Handler_savepoint_rollback | 0       |
| Handler_update             | 0       |
| Handler_write              | 0       |
+----------------------------+---------+
16 rows in set (0.00 sec)

While if we add IGNORE INDEX(timestamp) we see that number of actions is notably smaller:

mysql> show session status like 'Handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 31    |
| Handler_read_last          | 0     |
| Handler_read_next          | 2999  |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 10    |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
16 rows in set (0.00 sec)

Probably optimizer does NOT take DESC into account properly.
[23 Jul 2012 10:37] Valeriy Kravchuk
5.1 is also affected:

C:\Program Files\MySQL\MySQL Server 5.5\bin>mysql -uroot -proot -P3310 test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.1.62-community MySQL Community Server (GPL)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> explain SELECT foo.foo_id, foo.data, bar.bar_id, bar.timestamp FROM foo J
OIN bar on foo.foo_id = bar.foo_id WHERE ( bar.foo_id = ( SELECT bar.foo_id FROM
 bar WHERE bar.foo_id = foo.foo_id ORDER BY bar.timestamp DESC LIMIT 1 ) ) LIMIT
 1000;
+----+--------------------+-------+-------+---------------+-----------+---------
+------+------+-------------+
| id | select_type        | table | type  | possible_keys | key       | key_len
| ref  | rows | Extra       |
+----+--------------------+-------+-------+---------------+-----------+---------
+------+------+-------------+
|  1 | PRIMARY            | foo   | ALL   | PRIMARY       | NULL      | NULL
| NULL | 1068 | Using where |
|  1 | PRIMARY            | bar   | ref   | foo_id        | foo_id    | 4
| func |   49 | Using where |
|  2 | DEPENDENT SUBQUERY | bar   | index | foo_id        | timestamp | 4
| NULL |    1 | Using where |
+----+--------------------+-------+-------+---------------+-----------+---------
+------+------+-------------+
3 rows in set (0.08 sec)
[21 Jun 2013 19:40] Olav Sandstå
This bug seems to be fixed in 5.6.8 by the fix for Bug#45969 / Bug#14338686.

The change log entries for these bugs in the release notes for 5.6.8 says: "For some queries involving ORDER BY, the optimizer chose the wrong
index for accessing the table."

I am closing this bug as a duplicate.