Bug #44962 DEPENDENT SUBQUERY not using a range index, resulting in slow performance
Submitted: 19 May 2009 19:34 Modified: 22 Oct 2012 8:10
Reporter: Bassam Tabbara Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.1.34, 5.1.36-bzr, 5.1.40-bzr OS:Linux
Assigned to: Jørgen Løland CPU Architecture:Any
Tags: subquery

[19 May 2009 19:34] Bassam Tabbara
Description:
A SELECT statement that has a nested select with a correlated range filter seems to not be doing a range search on the index, resulting in very slow performance.

How to repeat:
Import the attached dump file.

Run the following query:

SELECT SQL_NO_CACHE
  c.id,
  (SELECT VirtualNodeId FROM VirtualNode WHERE VirtualNodeId >= c.id LIMIT 1)
FROM
  C c;

The query takes 2+ secs on my machine. 

Observations:
* EXPLAIN EXTENDED suggest that the primary index on VirtualNode is being used, however it does not look like it its doing a range search
* Replacing the "VirtualNodeId >= c.id" with "VirtualNode >= 0x00" dramatically improves the performance. EXPLAIN extended looks like its now using a "range".

Suggested fix:
Enable support for range searches on DEPENDENT subqueries.
[19 May 2009 19:39] Bassam Tabbara
Dump uploaded to ftp://ftp.mysql.com/pub/mysql/upload/bug-data-44962.out
[15 Jun 2009 11:44] Valeriy Kravchuk
I can confirm your findings with recent 5.1.36 from bzr. But can you, please, try the following modified query (that gives the same plan, but different rows estimation):

SELECT SQL_NO_CACHE   c.id,   (SELECT VirtualNodeId FROM VirtualNode WHERE VirtualNodeId >= c.id order by VirtualNodeId desc LIMIT 1) FROM   C c;

For me it runs instantly (original one requires 8_ seconds on my laptop). Please, check if it returns the same result.
[15 Jun 2009 18:18] Bassam Tabbara
you are correct that using the ORDER BY VirtualNodeId DESC helps the query. This is because the rows in table C start with 0xFF, and hence there is less scanning when VirtualNodeId are sorted in descending order.

Unfortunately this changes the query we would like to run, and does not resolve the root cause -- mysql does not effectively use the index on virtual node.
[15 Sep 2009 18:45] Valeriy Kravchuk
So, workaround is not acceptable in general. We have the following plans:

mysql> explain SELECT SQL_NO_CACHE   c.id,   (SELECT VirtualNodeId FROM VirtualNode WHERE VirtualNodeId >= c.id LIMIT 1) FROM   C c;
+----+--------------------+-------------+-------+---------------+---------+---------+------+-------+--------------------------+
| id | select_type        | table       | type  | possible_keys | key     | key_len | ref  | rows  | Extra                    |
+----+--------------------+-------------+-------+---------------+---------+---------+------+-------+--------------------------+
|  1 | PRIMARY            | c           | index | NULL          | PRIMARY | 20      | NULL |   100 | Using index              |
|  2 | DEPENDENT SUBQUERY | VirtualNode | index | PRIMARY       | PRIMARY | 20      | NULL | 51513 | Using where; Using index |
+----+--------------------+-------------+-------+---------------+---------+---------+------+-------+--------------------------+
2 rows in set (0.00 sec)

This query runs slow. Handler_read_next is incremented a lot. Indeed, one would expect range scan for a dependent subquery, for a range starting with c.id from outer query and to maximum value. LIMIT 1 should then stop this scan at the very first index with VirtiualNodeId >= c.id.

mysql> explain SELECT SQL_NO_CACHE   c.id,   (SELECT VirtualNodeId FROM VirtualNode WHERE VirtualNodeId
    -> >= c.id order by VirtualNodeId desc LIMIT 1) FROM   C c;
+----+--------------------+-------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type        | table       | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+--------------------+-------------+-------+---------------+---------+---------+------+------+--------------------------+
|  1 | PRIMARY            | c           | index | NULL          | PRIMARY | 20      | NULL |  100 | Using index              |
|  2 | DEPENDENT SUBQUERY | VirtualNode | index | PRIMARY       | PRIMARY | 20      | NULL |    1 | Using where; Using index |
+----+--------------------+-------------+-------+---------------+---------+---------+------+------+--------------------------+
2 rows in set (0.00 sec)

This one runs fast. Handler_read_next is NOT incremented.
[22 Oct 2012 8:10] Jørgen Løland
The dump required to reproduce the issue has unfortunately been lost, but this problem is most likely a duplicate of BUG#41659 which has been fixed in 5.6.8.

If 5.6.8 does not fix the issue, original poster may reopen the bug and supply a new repro.