Bug #25926 dependent suqbuery takes too long
Submitted: 29 Jan 2007 19:34 Modified: 23 Jan 2008 18:19
Reporter: Carl Longnecker Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.0.18, 5.0.24, 5.0.27, 5.0.37 OS:Windows (windows xp)
Assigned to: Sergey Petrunya CPU Architecture:Any
Tags: subquery benchmark

[29 Jan 2007 19:34] Carl Longnecker
Description:
two queries that produce the same result (find people with the same phone number), but the first one takes way too long.

SELECT  Name
FROM    People
WHERE   Phone IN (SELECT Phone 
                  FROM   People 
                  GROUP BY Phone 
                  HAVING COUNT(*) > 1)

SELECT  Name
FROM    People
WHERE   Phone IN (SELECT Phone FROM (SELECT Phone 
                  FROM   People 
                  GROUP BY Phone 
                  HAVING COUNT(*) > 1) InnerTable)

and the explains:

+----+--------------------+--------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type        | table  | type  | possible_keys | key   | key_len | ref  | rows | Extra       |
+----+--------------------+--------+-------+---------------+-------+---------+------+------+-------------+
|  1 | PRIMARY            | People | ALL   | NULL          | NULL  | NULL    | NULL | 2485 | Using where |
|  2 | DEPENDENT SUBQUERY | People | index | NULL          | phone | 35      | NULL | 2485 | Using index |
+----+--------------------+--------+-------+---------------+-------+---------+------+------+-------------+
+----+--------------------+------------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type        | table      | type  | possible_keys | key   | key_len | ref  | rows | Extra       |
+----+--------------------+------------+-------+---------------+-------+---------+------+------+-------------+
|  1 | PRIMARY            | People     | ALL   | NULL          | NULL  | NULL    | NULL | 2485 | Using where |
|  2 | DEPENDENT SUBQUERY | <derived3> | ALL   | NULL          | NULL  | NULL    | NULL |  203 | Using where |
|  3 | DERIVED            | People     | index | NULL          | phone | 35      | NULL | 2485 | Using index |
+----+--------------------+------------+-------+---------------+-------+---------+------+------+-------------+

based on the numbers, it appears 

How to repeat:
see post #9 here (http://www.sitepoint.com/forums/showthread.php?t=454706) for the sample data i used. note that i hashed the phone number and changed the name to obscure my data, but ti doesn't affect the result.

Suggested fix:
based on the timing of the query, it looks like in the first version, the subquery is being repeated for each row in the main part of the query. it only needs to be executed once to generate a derived query.
[30 Jan 2007 11:09] Valeriy Kravchuk
Thank you for a problem report. Please, try to repeat with a newew version, 5.0.27. In case of the same problem, please, upload that your people.zip test case to this bug report.
[30 Jan 2007 21:21] John Pilotta
This bug is present in the 5.0.27 version. I have used similar subqueries and where it is obvious that the subquery should be run only once (and not per row) the optimizer seems to choose a plan that runs the subquery multiple times resulting in a huge performance penalty.

Enclosing the subquery in another trivial SELECT col FROM ( <subquery> ) as described in the bug fixes the problem.
[30 Jan 2007 21:48] Carl Longnecker
data for test case

Attachment: people.zip (application/x-zip-compressed, text), 51.90 KiB.

[30 Jan 2007 21:50] Carl Longnecker
i have confirmed that my test case exhibits the same behavior in 5.0.27
[12 Mar 2007 10:01] Valeriy Kravchuk
Verified just as described with 5.0.37 also. It would be nice for optimizer to rewrite query or just understand that subquery is not dependent at all. I hope it will be fixed in 5.x.y, where x>0. Until that there is a workaround this your report demonstrates.
[16 Mar 2007 5:09] Stephen Gornick
I just filed Bug #27199, which may be related.
http://bugs.mysql.com/bug.php?id=27199
[16 Mar 2007 5:37] Igor Babaev
The problem will be fixed in 5.2 (when WL #2980 is done).
[23 Jan 2008 1:03] Sergey Petrunya
This subquery is covered by new subquery materialization optimization (WL#1110) in 6.0.

EXPLAIN in 6.0.5: 

EXPLAIN SELECT Name FROM People WHERE Phone IN (SELECT Phone FROM People GROUP BY Phone HAVING COUNT(*) > 1);
+----+-------------+--------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys | key   | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+---------------+-------+---------+------+------+-------------+
|  1 | PRIMARY     | People | ALL   | NULL          | NULL  | NULL    | NULL | 2485 | Using where | 
|  2 | SUBQUERY    | People | index | NULL          | phone | 35      | NULL | 2485 | Using index | 
+----+-------------+--------+-------+---------------+-------+---------+------+------+-------------+

Query execution times:
MySQL 6.0.3:  0.03 sec
MySQL 6.0.3, with subquery materialization disabled: 28.53 sec. 
It's 950 times faster now :-)

Setting bug status to closed. Carl, thanks for taking time to file. Feel free to re-open the bug if you think resolution is not complete.
[16 Nov 2012 19:22] Clint Priest
I know this has been fixed for some time for the 6.0 release, is there any way this can be backported to 5.5 and 5.6?