MySQL Bugs: #25926: dependent suqbuery takes too long

Bug #25926	dependent suqbuery takes too long
Submitted:	29 Jan 2007 19:34	Modified:	23 Jan 2008 18:19
Reporter:	Carl Longnecker	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	5.0.18, 5.0.24, 5.0.27, 5.0.37	OS:	Windows (windows xp)
Assigned to:	Sergey Petrunya	CPU Architecture:	Any
Tags:	subquery benchmark

Description:
two queries that produce the same result (find people with the same phone number), but the first one takes way too long.

SELECT  Name
FROM    People
WHERE   Phone IN (SELECT Phone 
                  FROM   People 
                  GROUP BY Phone 
                  HAVING COUNT(*) > 1)

SELECT  Name
FROM    People
WHERE   Phone IN (SELECT Phone FROM (SELECT Phone 
                  FROM   People 
                  GROUP BY Phone 
                  HAVING COUNT(*) > 1) InnerTable)

and the explains:

+----+--------------------+--------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type        | table  | type  | possible_keys | key   | key_len | ref  | rows | Extra       |
+----+--------------------+--------+-------+---------------+-------+---------+------+------+-------------+
|  1 | PRIMARY            | People | ALL   | NULL          | NULL  | NULL    | NULL | 2485 | Using where |
|  2 | DEPENDENT SUBQUERY | People | index | NULL          | phone | 35      | NULL | 2485 | Using index |
+----+--------------------+--------+-------+---------------+-------+---------+------+------+-------------+
+----+--------------------+------------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type        | table      | type  | possible_keys | key   | key_len | ref  | rows | Extra       |
+----+--------------------+------------+-------+---------------+-------+---------+------+------+-------------+
|  1 | PRIMARY            | People     | ALL   | NULL          | NULL  | NULL    | NULL | 2485 | Using where |
|  2 | DEPENDENT SUBQUERY | <derived3> | ALL   | NULL          | NULL  | NULL    | NULL |  203 | Using where |
|  3 | DERIVED            | People     | index | NULL          | phone | 35      | NULL | 2485 | Using index |
+----+--------------------+------------+-------+---------------+-------+---------+------+------+-------------+

based on the numbers, it appears 

How to repeat:
see post #9 here (http://www.sitepoint.com/forums/showthread.php?t=454706) for the sample data i used. note that i hashed the phone number and changed the name to obscure my data, but ti doesn't affect the result.

Suggested fix:
based on the timing of the query, it looks like in the first version, the subquery is being repeated for each row in the main part of the query. it only needs to be executed once to generate a derived query.

Thank you for a problem report. Please, try to repeat with a newew version, 5.0.27. In case of the same problem, please, upload that your people.zip test case to this bug report.

This bug is present in the 5.0.27 version. I have used similar subqueries and where it is obvious that the subquery should be run only once (and not per row) the optimizer seems to choose a plan that runs the subquery multiple times resulting in a huge performance penalty.

Enclosing the subquery in another trivial SELECT col FROM ( <subquery> ) as described in the bug fixes the problem.

data for test case

Attachment: people.zip (application/x-zip-compressed, text), 51.90 KiB.

i have confirmed that my test case exhibits the same behavior in 5.0.27

Verified just as described with 5.0.37 also. It would be nice for optimizer to rewrite query or just understand that subquery is not dependent at all. I hope it will be fixed in 5.x.y, where x>0. Until that there is a workaround this your report demonstrates.

I just filed Bug #27199, which may be related.
http://bugs.mysql.com/bug.php?id=27199

The problem will be fixed in 5.2 (when WL #2980 is done).

This subquery is covered by new subquery materialization optimization (WL#1110) in 6.0.

EXPLAIN in 6.0.5: 

EXPLAIN SELECT Name FROM People WHERE Phone IN (SELECT Phone FROM People GROUP BY Phone HAVING COUNT(*) > 1);
+----+-------------+--------+-------+---------------+-------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys | key   | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+---------------+-------+---------+------+------+-------------+
|  1 | PRIMARY     | People | ALL   | NULL          | NULL  | NULL    | NULL | 2485 | Using where | 
|  2 | SUBQUERY    | People | index | NULL          | phone | 35      | NULL | 2485 | Using index | 
+----+-------------+--------+-------+---------------+-------+---------+------+------+-------------+

Query execution times:
MySQL 6.0.3:  0.03 sec
MySQL 6.0.3, with subquery materialization disabled: 28.53 sec. 
It's 950 times faster now :-)

Setting bug status to closed. Carl, thanks for taking time to file. Feel free to re-open the bug if you think resolution is not complete.

I know this has been fixed for some time for the 6.0 release, is there any way this can be backported to 5.5 and 5.6?