| Bug #25926 | dependent suqbuery takes too long | ||
|---|---|---|---|
| Submitted: | 29 Jan 2007 19:34 | Modified: | 23 Jan 2008 18:19 |
| Reporter: | Carl Longnecker | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
| Version: | 5.0.18, 5.0.24, 5.0.27, 5.0.37 | OS: | Windows (windows xp) |
| Assigned to: | Sergey Petrunya | CPU Architecture: | Any |
| Tags: | subquery benchmark | ||
[30 Jan 2007 11:09]
Valeriy Kravchuk
Thank you for a problem report. Please, try to repeat with a newew version, 5.0.27. In case of the same problem, please, upload that your people.zip test case to this bug report.
[30 Jan 2007 21:21]
John Pilotta
This bug is present in the 5.0.27 version. I have used similar subqueries and where it is obvious that the subquery should be run only once (and not per row) the optimizer seems to choose a plan that runs the subquery multiple times resulting in a huge performance penalty. Enclosing the subquery in another trivial SELECT col FROM ( <subquery> ) as described in the bug fixes the problem.
[30 Jan 2007 21:48]
Carl Longnecker
data for test case
Attachment: people.zip (application/x-zip-compressed, text), 51.90 KiB.
[30 Jan 2007 21:50]
Carl Longnecker
i have confirmed that my test case exhibits the same behavior in 5.0.27
[12 Mar 2007 10:01]
Valeriy Kravchuk
Verified just as described with 5.0.37 also. It would be nice for optimizer to rewrite query or just understand that subquery is not dependent at all. I hope it will be fixed in 5.x.y, where x>0. Until that there is a workaround this your report demonstrates.
[16 Mar 2007 5:09]
Stephen Gornick
I just filed Bug #27199, which may be related. http://bugs.mysql.com/bug.php?id=27199
[16 Mar 2007 5:37]
Igor Babaev
The problem will be fixed in 5.2 (when WL #2980 is done).
[23 Jan 2008 1:03]
Sergey Petrunya
This subquery is covered by new subquery materialization optimization (WL#1110) in 6.0. EXPLAIN in 6.0.5: EXPLAIN SELECT Name FROM People WHERE Phone IN (SELECT Phone FROM People GROUP BY Phone HAVING COUNT(*) > 1); +----+-------------+--------+-------+---------------+-------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+-------+---------------+-------+---------+------+------+-------------+ | 1 | PRIMARY | People | ALL | NULL | NULL | NULL | NULL | 2485 | Using where | | 2 | SUBQUERY | People | index | NULL | phone | 35 | NULL | 2485 | Using index | +----+-------------+--------+-------+---------------+-------+---------+------+------+-------------+ Query execution times: MySQL 6.0.3: 0.03 sec MySQL 6.0.3, with subquery materialization disabled: 28.53 sec. It's 950 times faster now :-) Setting bug status to closed. Carl, thanks for taking time to file. Feel free to re-open the bug if you think resolution is not complete.
[16 Nov 2012 19:22]
Clint Priest
I know this has been fixed for some time for the 6.0 release, is there any way this can be backported to 5.5 and 5.6?

Description: two queries that produce the same result (find people with the same phone number), but the first one takes way too long. SELECT Name FROM People WHERE Phone IN (SELECT Phone FROM People GROUP BY Phone HAVING COUNT(*) > 1) SELECT Name FROM People WHERE Phone IN (SELECT Phone FROM (SELECT Phone FROM People GROUP BY Phone HAVING COUNT(*) > 1) InnerTable) and the explains: +----+--------------------+--------+-------+---------------+-------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+--------+-------+---------------+-------+---------+------+------+-------------+ | 1 | PRIMARY | People | ALL | NULL | NULL | NULL | NULL | 2485 | Using where | | 2 | DEPENDENT SUBQUERY | People | index | NULL | phone | 35 | NULL | 2485 | Using index | +----+--------------------+--------+-------+---------------+-------+---------+------+------+-------------+ +----+--------------------+------------+-------+---------------+-------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+------------+-------+---------------+-------+---------+------+------+-------------+ | 1 | PRIMARY | People | ALL | NULL | NULL | NULL | NULL | 2485 | Using where | | 2 | DEPENDENT SUBQUERY | <derived3> | ALL | NULL | NULL | NULL | NULL | 203 | Using where | | 3 | DERIVED | People | index | NULL | phone | 35 | NULL | 2485 | Using index | +----+--------------------+------------+-------+---------------+-------+---------+------+------+-------------+ based on the numbers, it appears How to repeat: see post #9 here (http://www.sitepoint.com/forums/showthread.php?t=454706) for the sample data i used. note that i hashed the phone number and changed the name to obscure my data, but ti doesn't affect the result. Suggested fix: based on the timing of the query, it looks like in the first version, the subquery is being repeated for each row in the main part of the query. it only needs to be executed once to generate a derived query.