Bug #64557 | Correlated subquery performance is degraded in releases alfter 5.0 | ||
---|---|---|---|
Submitted: | 6 Mar 2012 0:12 | Modified: | 8 Oct 2012 18:33 |
Reporter: | David Berg | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
Version: | 5.1.56, 5.5.20, 5.6.4 | OS: | Windows (XP Pro SP3) |
Assigned to: | CPU Architecture: | Any | |
Tags: | correlated, Optimizer, performance, regression, subquery |
[6 Mar 2012 0:12]
David Berg
[6 Mar 2012 4:17]
Valeriy Kravchuk
I do not see any attachments in this bug report.
[6 Mar 2012 5:27]
David Berg
The attachments were uploaded to ftp://ftp.oracle.com/support/incoming/bug-data-64557.zip immediately after posting the bug report.
[7 Mar 2012 21:18]
Sveta Smirnova
Thank you for the report. This is fixed in version 5.6.5, but I still set it to "Verified", because it is regression and fix was because improvements in Optimizer code. So devs can consider if they want to fix it in 5.5 too. Results in my environment: 5.0 - 24.38 sec 5.1 - 56 min 28.07 sec 5.5 - 37 min 1.14 sec 5.6 - 14.16sec
[8 Oct 2012 12:40]
Roy Lyseng
In 5.0.97, this query uses a ref access on the IND_MEAS_IDX1 index. In 5.1.67 and 5.5.29, the ref access is replaced with an index access, which causes an excessively high evaluation cost. In 5.6.8, the same ref access is used as in 5.0. I have not gone into any detail about what causes these changes, but I consider this bug fixed in 5.6. Notice that use of GROUP BY and HAVING excludes semi-join transformation and use of outer reference in WHERE clause excludes subquery materialization. Workaround without GROUP BY. --------------------------- The query can be rewritten without a GROUP BY clause. It may then look like this: SELECT project_id, project_name, indicator_name, ind_meas_date, ind_meas_measure, ind_meas_comment FROM PROJECTS p, INDICATORS i, INDICATOR_MEASURES m WHERE m.indicator_id = i.indicator_id AND i.internal_project_id = p.internal_project_id AND 1 IN (SELECT 1 FROM INDICATOR_MEASURES AS im WHERE i.indicator_id=indicator_id AND p.internal_project_id=internal_project_id HAVING COUNT(*) > 1 ) ORDER BY project_name, indicator_name, ind_meas_date DESC; This query is faster in all releases (0.29 seconds in 5.0.97 and 0.36 seconds in 5.6.8), and uses ref access on the INDICATOR_ID index. Workaround for materialization. ------------------------------ MySQL 5.6 supports materialization of IN subqueries, which may be taken advantage of by rewriting the query like this: SELECT project_id, project_name, indicator_name, ind_meas_date, ind_meas_measure, ind_meas_comment FROM PROJECTS p, INDICATORS i, INDICATOR_MEASURES m WHERE m.indicator_id = i.indicator_id AND i.internal_project_id = p.internal_project_id AND (i.indicator_id, p.internal_project_id) IN (SELECT indicator_id, internal_project_id FROM INDICATOR_MEASURES AS im GROUP BY indicator_id, internal_project_id HAVING COUNT(*) > 1 ) ORDER BY project_name, indicator_name, ind_meas_date DESC; Notice that there are no outer references in the rewritten subquery. This is a pre-requisite for materialization to be used. In MySQL 5.6, this query uses 0.27 seconds, which is the fastest of all execution times.
[8 Oct 2012 18:33]
David Berg
Roy's suggested workaround without GROUP BY executes in 0.499 seconds on 5.5.24 on my now-Windows 7 platform. Thank you!