Bug #53298 | wrong result with semijoin (no semijoin strategy chosen) | ||
---|---|---|---|
Submitted: | 29 Apr 2010 21:15 | Modified: | 23 Nov 2010 3:28 |
Reporter: | Guilhem Bichot | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 6.0-codebase-bugfixing | OS: | Any |
Assigned to: | Guilhem Bichot | CPU Architecture: | Any |
Tags: | optimizer_switch, semijoin |
[29 Apr 2010 21:15]
Guilhem Bichot
[29 Apr 2010 23:29]
MySQL Verification Team
Thank you for the bug report. Verified as described.
[30 Apr 2010 9:04]
Guilhem Bichot
maybe not related to BUG#31480, at least the patch submitted in that bug report doesn't help here.
[4 May 2010 20:15]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/107392 3863 Guilhem Bichot 2010-05-04 Fix for BUG#53298 "wrong result with semijoin (no semijoin strategy chosen)". Reviewers: please see questions marked with "QQ" @ mysql-test/r/subselect3.result fix causes slight plan change @ mysql-test/r/subselect4.result before the code fix, the final select would return two rows (1 and 1). @ mysql-test/r/subselect_sj2.result Before the code fix, EXPLAIN showed a wrong plan: adding two Australian towns to table t1, each of them having a population greater than 5MB, led the SELECT to return Australia twice (duplicate); after this fix, Australia would be returned once. @ mysql-test/t/subselect4.test test for bug @ sql/sql_select.cc In the bug's scenario, t3 and t4 are inner tables of the semijoin. t4 is pulled out (functionally dependent), but t3 is not. As we thus have one inner table left, we need a semijoin strategy; without semijoin strategy, the semijoin just becomes a plain join, leading to duplicate rows. Why didn't we have a semijoin strategy: - plan search would first choose this order: t4,t3,t1,t2, with duplicate weedout as semijoin strategy. As this handles the semijoin fully, join->cur_dups_producing_tables is 0 (as a result of choosing duplicates weedout in advance_sj_state() for plan's last table t2). - then plan search evaluates plan t4,t3,t2,t1: it first computes its cost, then in avdance_sj_state(), as join->cur_dups_producing_tables is still 0 (the bug), it believes that semijoin inner tables have been handled already, so doesn't feel obliged to choose a semijoin strategy, so this plan has no strategy, and as its cost is lower than that of t4,t3,t1,t2, it is chosen. A plan without a strategy, thus duplicates. The fix: as in advance_sj_state() we modify certain JOIN members, we need to restore them when we back-track to other plans. backout_nj_sj_state() already restored cur_sj_inner_tables, but forgot to restore cur_dups_producing_tables. We here make a class which contains all that backout_nj_sj_state() should restore.
[14 May 2010 11:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/108313 3863 Guilhem Bichot 2010-05-14 Fix for BUG#53298 "wrong result with semijoin (no semijoin strategy chosen)". @ mysql-test/r/subselect3.result Without the fix, we were comparing the cost of first-considered plan "t11,t12,t21,t22" (with materialization scan for (t11,t12)) with the cost of partial plan "t11,t12,t22" (inner joins). Yes, inner joins, because when we considered that partial plan, cur_dups_producing_tables was 0 (had not been reset, which is the bug): this said that no semijoin strategy was needed so inner join was used. "t11,t12,t22" was then pruned by the heuristic of optimizer_prune_level=1, because it had a higher record count. With the fix, when considering "t11,t12,t22" we correctly realize that this needs a semijoin strategy, choose materialization scan, and go down to "t11,t12,t22,t21", which is less costly than "t11,t12,t21,t22" (11.986 vs 12). Hence the plan change. @ mysql-test/r/subselect4.result before the code fix, the final select would return two rows (1 and 1). @ mysql-test/r/subselect_sj2.result Before the code fix, EXPLAIN showed a wrong plan: adding two Australian towns to table t1, each of them having a population greater than 5MB, led the SELECT to return Australia twice (duplicate); after this fix, Australia would be returned once. @ mysql-test/t/subselect4.test test for bug @ sql/sql_select.cc In the bug's scenario, t3 and t4 are inner tables of the semijoin. t4 is pulled out (functionally dependent), but t3 is not. As we thus have one inner table left, we need a semijoin strategy; without semijoin strategy, the semijoin just becomes a plain join, leading to duplicate rows. Why didn't we have a semijoin strategy: - plan search would first choose this order: t4,t3,t1,t2, with duplicate weedout as semijoin strategy. As this handles the semijoin fully, join->cur_dups_producing_tables is 0 (as a result of choosing duplicates weedout in advance_sj_state() for plan's last table t2). - then plan search evaluates plan t4,t3,t2,t1: it first computes its cost, then in avdance_sj_state(), as join->cur_dups_producing_tables is still 0 (the bug), it believes that semijoin inner tables have been handled already, so doesn't feel obliged to choose a semijoin strategy, so this plan has no strategy, and as its cost is lower than that of t4,t3,t1,t2, it is chosen. A plan without a strategy, thus duplicates. The fix: move JOIN::cur_dups_producing_tables to POSITION (which tracks the optimization state at the partial plan's stage); as the function comment of advance_sj_state() says, "no undo is necessary" then. When we are about to look at the first table, we don't have duplicate-producing tables yet (=0); when we go to table N+1, we inherit the same duplicate- producing tables (=pos[-1].etc), and we add more tables if we hit a semijoin nest, and we remove tables when we pick a semijoin strategy. @ sql/sql_select.h move JOIN::cur_dups_producing_tables to POSITION
[20 May 2010 10:04]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100520100225-oe4iuu5kuzsx0knq) (version source revid:alik@sun.com-20100520100057-rmn5y3o3ij726bm7) (merge vers: 6.0.14-alpha) (pib:16)
[21 May 2010 14:18]
Guilhem Bichot
this has reached 6.0-codebase as well as the next-mr-opt-backporting tree.
[21 May 2010 14:20]
Guilhem Bichot
"For some queries having subqueries in the WHERE clause (semijoin), the semijoin was wrongly transformed into an inner join leading to excess rows in the result"
[22 May 2010 14:20]
Paul DuBois
Noted in 6.0.14 changelog.
[16 Aug 2010 6:39]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:14]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 3:28]
Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.