Bug #54437 | Extra rows with LEFT JOIN + semijoin (firstmatch and duplicates weedout) | ||
---|---|---|---|
Submitted: | 11 Jun 2010 16:55 | Modified: | 23 Nov 2010 3:33 |
Reporter: | Guilhem Bichot | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | next-mr-opt-backporting | OS: | Any |
Assigned to: | Guilhem Bichot | CPU Architecture: | Any |
Tags: | optimizer_switch, semijoin |
[11 Jun 2010 16:55]
Guilhem Bichot
[11 Jun 2010 17:38]
MySQL Verification Team
Thank you for the bug report. mysql> select t3.a from t2 left join t3 on t2.a=t3.a; +------+ | a | +------+ | NULL | | NULL | +------+ 2 rows in set (0.00 sec) mysql> select * from t1 where t1.a in (select t2.a from t2 left join t3 on t2.a=t3.a); +------+ | a | +------+ | 1 | | 1 | +------+ 2 rows in set (0.00 sec) mysql> mysql> show variables like "%version%"; +-------------------------+---------------------+ | Variable_name | Value | +-------------------------+---------------------+ | innodb_version | 1.0.6 | | protocol_version | 10 | | slave_type_conversions | | | version | 5.6.99-m4-debug | | version_comment | Source distribution | | version_compile_machine | x86_64 | | version_compile_os | Linux | +-------------------------+---------------------+ 7 rows in set (0.00 sec) mysql>
[28 Jun 2010 9:23]
Guilhem Bichot
evaluate_join_record() has code to handle firstmatch and duplicates weedout, apparently evaluate_null_complemented_join_record() doesn't. One elegant solution is to make the latter, after it has built the NULL-complemented fields, call evaluate_join_record() to validate it instead of directly calling next_select(). This change seems to fix this bug but introduce others. To better understand this, I'm researching how the "first_unmatched", "last_inner", "first_upper" objects work.
[28 Jun 2010 20:44]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/112384 3208 Guilhem Bichot 2010-06-28 Fix for Bug#54437 "Extra rows with LEFT JOIN + semijoin (firstmatch and duplicates weedout)": convergence between evaluate_join_record() and evaluate_null_complemented_record(). @ mysql-test/include/subquery_sj.inc Without the code fix, main.subquery_sj_dupsweed, main.subquery_sj_firstmatch and main.subquery_sj_loosescan would return 4 1's instead of 2. There was no problem at jcl6/7 because the recent fix for BUG 54235 had already aligned JOIN_CACHE::join_null_complements() with JOIN_CACHE::join_matching_records(). @ sql/sql_select.cc During execution, Firstmatch and Duplicates weedout are handled in evaluate_join_record() (function which applies to matching records), not in evaluate_null_complemented_record() (function which applies to non-matching records). Thus, if the semijoin inner expression is a LEFT JOIN yielding NULLs (see subquery_sj.inc: t3 columns are filled with NULLs), evaluate_null_complemented_record() for t3 goes to the final end_send(), without any firstmatch or duplicates weedout treatment, so duplicate rows are returned. The easy and safe fix would be to put in evaluate_null_complemented_record() a copy of the code of evaluate_join_record() which handles firstmatch and duplicates weedout. In an attempt to reduce code size and unify execution flows, the chosen fix is rather to make evaluate_null_complemented_record() call evaluate_join_record(). The first function is responsible for creating NULL values for all inner tables at the right of LEFT JOIN, and for testing pushed down conditions, from then on this record is sent to the second function for more complete evaluation of it (firstmatch etc). This should minimize future distortion between the two functions. Note that the loop over first_unmatched, in evaluate_null_complemented_record(), which this patch shortens to its first iteration, is already present in evaluate_join_record().
[1 Jul 2010 15:58]
Guilhem Bichot
Jorgen found that with the patch, this testcase: create table t1 (a int); create table t2 (a int); create table t3 (a int); insert into t1 values(1); insert into t1 values(1); insert into t2 values(1); insert into t2 values(1); insert into t3 values(2); insert into t3 values(2); let $query=select * from t1 where t1.a in (select t2.a from t2 left join (t2 as t2inner,t3) on t2.a=t3.a); eval explain $query; eval $query; returns one row instead of two (without the patch it returned four rows, which is wrong too). I have debugged that case and it seems to be the same ingredients as BUG#49129 (join buffering, optimizer having a wrong idea of what table will do join buffering).
[27 Jul 2010 9:50]
Guilhem Bichot
Jorgen's testcase is about independent bugs, see thread at http://lists.mysql.com/commits/113000 . So I'm setting back this patch to "patch pending" for review.
[12 Aug 2010 9:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/115549 3226 Guilhem Bichot 2010-08-12 Fix for Bug#54437 "Extra rows with LEFT JOIN + semijoin (firstmatch and duplicates weedout)": convergence between evaluate_join_record() and evaluate_null_complemented_record(). @ mysql-test/include/subquery_sj.inc Without the code fix, main.subquery_sj_dupsweed, main.subquery_sj_firstmatch and main.subquery_sj_loosescan would return 4 1's instead of 2. There was no problem at jcl6/7 because the recent fix for BUG 54235 had already aligned JOIN_CACHE::join_null_complements() with JOIN_CACHE::join_matching_records(). @ sql/sql_select.cc During execution, Firstmatch and Duplicates weedout are handled in evaluate_join_record() (function which applies to matching records), not in evaluate_null_complemented_record() (function which applies to non-matching records). Thus, if the semijoin inner expression is a LEFT JOIN yielding NULLs (see subquery_sj.inc: t3 columns are filled with NULLs), evaluate_null_complemented_record() for t3 goes to the final end_send(), without any firstmatch or duplicates weedout treatment, so duplicate rows are returned. The easy and safe fix would be to put in evaluate_null_complemented_record() a copy of the code of evaluate_join_record() which handles firstmatch and duplicates weedout. In an attempt to reduce code size and unify execution flows, the chosen fix is rather to make evaluate_null_complemented_record() call evaluate_join_record(). The first function is responsible for creating NULL values for all inner tables at the right of LEFT JOIN, and for testing pushed down conditions, from then on this record is sent to the second function for more complete evaluation of it (firstmatch etc). This should minimize future distortion between the two functions. Note that the loop over first_unmatched, in evaluate_null_complemented_record(), which this patch shortens to its first iteration, is already present in evaluate_join_record().
[12 Aug 2010 9:08]
Guilhem Bichot
queued to next-mr-opt-backporting
[26 Aug 2010 7:52]
John Embretsen
See Bug#56254 (Assertion tab->ref.use_count fails in join_read_key_unlock_row() on 4-way JOIN) for an issue that may have been caused by the fix for this bug.
[2 Oct 2010 18:15]
Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101002181053-6iotvl26uurcoryp) (version source revid:alexander.nozdrin@oracle.com-20101002180917-h0n62akupm3z20nt) (pib:21)
[13 Nov 2010 16:24]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 3:33]
Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.