Bug #51018 | Batched key access gives wrong results for SELECT with semijoin and NULL | ||
---|---|---|---|
Submitted: | 9 Feb 2010 9:19 | Modified: | 23 Nov 2010 3:15 |
Reporter: | Guilhem Bichot | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 6.0-codebase-bugfixing | OS: | Any |
Assigned to: | Guilhem Bichot | CPU Architecture: | Any |
Tags: | BKA, join_cache_level, materialization, optimizer_switch, semijoin |
[9 Feb 2010 9:19]
Guilhem Bichot
[9 Feb 2010 9:49]
Sveta Smirnova
Thank you for the report. Verified as described.
[11 Feb 2010 15:30]
Guilhem Bichot
Probably the same thing is visible by doing a diff of the existing subselect3.result and subselect3_jcl6.result, I see: @@ -1069,6 +1073,7 @@ t22.a in (select t12.a from t11, t12 where t11.a in(255,256) and t11.a = t12.a and t11.c is null) and t22.c is null order by t21.a; a b c 256 67 NULL +256 67 NULL drop table t1, t11, t12, t21, t22; create table t1(a int); insert into t1 values (0),(1); Please make sure to fix both my reduced testcase and this.
[30 May 2010 19:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/109560 3183 Guilhem Bichot 2010-05-30 Fix for BUG#50358 "semijoin execution of subquery with outerjoin yields wrong result" and BUG#51018 "Batched key access gives wrong results for SELECT with semijoin and NULL": semijoin materialization was not fully disabled when doing join buffering. This is the first possible fix, it has the advantage of changing little code, and the disadvantage of adding some loops which in total scale like 0.5*(N^2) where N is the number of tables in the join (exception: no added cost if there are no semijoins in the query). Another fix with a different approach will be submitted next. @ mysql-test/r/subselect3.result those plans accidentally had join buffering enabled for the non-first inner tables in a semijoin materialization nest @ mysql-test/r/subselect3_jcl6.result those plans accidentally had join buffering enabled for the non-first inner tables in a semijoin materialization nest. The test where one 256,67,NULL line disappears, is exactly the test for BUG#51018. @ mysql-test/r/subselect4.result those plans accidentally had join buffering enabled for the non-first inner tables in a semijoin materialization nest @ mysql-test/r/subselect_sj.result those plans accidentally had join buffering enabled for the non-first inner tables in a semijoin materialization nest @ mysql-test/r/subselect_sj2.result result for test @ mysql-test/r/subselect_sj2_jcl6.result before this fix, the result of SELECT would be 2 and 2 which is wrong @ mysql-test/r/subselect_sj_jcl6.result those plans accidentally had join buffering enabled for the non-first inner tables in a semijoin materialization nest @ mysql-test/t/subselect_sj2.test test for BUG#51018 @ sql/sql_select.cc In check_join_cache_usage(), a TODO comment reflects that we don't support join buffering in semijoin inner tables handled with materialization. This was already enforced with test sj_is_materialize_strategy(join->best_positions[i].sj_strategy); but, as the strategy is SJ_OPT_NONE for all non-first inner tables, this test let non-first inner tables do materialization. That led to wrong results. The fix is to properly enforce the limitation: scan previous JOIN_TABs until finding a first inner table with materialization strategy. We do this search only if there are semijoin nests. Another existing limitation, for FirstMatch, is groupped under the same if(), as an optimization.
[30 May 2010 19:44]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/109561 3183 Guilhem Bichot 2010-05-30 Fix for BUG#50358 "semijoin execution of subquery with outerjoin yields wrong result" and BUG#51018 "Batched key access gives wrong results for SELECT with semijoin and NULL": semijoin materialization was not fully disabled when doing join buffering. This is the second possible fix, which stores more in order to compute less. Test results are identical to the ones in the first fix. @ sql/sql_select.cc Semijoin materializations now set JOIN_TAB::first_sj_inner_tab and JOIN_TAB::last_sj_inner_tab. In check_join_cache_usage(), we want to disable join buffering if the table is in semijoin materialization (see the TODO in the function's comment). The semijoin strategy is however stored only in the first semijoin inner table, other inner ones have SJ_OPT_NONE (see end of fix_semijoin_strategies_for_picked_join_order()). So when check_join_cache_usage() was looking at a non-first inner table, sj_is_materialize_strategy() said "no", in the end join buffering was not disabled for that non-first inner table, leading to wrong results. To find the strategy for the table, we need to access the first inner table: we can now do so with tab->first_sj_inner_tab (now filled), through get_sj_strategy(). All existing code which implicitely applied only to firstmatch inner tables, still does so. @ sql/sql_select.h JOIN_TAB::first_sj_inner_tab and JOIN_TAB::last_sj_inner_tab were set only for firstmatch. As the same information is useful for check_join_cache_usage() to detect semijoin materialization, we now set it for this strategy too. Thus, existing code which tested first_sj_inner_tab (implicitely testing for firstmatch) should now test first_sj_inner_tab_with_firstmatch() to not change behaviour.
[4 Jun 2010 12:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/110234 3183 Guilhem Bichot 2010-06-04 Fix for BUG#50358 "semijoin execution of subquery with outerjoin yields wrong result" and BUG#51018 "Batched key access gives wrong results for SELECT with semijoin and NULL": semijoin materialization was not fully disabled when doing join buffering. This is the second possible fix, which stores more in order to compute less. Test results are identical to the ones in the first fix. This contains more changes, suggested by first reviewer. @ sql/sql_join_cache.cc * JOIN_TAB::use_match_flag(), is_last_inner_table(), get_first_inner_table() coded in-line in JOIN_CACHE, as they are quite specific to JOIN_CACHE * cache value of check_only_first_match @ sql/sql_select.cc * "join->join_tab + i + pos->n_sj_tables - 1" was used in many places in setup_semijoin_dups_elimination(), computing it once now * All semijoin strategies now set JOIN_TAB::first_sj_inner_tab and JOIN_TAB::last_sj_inner_tab. In check_join_cache_usage(), we want to disable join buffering if the table is in semijoin materialization (see the TODO in the function's comment). The semijoin strategy is however stored only in the first semijoin inner table, other inner ones have SJ_OPT_NONE (see end of fix_semijoin_strategies_for_picked_join_order()). So when check_join_cache_usage() was looking at a non-first inner table, sj_is_materialize_strategy() said "no", in the end join buffering was not disabled for that non-first inner table, leading to wrong results. To find the strategy for the table, we need to access the first inner table: we can now do so with tab->first_sj_inner_tab (now filled), through get_sj_strategy(). All existing code which implicitely applied only to firstmatch inner tables, still does so, because a test for get_sj_strategy()==SJ_OPT_FIRST_MATCH is added @ sql/sql_select.h * JOIN_TAB::first_sj_inner_tab and JOIN_TAB::last_sj_inner_tab were set only for firstmatch. As the same information is useful for check_join_cache_usage() to detect semijoin materialization, we now set it for all semijoin strategies. A member function JOIN_TAB::get_sj_strategy() gives the table's semijoin strategy. Thus, existing code which used first_sj_inner_tab (implicitely testing for firstmatch) should now additionally test get_sj_strategy()==SJ_OPT_FIRST_MATCH, to not change behaviour. * is_inner_table_of_semi_join_with_first_match() is removed as it's one-line, not very general and replacable by get_sj_strategy()==SJ_OPT_FIRST_MATCH. * is_single_inner_of_semi_join_with_first_match() is replaced by is_single_inner_of_semi_join() (more general). * certain member functions used only by JOIN_CACHE, and quite specific of it (use_match_flag(),check_only_first_match()) are either removed and inserted in-line in JOIN_CACHE's code, or moved to being JOIN_CACHE member functions: that's the case for check_only_first_match(). Same for is_last_inner_table() and get_first_inner_table(), which had misleading names (they apply only to semijoin inner tables served by FirstMatch, not all semijoin inner tables); this behaviour being caused by JOIN_CACHE, it's better if those specifics are inside JOIN_CACHE code and not in JOIN_TAB (putting them in JOIN_TAB suggested that they were generic-purpose, reusable functions). * Additionally, we add JOIN_CACHE::check_only_first_match, a cached value to avoid computing its value for each record (see sql_join_cache.cc)
[7 Jun 2010 15:24]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/110369 3183 Guilhem Bichot 2010-06-07 Fix for BUG#50358 "semijoin execution of subquery with outerjoin yields wrong result" and BUG#51018 "Batched key access gives wrong results for SELECT with semijoin and NULL": semijoin materialization was not fully disabled when doing join buffering. This is the second possible fix, which stores more in order to compute less. Test results are identical to the ones in the first fix. This contains more changes, suggested by first reviewer. @ sql/sql_join_cache.cc * JOIN_TAB::use_match_flag(), is_last_inner_table(), get_first_inner_table() coded in-line in JOIN_CACHE, as they are quite specific to JOIN_CACHE * cache value of check_only_first_match @ sql/sql_select.cc * "join->join_tab + i + pos->n_sj_tables - 1" was used in many places in setup_semijoin_dups_elimination(), computing it once now * All semijoin strategies now set JOIN_TAB::first_sj_inner_tab and JOIN_TAB::last_sj_inner_tab. In check_join_cache_usage(), we want to disable join buffering if the table is in semijoin materialization (see the TODO in the function's comment). The semijoin strategy is however stored only in the first semijoin inner table, other inner ones have SJ_OPT_NONE (see end of fix_semijoin_strategies_for_picked_join_order()). So when check_join_cache_usage() was looking at a non-first inner table, sj_is_materialize_strategy() said "no", in the end join buffering was not disabled for that non-first inner table, leading to wrong results. To find the strategy for the table, we need to access the first inner table: we can now do so with tab->first_sj_inner_tab (now filled), through get_sj_strategy(). All existing code which implicitely applied only to firstmatch inner tables, still does so, because a test for get_sj_strategy()==SJ_OPT_FIRST_MATCH is added @ sql/sql_select.h * JOIN_TAB::first_sj_inner_tab and JOIN_TAB::last_sj_inner_tab were set only for firstmatch. As the same information is useful for check_join_cache_usage() to detect semijoin materialization, we now set it for all semijoin strategies. A member function JOIN_TAB::get_sj_strategy() gives the table's semijoin strategy. Thus, existing code which used first_sj_inner_tab (implicitely testing for firstmatch) should now additionally test get_sj_strategy()==SJ_OPT_FIRST_MATCH, to not change behaviour. * is_inner_table_of_semi_join_with_first_match() is removed as it's one-line, not very general and replacable by get_sj_strategy()==SJ_OPT_FIRST_MATCH. * is_single_inner_of_semi_join_with_first_match() is replaced by is_single_inner_of_semi_join() (more general). * certain member functions used only by JOIN_CACHE, and quite specific of it (use_match_flag(),check_only_first_match()) are either removed and inserted in-line in JOIN_CACHE's code, or moved to being JOIN_CACHE member functions: that's the case for check_only_first_match(). Same for is_last_inner_table() and get_first_inner_table(), which had misleading names (they apply only to semijoin inner tables served by FirstMatch, not all semijoin inner tables); this behaviour being caused by JOIN_CACHE, it's better if those specifics are inside JOIN_CACHE code and not in JOIN_TAB (putting them in JOIN_TAB suggested that they were generic-purpose, reusable functions). * Additionally, we add JOIN_CACHE::check_only_first_match, a cached value to avoid computing its value for each record (see sql_join_cache.cc)
[11 Jun 2010 12:28]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/110822 3192 Guilhem Bichot 2010-06-11 Fix for BUG#50358 "semijoin execution of subquery with outerjoin yields wrong result" and BUG#51018 "Batched key access gives wrong results for SELECT with semijoin and NULL": semijoin materialization was not fully disabled when doing join buffering. @ mysql-test/r/subselect3.result those plans accidentally had join buffering enabled for the non-first inner tables in a semijoin materialization nest @ mysql-test/r/subselect3_jcl6.result those plans accidentally had join buffering enabled for the non-first inner tables in a semijoin materialization nest. The test where one 256,67,NULL line disappears, is exactly the test for BUG#51018. @ mysql-test/r/subselect4.result those plans accidentally had join buffering enabled for the non-first inner tables in a semijoin materialization nest @ mysql-test/r/subselect_sj.result those plans accidentally had join buffering enabled for the non-first inner tables in a semijoin materialization nest @ mysql-test/r/subselect_sj2.result result for test @ mysql-test/r/subselect_sj2_jcl6.result before this fix, the result of SELECT would be 2 and 2 which is wrong @ mysql-test/r/subselect_sj2_jcl7.result result for test @ mysql-test/r/subselect_sj_jcl6.result those plans accidentally had join buffering enabled for the non-first inner tables in a semijoin materialization nest. The test where 1 changes to 2 is another symptom of BUG#50358. @ mysql-test/r/subselect_sj_jcl7.result those plans accidentally had join buffering enabled for the non-first inner tables in a semijoin materialization nest @ mysql-test/t/subselect_sj2.test test for BUG#51018 @ sql/sql_join_cache.cc * JOIN_TAB::use_match_flag(), is_last_inner_table(), get_first_inner_table() coded in-line in JOIN_CACHE, as they are quite specific to JOIN_CACHE * cache value of check_only_first_match @ sql/sql_select.cc * "join->join_tab + i + pos->n_sj_tables - 1" was used in many places in setup_semijoin_dups_elimination(), computing it once now * All semijoin strategies now set JOIN_TAB::first_sj_inner_tab and JOIN_TAB::last_sj_inner_tab. In check_join_cache_usage(), we want to disable join buffering if the table is in semijoin materialization (see the TODO in the function's comment). The semijoin strategy is however stored only in the first semijoin inner table, other inner ones have SJ_OPT_NONE (see end of fix_semijoin_strategies_for_picked_join_order()). So when check_join_cache_usage() was looking at a non-first inner table, sj_is_materialize_strategy() said "no", in the end join buffering was not disabled for that non-first inner table, leading to wrong results. To find the strategy for the table, we need to access the first inner table: we can now do so with tab->first_sj_inner_tab (now filled), through get_sj_strategy(). All existing code which implicitely applied only to firstmatch inner tables, still does so, because a test for get_sj_strategy()==SJ_OPT_FIRST_MATCH is added @ sql/sql_select.h * JOIN_TAB::first_sj_inner_tab and JOIN_TAB::last_sj_inner_tab were set only for firstmatch. As the same information is useful for check_join_cache_usage() to detect semijoin materialization, we now set it for all semijoin strategies. A member function JOIN_TAB::get_sj_strategy() gives the table's semijoin strategy. Thus, existing code which used first_sj_inner_tab (implicitely testing for firstmatch) should now additionally test get_sj_strategy()==SJ_OPT_FIRST_MATCH, to not change behaviour. * is_inner_table_of_semi_join_with_first_match() is removed as it's one-line, not very general and replacable by get_sj_strategy()==SJ_OPT_FIRST_MATCH. * is_single_inner_of_semi_join_with_first_match() is replaced by is_single_inner_of_semi_join() (more general). * certain member functions used only by JOIN_CACHE, and quite specific of it (use_match_flag(),check_only_first_match()) are either removed and inserted in-line in JOIN_CACHE's code, or moved to being JOIN_CACHE member functions: that's the case for check_only_first_match(). Same for is_last_inner_table() and get_first_inner_table(), which had misleading names (they apply only to semijoin inner tables served by FirstMatch, not all semijoin inner tables); this behaviour being caused by JOIN_CACHE, it's better if those specifics are inside JOIN_CACHE code and not in JOIN_TAB (putting them in JOIN_TAB suggested that they were generic-purpose, reusable functions). * Additionally, we add JOIN_CACHE::check_only_first_match, a cached value to avoid computing its value for each record (see sql_join_cache.cc)
[11 Jun 2010 12:29]
Guilhem Bichot
queued to next-mr-opt-backporting
[16 Aug 2010 6:41]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:27]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 3:15]
Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.