Bug #31480 | Incorrect result for nested subquery when executed via semi join | ||
---|---|---|---|
Submitted: | 9 Oct 2007 13:29 | Modified: | 2 Mar 2011 3:28 |
Reporter: | Timour Katchaounov | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
Version: | 6.0-{main, opt} | OS: | Any |
Assigned to: | Roy Lyseng | CPU Architecture: | Any |
Tags: | Contribution, optimizer_switch, semijoin, subquery |
[9 Oct 2007 13:29]
Timour Katchaounov
[16 Nov 2007 17:19]
Sergey Petrunya
Investigation results ===================== The cause of the problem is that the inner (non-flattened) subquery has incorrect table_map().
[6 Jul 2009 9:45]
Sergey Petrunya
The patch is available here: https://code.launchpad.net/~maria-captains/maria/mysql-6.0-testing2
[10 Sep 2009 7:58]
Sergei Golubchik
http://lists.mysql.com/internals/37295
[17 Mar 2010 9:19]
Jørgen Løland
SergeyP has submitted the patch under SCA: http://lists.mysql.com/internals/37804 Thank you Sergey.
[10 Jun 2010 8:18]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/110690 3189 Roy Lyseng 2010-06-10 Bug#31480: Incorrect result for nested subquery when executed via semijoin This problem may occur when we have a query containing at least two nested subqueries; one subquery is transformed into a semijoin, and a subquery that is inner to the transformed subquery (and is not itself transformed away) contains outer references either to the transformed subquery or to it's immediate outer query block. When this situation happens, the transformed query and it's immediate outer query block is consolidated into one query block. In this process, involved tables may be renumbered, and resolved information for the inner expressions may change. Example with a query: select t1.a from t1 where t1.a in (select t2.c from t2 where t2.d >= some(select t3.e from t3 where t1.b=t3.e)); Graph of select_lex objects and table objects for this query, before transformation: A - t1 \ B - t2 \ C - t3 For shorthand, we denote the IN predicate the "outer subquery predicate" and the quantified comparison predicate the "inner subquery predicate". The outer subquery predicate contains select_lex B and the inner subquery predicate contains select_lex C. Now, semijoin transformation is applied to the outer subquery predicate. Here is the query graph after transformation: A - t1, t2 \ C - t3 After the transformation, used_tables information is wrong for the column t1.b. The problem is fixed by adding fix_after_pullout() functions for the subquery objects (Item_subselect, Item_in_subselect, Item_in_optimizer) that can propagate resolution of expressions into subqueries that have expressions that are affected by the semijoin transformation. In addition, fix_after_pullout() is extended with another argument that identifies the select_lex object that is being removed. With the current use of fix_after_pullout(), this information is implicit, but it has to be explicit when also handling fields that are referenced within inner subqueries. The new fix_after_pullout() interface then becomes: fix_after_pullout(st_select_lex *parent_select, st_select_lex *removed_select, Item **ref); fix_after_pullout() applied to an Item_field object only needs to handle cases where the depended_from or context->select_lex is equal to either parent_select or removed_select. If the field belongs to a table that is "inner" or "outer" compared to the select_lex level we are transforming, then no information needs to be updated. Note also that information about correlated outer columns must be recorded directly in a subquery predicate object. Suppose we have two select_lex objects A and B. B is part of a subquery that is evaluated on level A. The subquery has a predicate that contains a reference to a column that is resolved to a table t1 that belongs to select_lex A. In order to properly evaluate the subquery predicate, we need to make sure that a row from t1 is present. Thus, the fix_after_pullout() function for Item_field needs to propagate information about the table it is resolved within to an outer subquery predicate, if that predicate is evaluated on the same level as the table t1 is resolved on. In addition, all subquery predicate objects found between the select_lex where the field reference is found and the select_lex where the field reference is resolved, except the upper predicate mentioned above, must be marked as outer correlated by setting OUTER_REF_TABLE_BIT. mysql-test/r/subselect_sj.result Added result for test case for Bug#31480 mysql-test/r/subselect_sj_jcl6.result Added result for test case for Bug#31480 mysql-test/t/subselect_sj.test Added test case for Bug#31480 sql/item.cc Item_field::resolved_used_tables() is implemented. Item_field::fix_after_pullout() is changed so that it behaves properly also for fields that are used in a scope that is inner to the select_lex being removed. Also update used_tables information for subqueries that are outer-correlated with respect to this field. Updated fix_after_pullout() in conformance with new interface. sql/item.h Changed interface of fix_after_pullout(). Added interface for resolved_used_tables(). This function returns used table information for the level on which a field's table is resolved, in contrast to used_tables() which returns OUTER_REF_TABLE_BIT if the field is an outer reference. sql/item_cmpfunc.cc Necessary changes to fix_after_pullout(). Item_in_optimizer::fix_after_pullout() resolves itself by dispatching to the wrapped Item_in_subselect object. sql/item_cmpfunc.h Added fix_after_pullout() to class Item_in_optimizer. sql/item_func.cc Necessary changes to fix_after_pullout(). sql/item_func.h Changed interface of fix_after_pullout(). sql/item_row.cc Necessary changes to fix_after_pullout(). sql/item_row.h Changed interface of fix_after_pullout(). sql/item_subselect.cc Item_subselect::fix_after_pullout() resolves all expressions referred from the inner query specification objects (select_lex objects). Item_in_subselect::fix_after_pullout() dispatches the query expression resolution to Item_subselect::fix_after_pullout(), but needs to resolve its left expression explicitly. sql/item_subselect.h Added fix_after_pullout() to classes Item_subselect and Item_in_subselect. sql/sql_select.cc Interface change to fix_list_after_tbl_changes(). Also make sure that transformed-away select_lex object is removed before calling fix_after_pullout().
[14 Oct 2010 8:19]
Roy Lyseng
Setting "approved" on behalf of Evgeny because of this mail: Hello Roy, Evgeny is asking me to tell you this information: you can push bug #31480 with Jorgen's changes. (Evgeny is without internet connection ATM). Thank you, Gleb.
[15 Oct 2010 10:34]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/120822 3263 Roy Lyseng 2010-10-15 Bug#31480: Incorrect result for nested subquery when executed via semijoin This problem may occur when we have a query containing at least two nested subqueries; one subquery is transformed into a semijoin, and a subquery that is inner to the transformed subquery (and is not itself transformed away) contains outer references either to the transformed subquery or to it's immediate outer query block. When this situation happens, the transformed query and it's immediate outer query block is consolidated into one query block. In this process, involved tables may be renumbered, and resolved information for the inner expressions may change. Example with a query: select t1.a from t1 where t1.a in (select t2.c from t2 where t2.d >= some(select t3.e from t3 where t1.b=t3.e)); Graph of select_lex objects and table objects for this query, before transformation: A - t1 \ B - t2 \ C - t3 For shorthand, we denote the IN predicate the "outer subquery predicate" and the quantified comparison predicate the "inner subquery predicate". The outer subquery predicate contains select_lex B and the inner subquery predicate contains select_lex C. Now, semijoin transformation is applied to the outer subquery predicate. Here is the query graph after transformation: A - t1, t2 \ C - t3 After the transformation, used_tables information is wrong for the column t1.b. The problem is fixed by adding fix_after_pullout() functions for the subquery objects (Item_subselect, Item_in_subselect, Item_in_optimizer), that can propagate re-resolution down to subqueries that have expressions that are affected by the semijoin transformation. In addition, fix_after_pullout() is extended with another argument that identifies the select_lex object that is being removed. With the current use of fix_after_pullout(), this information is implicit, but it has to be explicit when also handling fields that are referenced within inner subqueries. The new fix_after_pullout() interface then becomes: fix_after_pullout(st_select_lex *parent_select, st_select_lex *removed_select, Item **ref); fix_after_pullout() applied to an Item_field object only needs to handle cases where the depended_from or context->select_lex is equal to either parent_select or removed_select. If the field belongs to a table that is "inner" or "outer" compared to the select_lex level we are transforming, then no information needs to be updated. Note also that information about correlated outer columns must be recorded directly in a subquery predicate object. Suppose we have two select_lex objects A and B. B is part of a subquery that is evaluated on level A. The subquery has a predicate that contains a reference to a column that is resolved to a table t1 that belongs to select_lex A. In order to properly evaluate the subquery predicate, we need to make sure that a row from t1 is present. Thus, the fix_after_pullout() function for Item_field needs to propagate information about the table it is resolved within to an outer subquery predicate, if that predicate is evaluated on the same level as the table t1 is resolved on. mysql-test/r/optimizer_switch.result Added result for test case for Bug#31480 mysql-test/r/subquery_sj_all.result mysql-test/r/subquery_sj_all_jcl6.result mysql-test/r/subquery_sj_all_jcl7.result mysql-test/r/subquery_sj_dupsweed.result mysql-test/r/subquery_sj_dupsweed_jcl6.result mysql-test/r/subquery_sj_dupsweed_jcl7.result mysql-test/r/subquery_sj_firstmatch.result mysql-test/r/subquery_sj_firstmatch_jcl6.result mysql-test/r/subquery_sj_firstmatch_jcl7.result mysql-test/r/subquery_sj_loosescan.result mysql-test/r/subquery_sj_loosescan_jcl6.result mysql-test/r/subquery_sj_loosescan_jcl7.result mysql-test/r/subquery_sj_mat.result mysql-test/r/subquery_sj_mat_jcl6.result mysql-test/r/subquery_sj_mat_jcl7.result Minor plan change (WHERE clause moved from first to second table). The reason for this is that the IN subquery predicate object is updated with correct used tables information. mysql-test/t/optimizer_switch.test Added test case for Bug#31480 Notice that the test case with an inner grouped subquery is wrong when semijoin optimization is turned on. sql/item.cc Item_field::resolved_used_tables() is implemented. Item_field::fix_after_pullout() is changed so that it behaves properly also for fields that are used in a scope that is inner to the select_lex being removed. Also update used_tables information for subqueries that are outer-correlated with respect to this field. Updated fix_after_pullout() in conformance with new interface. sql/item.h Changed interface of fix_after_pullout(). Added interface for resolved_used_tables(). This function returns used table information for the level on which a field's table is resolved, in contrast to used_tables() which returns OUTER_REF_TABLE_BIT if the field is an outer reference. sql/item_cmpfunc.cc Necessary changes to fix_after_pullout(). Item_in_optimizer::fix_after_pullout() resolves itself by dispatching to the wrapped Item_in_subselect object. sql/item_cmpfunc.h Added fix_after_pullout() to class Item_in_optimizer. sql/item_func.cc Necessary changes to fix_after_pullout(). sql/item_func.h Changed interface of fix_after_pullout(). sql/item_row.cc Necessary changes to fix_after_pullout(). sql/item_row.h Changed interface of fix_after_pullout(). sql/item_subselect.cc Item_subselect::fix_after_pullout() resolves all expressions referred from the inner query specification objects (select_lex objects). Item_in_subselect::fix_after_pullout() dispatches the query expression resolution to Item_subselect::fix_after_pullout(), but needs to resolve its left expression explicitly. sql/item_subselect.h Added fix_after_pullout() to classes Item_subselect and Item_in_subselect. sql/sql_select.cc Interface change to fix_list_after_tbl_changes(). Also make sure that transformed-away select_lex object is removed before calling fix_after_pullout(), so that the function operates on a chain of select_lex objects that is valid after semijoin transformation.
[13 Nov 2010 16:06]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[13 Nov 2010 16:37]
Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:jimmy.yang@oracle.com-20100804103744-vbpeghipkz6pyc9z) (pib:21)
[22 Nov 2010 1:24]
Paul DuBois
Bug is not in any released 5.6.x version. No changelog entry needed.
[2 Feb 2011 13:24]
Bugs System
Pushed into mysql-trunk 5.6.2 (revid:jorgen.loland@oracle.com-20110202132358-khrjqzdcs3jrda3i) (version source revid:jorgen.loland@oracle.com-20110202132358-khrjqzdcs3jrda3i) (merge vers: 5.6.2) (pib:24)