Bug #31480 Incorrect result for nested subquery when executed via semi join
Submitted: 9 Oct 2007 13:29 Modified: 2 Mar 2011 3:28
Reporter: Timour Katchaounov Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:6.0-{main, opt} OS:Any
Assigned to: Roy Lyseng CPU Architecture:Any
Tags: Contribution, optimizer_switch, semijoin, subquery

[9 Oct 2007 13:29] Timour Katchaounov
Description:
When the query below is executed via the semi join strategy, it
returns an empty result, while it should return:
{2,2,3,2} (in any order).

How to repeat:
drop table if exists t1, t2, t3;
create table t1 (a int not null, b int not null);
create table t2 (c int not null, d int not null);
create table t3 (e int not null);

insert into t1 values (1,10);
insert into t1 values (2,10);
insert into t1 values (1,20);
insert into t1 values (2,20);
insert into t1 values (3,20);
insert into t1 values (2,30);
insert into t1 values (4,40);

insert into t2 values (2,10);
insert into t2 values (2,20);
insert into t2 values (4,10);
insert into t2 values (5,10);
insert into t2 values (3,20);
insert into t2 values (2,40);

insert into t3 values (10);
insert into t3 values (30);
insert into t3 values (10);
insert into t3 values (20);

set @@optimizer_switch=no_materialization;
explain extended
select a from t1
where a in (select c from t2 where d >= some(select e from t3 where b=e));
show warnings;

select a from t1
where a in (select c from t2 where d >= some(select e from t3 where b=e));
[16 Nov 2007 17:19] Sergey Petrunya
Investigation results
=====================
The cause of the problem is that the inner (non-flattened) subquery has incorrect table_map().
[6 Jul 2009 9:45] Sergey Petrunya
The patch is available here:
https://code.launchpad.net/~maria-captains/maria/mysql-6.0-testing2
[10 Sep 2009 7:58] Sergei Golubchik
http://lists.mysql.com/internals/37295
[17 Mar 2010 9:19] Jørgen Løland
SergeyP has submitted the patch under SCA: 
http://lists.mysql.com/internals/37804

Thank you Sergey.
[10 Jun 2010 8:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/110690

3189 Roy Lyseng	2010-06-10
      Bug#31480: Incorrect result for nested subquery when executed via semijoin
      
      This problem may occur when we have a query containing at least two nested
      subqueries; one subquery is transformed into a semijoin, and a subquery that
      is inner to the transformed subquery (and is not itself transformed away)
      contains outer references either to the transformed subquery or to it's
      immediate outer query block.
      
      When this situation happens, the transformed query and it's immediate
      outer query block is consolidated into one query block. In this process,
      involved tables may be renumbered, and resolved information for the inner
      expressions may change.
      
      Example with a query:
      
      select t1.a from t1
      where t1.a in (select t2.c from t2
                     where t2.d >= some(select t3.e from t3
                                        where t1.b=t3.e));
      
      Graph of select_lex objects and table objects for this query, before transformation:
      
       A - t1
        \
         B - t2
          \
           C - t3
      
      For shorthand, we denote the IN predicate the "outer subquery predicate"
      and the quantified comparison predicate the "inner subquery predicate".
      The outer subquery predicate contains select_lex B and the inner subquery
      predicate contains select_lex C.
      
      Now, semijoin transformation is applied to the outer subquery predicate. 
      Here is the query graph after transformation:
      
       A - t1, t2
        \
         C - t3
      
      After the transformation, used_tables information is wrong for the column t1.b.
      
      The problem is fixed by adding fix_after_pullout() functions for the
      subquery objects (Item_subselect, Item_in_subselect, Item_in_optimizer)
      that can propagate resolution of expressions into subqueries that have
      expressions that are affected by the semijoin transformation.
      
      In addition, fix_after_pullout() is extended with another argument that
      identifies the select_lex object that is being removed. With the current use
      of fix_after_pullout(), this information is implicit, but it has to be explicit
      when also handling fields that are referenced within inner subqueries.
      
      The new fix_after_pullout() interface then becomes:
      
      fix_after_pullout(st_select_lex *parent_select, st_select_lex *removed_select,
                        Item **ref);
      
      fix_after_pullout() applied to an Item_field object only needs to handle cases
      where the depended_from or context->select_lex is equal to either parent_select
      or removed_select. If the field belongs to a table that is "inner" or "outer"
      compared to the select_lex level we are transforming, then no information needs
      to be updated.
      
      Note also that information about correlated outer columns must be recorded
      directly in a subquery predicate object. Suppose we have two select_lex objects
      A and B. B is part of a subquery that is evaluated on level A. The subquery
      has a predicate that contains a reference to a column that is resolved to a
      table t1 that belongs to select_lex A. In order to properly evaluate the
      subquery predicate, we need to make sure that a row from t1 is present.
      
      Thus, the fix_after_pullout() function for Item_field needs to propagate
      information about the table it is resolved within to an outer subquery
      predicate, if that predicate is evaluated on the same level as the table t1
      is resolved on.
      
      In addition, all subquery predicate objects found between the select_lex
      where the field reference is found and the select_lex where the field reference
      is resolved, except the upper predicate mentioned above,
      must be marked as outer correlated by setting OUTER_REF_TABLE_BIT.
      
      mysql-test/r/subselect_sj.result
        Added result for test case for Bug#31480
      
      mysql-test/r/subselect_sj_jcl6.result
        Added result for test case for Bug#31480
      
      mysql-test/t/subselect_sj.test
        Added test case for Bug#31480
      
      sql/item.cc
        Item_field::resolved_used_tables() is implemented.
        Item_field::fix_after_pullout() is changed so that it behaves properly
        also for fields that are used in a scope that is inner to the select_lex
        being removed. Also update used_tables information for subqueries that
        are outer-correlated with respect to this field.
        Updated fix_after_pullout() in conformance with new interface.
      
      sql/item.h
        Changed interface of fix_after_pullout().
        Added interface for resolved_used_tables(). This function returns
        used table information for the level on which a field's table is resolved,
        in contrast to used_tables() which returns OUTER_REF_TABLE_BIT if
        the field is an outer reference.
      
      sql/item_cmpfunc.cc
        Necessary changes to fix_after_pullout().
        Item_in_optimizer::fix_after_pullout() resolves itself by dispatching
        to the wrapped Item_in_subselect object.
      
      sql/item_cmpfunc.h
        Added fix_after_pullout() to class Item_in_optimizer.
      
      sql/item_func.cc
        Necessary changes to fix_after_pullout().
      
      sql/item_func.h
        Changed interface of fix_after_pullout().
      
      sql/item_row.cc
        Necessary changes to fix_after_pullout().
      
      sql/item_row.h
        Changed interface of fix_after_pullout().
      
      sql/item_subselect.cc
        Item_subselect::fix_after_pullout() resolves all expressions referred
        from the inner query specification objects (select_lex objects).
        Item_in_subselect::fix_after_pullout() dispatches the query expression
        resolution to Item_subselect::fix_after_pullout(), but needs to resolve
        its left expression explicitly.
      
      sql/item_subselect.h
        Added fix_after_pullout() to classes Item_subselect and Item_in_subselect.
      
      sql/sql_select.cc
        Interface change to fix_list_after_tbl_changes().
        Also make sure that transformed-away select_lex object is removed before
        calling fix_after_pullout().
[14 Oct 2010 8:19] Roy Lyseng
Setting "approved" on behalf of Evgeny because of this mail:

Hello Roy,

Evgeny is asking me to tell you this information: you can push bug #31480 with Jorgen's changes.
(Evgeny is without internet connection ATM).

Thank you,
Gleb.
[15 Oct 2010 10:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120822

3263 Roy Lyseng	2010-10-15
      Bug#31480: Incorrect result for nested subquery when executed via semijoin
      
      This problem may occur when we have a query containing at least two nested
      subqueries; one subquery is transformed into a semijoin, and a subquery that
      is inner to the transformed subquery (and is not itself transformed away)
      contains outer references either to the transformed subquery or to it's
      immediate outer query block.
      
      When this situation happens, the transformed query and it's immediate
      outer query block is consolidated into one query block. In this process,
      involved tables may be renumbered, and resolved information for the inner
      expressions may change.
      
      Example with a query:
      
      select t1.a from t1
      where t1.a in (select t2.c from t2
                     where t2.d >= some(select t3.e from t3
                                        where t1.b=t3.e));
      
      Graph of select_lex objects and table objects for this query, before transformation:
      
       A - t1
        \
         B - t2
          \
           C - t3
      
      For shorthand, we denote the IN predicate the "outer subquery predicate"
      and the quantified comparison predicate the "inner subquery predicate".
      The outer subquery predicate contains select_lex B and the inner subquery
      predicate contains select_lex C.
      
      Now, semijoin transformation is applied to the outer subquery predicate. 
      Here is the query graph after transformation:
      
       A - t1, t2
        \
         C - t3
      
      After the transformation, used_tables information is wrong for the column t1.b.
      
      The problem is fixed by adding fix_after_pullout() functions for the
      subquery objects (Item_subselect, Item_in_subselect, Item_in_optimizer),
      that can propagate re-resolution down to subqueries that have expressions
      that are affected by the semijoin transformation.
      
      In addition, fix_after_pullout() is extended with another argument that
      identifies the select_lex object that is being removed. With the current use
      of fix_after_pullout(), this information is implicit, but it has to be explicit
      when also handling fields that are referenced within inner subqueries.
      
      The new fix_after_pullout() interface then becomes:
      
      fix_after_pullout(st_select_lex *parent_select, st_select_lex *removed_select,
                        Item **ref);
      
      fix_after_pullout() applied to an Item_field object only needs to handle cases
      where the depended_from or context->select_lex is equal to either parent_select
      or removed_select. If the field belongs to a table that is "inner" or "outer"
      compared to the select_lex level we are transforming, then no information needs
      to be updated.
      
      Note also that information about correlated outer columns must be recorded
      directly in a subquery predicate object. Suppose we have two select_lex objects
      A and B. B is part of a subquery that is evaluated on level A. The subquery
      has a predicate that contains a reference to a column that is resolved to a
      table t1 that belongs to select_lex A. In order to properly evaluate the
      subquery predicate, we need to make sure that a row from t1 is present.
      
      Thus, the fix_after_pullout() function for Item_field needs to propagate
      information about the table it is resolved within to an outer subquery
      predicate, if that predicate is evaluated on the same level as the table t1
      is resolved on.
      
      mysql-test/r/optimizer_switch.result
        Added result for test case for Bug#31480
      
      mysql-test/r/subquery_sj_all.result
      mysql-test/r/subquery_sj_all_jcl6.result
      mysql-test/r/subquery_sj_all_jcl7.result
      mysql-test/r/subquery_sj_dupsweed.result
      mysql-test/r/subquery_sj_dupsweed_jcl6.result
      mysql-test/r/subquery_sj_dupsweed_jcl7.result
      mysql-test/r/subquery_sj_firstmatch.result
      mysql-test/r/subquery_sj_firstmatch_jcl6.result
      mysql-test/r/subquery_sj_firstmatch_jcl7.result
      mysql-test/r/subquery_sj_loosescan.result
      mysql-test/r/subquery_sj_loosescan_jcl6.result
      mysql-test/r/subquery_sj_loosescan_jcl7.result
      mysql-test/r/subquery_sj_mat.result
      mysql-test/r/subquery_sj_mat_jcl6.result
      mysql-test/r/subquery_sj_mat_jcl7.result
        Minor plan change (WHERE clause moved from first to second table).
        The reason for this is that the IN subquery predicate object is updated
        with correct used tables information.
      
      mysql-test/t/optimizer_switch.test
        Added test case for Bug#31480
        Notice that the test case with an inner grouped subquery is wrong
        when semijoin optimization is turned on.
      
      sql/item.cc
        Item_field::resolved_used_tables() is implemented.
        Item_field::fix_after_pullout() is changed so that it behaves properly
        also for fields that are used in a scope that is inner to the select_lex
        being removed. Also update used_tables information for subqueries that
        are outer-correlated with respect to this field.
        Updated fix_after_pullout() in conformance with new interface.
      
      sql/item.h
        Changed interface of fix_after_pullout().
        Added interface for resolved_used_tables(). This function returns
        used table information for the level on which a field's table is resolved,
        in contrast to used_tables() which returns OUTER_REF_TABLE_BIT if
        the field is an outer reference.
      
      sql/item_cmpfunc.cc
        Necessary changes to fix_after_pullout().
        Item_in_optimizer::fix_after_pullout() resolves itself by dispatching
        to the wrapped Item_in_subselect object.
      
      sql/item_cmpfunc.h
        Added fix_after_pullout() to class Item_in_optimizer.
      
      sql/item_func.cc
        Necessary changes to fix_after_pullout().
      
      sql/item_func.h
        Changed interface of fix_after_pullout().
      
      sql/item_row.cc
        Necessary changes to fix_after_pullout().
      
      sql/item_row.h
        Changed interface of fix_after_pullout().
      
      sql/item_subselect.cc
        Item_subselect::fix_after_pullout() resolves all expressions referred
        from the inner query specification objects (select_lex objects).
        Item_in_subselect::fix_after_pullout() dispatches the query expression
        resolution to Item_subselect::fix_after_pullout(), but needs to resolve
        its left expression explicitly.
      
      sql/item_subselect.h
        Added fix_after_pullout() to classes Item_subselect and Item_in_subselect.
      
      sql/sql_select.cc
        Interface change to fix_list_after_tbl_changes().
        Also make sure that transformed-away select_lex object is removed before
        calling fix_after_pullout(), so that the function operates on a chain
        of select_lex objects that is valid after semijoin transformation.
[13 Nov 2010 16:06] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[13 Nov 2010 16:37] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:jimmy.yang@oracle.com-20100804103744-vbpeghipkz6pyc9z) (pib:21)
[22 Nov 2010 1:24] Paul DuBois
Bug is not in any released 5.6.x version. No changelog entry needed.
[2 Feb 2011 13:24] Bugs System
Pushed into mysql-trunk 5.6.2 (revid:jorgen.loland@oracle.com-20110202132358-khrjqzdcs3jrda3i) (version source revid:jorgen.loland@oracle.com-20110202132358-khrjqzdcs3jrda3i) (merge vers: 5.6.2) (pib:24)