Bug #57525 Semijoin transformed subquery with inner grouped subquery gives empty result
Submitted: 18 Oct 2010 13:35 Modified: 3 Mar 2011 1:25
Reporter: Roy Lyseng Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:next-mr-opt-backporting OS:Any
Assigned to: Roy Lyseng CPU Architecture:Any
Tags: optimizer_switch, semijoin, subquery

[18 Oct 2010 13:35] Roy Lyseng
Description:
We have a query with two nested subqueries, and the inner subquery is grouped and is correlated to the outer-most query in its HAVING clause. The outer-most subquery is transformed to a semijoin operation.

When semijoin is disabled, the query returns correct results (2, 2, 3, 2).

When semijoin is enabled, the result is empty, regardless of which strategy that is selected (Materialization or DuplicateWeedout).

Notice that this seems to happen regardless of whether the patch for bug#31480 is applied or not.

How to repeat:
CREATE TABLE t1 (a INT NOT NULL, b INT NOT NULL);
CREATE TABLE t2 (c INT NOT NULL, d INT NOT NULL);
CREATE TABLE t3 (e INT NOT NULL);
CREATE TABLE t4 (f INT NOT NULL, g INT NOT NULL);

INSERT INTO t1 VALUES (1,10);
INSERT INTO t1 VALUES (2,10);
INSERT INTO t1 VALUES (1,20);
INSERT INTO t1 VALUES (2,20);
INSERT INTO t1 VALUES (3,20);
INSERT INTO t1 VALUES (2,30);
INSERT INTO t1 VALUES (4,40);
INSERT INTO t2 VALUES (2,10);
INSERT INTO t2 VALUES (2,20);
INSERT INTO t2 VALUES (4,10);
INSERT INTO t2 VALUES (5,10);
INSERT INTO t2 VALUES (3,20);
INSERT INTO t2 VALUES (2,40);
INSERT INTO t3 VALUES (10);
INSERT INTO t3 VALUES (30);
INSERT INTO t3 VALUES (10);
INSERT INTO t3 VALUES (20);
INSERT INTO t4 VALUES (2,10);
INSERT INTO t4 VALUES (2,10);
INSERT INTO t4 VALUES (3,10);
INSERT INTO t4 VALUES (4,10);
INSERT INTO t4 VALUES (4,20);
INSERT INTO t4 VALUES (4,20);

let query=
SELECT * FROM t1 AS ta
WHERE ta.a IN (SELECT c FROM t2 AS tb
               WHERE tb.d >= SOME(SELECT SUM(g) FROM t4 as tc
                                  GROUP BY f
                                  HAVING ta.a=tc.f));

set @@optimizer_switch='materialization=off,semijoin=off';

eval $query;

set @@optimizer_switch='materialization=off,semijoin=on';

eval $query;

set @@optimizer_switch='materialization=on,semijoin=on';

eval $query;
[18 Nov 2010 9:48] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/124235

3287 Roy Lyseng	2010-11-18
      Bug#57525: Semijoin transformed subquery with inner grouped subquery
                 gives empty result.
      
      This is a followup to bug#31480, which attempted to fix resolved
      information in subqueries as part of semijoin transformation.
      
      The specific problem here is that the inner subquery (which is not
      transformed) contains an outer reference to the outer-most block in
      its HAVING clause. The HAVING clause is represented by Item_ref
      objects, and these are not re-resolved properly through
      fix_after_pullout().
      The solution lies in realizing that Item_ref objects contain a
      complete set of resolution data structures (ie depended_from,
      name resolution context), and that it may point to an Item object
      with it's own set of resolution data structures.
      Hence, we implement fix_after_pullout() for Item_ref by first
      calling fix_after_pullout() on the referenced object and then for
      itself.
      
      The above fix revealed another problem: The offending query failed
      in prepared statement mode. The reason is that the depended_from
      field in Item_ref is reset in cleanup() and not restored at the
      next fix_fields() call. A hack that persists the depended_from
      field across muliple fix_fields() solves this problem, and I think
      it is safe in the general case.
      
      There is also another followup to bug#31480:
      
      A - t1
       \
        B - t2
         \
          C - t3
           \
            D - t4
              where t1.x=t4.y
      
      The above figure describes an outer query expression (select_lex A)
      with 3 nested subqueries represented by select_lex B, C and D.
      The innermost subquery (D) contains a reference to a table t1
      in the outermost block (A).
      The original resolver marked the subquery containing query expression  
      B as using table t1, and the subqueries containing query expressions
      C and D with OUTER_REF_TABLE_BIT.
      The code in bug#31480 failed to preserve all outer references
      when calling fix_after_pullout().
      
      mysql-test/r/optimizer_switch.result
        Updated with correct result and plan for query that exposed bug.
      
      mysql-test/t/optimizer_switch.test
        Warnings about wrong results removed.
      
      sql/item.cc
        A common implementation of fix_after_pullout() for all Item_ref
        classes has been made. The exception is Item_aggregate_ref which
        must have its own implementation and Item_outer_ref which we still
        have no test case for.
        Item_ref::fix_after_pullout() now calls fix_after_pullout() on the
        referenced item, and then it calls Item_ident::fix_after_pullout()
        on itself.
        Item_field::fix_after_pullout() is also moved to class Item_ident,
        which is parent class for both Item_field and Item_ref.
        A new function Item_ref::resolved_used_tables() was needed.
      
      sql/item.h
        Some adjustments to function prototypes, see sql/item.cc.
      
      sql/item_subselect.h
        One changed friend declaration.
[24 Jan 2011 14:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/129446

3328 Roy Lyseng	2011-01-24
      Bug#57525: Semijoin transformed subquery with inner grouped subquery
                 gives empty result.
      
      This is a followup to bug#31480, which attempted to fix resolved
      information in subqueries as part of semijoin transformation.
      
      The specific problem here is that the inner subquery (which is not
      transformed) contains an outer reference to the outer-most block in
      its HAVING clause. The HAVING clause is represented by Item_ref
      objects, and these are not re-resolved properly through
      fix_after_pullout().
      The solution lies in realizing that Item_ref objects contain a
      complete set of resolution data structures (ie depended_from,
      name resolution context), and that it may point to an Item object
      with it's own set of resolution data structures.
      Hence, we implement fix_after_pullout() for Item_ref by first
      calling fix_after_pullout() on the referenced object and then for
      itself.
      
      The above fix revealed another problem: The offending query failed
      in prepared statement mode. The reason is that the depended_from
      field in Item_ref is reset in cleanup() and not restored at the
      next fix_fields() call. The problem is fixed by not resetting the
      depended_from flag. This is an acceptable solution, as long as
      transformations are run only on the first execution of a query.
      
      There is also another followup to bug#31480:
      
      A - t1
       \
        B - t2
         \
          C - t3
           \
            D - t4
              where t1.x=t4.y
      
      The above figure describes an outer query expression (select_lex A)
      with 3 nested subqueries represented by select_lex B, C and D.
      The innermost subquery (D) contains a reference to a table t1
      in the outermost block (A).
      The original resolver marked the subquery containing query expression  
      B as using table t1, and the subqueries containing query expressions
      C and D with OUTER_REF_TABLE_BIT.
      The code in bug#31480 failed to preserve all outer references
      when calling fix_after_pullout().
      
      mysql-test/r/optimizer_switch.result
        Updated with correct result and plan for query that exposed bug.
      
      mysql-test/t/optimizer_switch.test
        Warnings about wrong results removed.
      
      sql/item.cc
        A common implementation of fix_after_pullout() for all Item_ref
        classes has been made. The exception is Item_outer_ref which we
        still have no test case for.
        Item_ref::fix_after_pullout() now calls fix_after_pullout() on the
        referenced item, and then it calls Item_ident::fix_after_pullout()
        on itself.
        Item_field::fix_after_pullout() is also moved to class Item_ident,
        which is parent class for both Item_field and Item_ref.
        A new function Item_ref::resolved_used_tables() was needed.
      
      sql/item.h
        Some adjustments to function prototypes, see sql/item.cc.
      
      sql/item_subselect.h
        One changed friend declaration.
[2 Feb 2011 13:24] Bugs System
Pushed into mysql-trunk 5.6.2 (revid:jorgen.loland@oracle.com-20110202132358-khrjqzdcs3jrda3i) (version source revid:jorgen.loland@oracle.com-20110202132358-khrjqzdcs3jrda3i) (merge vers: 5.6.2) (pib:24)
[3 Mar 2011 1:25] Paul DuBois
Noted in 5.6.2 changelog.

A semi-join transformed subquery with an inner grouped subquery
produced an empty result. 

CHANGESET - http://lists.mysql.com/commits/129446
[8 Apr 2011 14:54] Paul DuBois
Correction. Bug does not affect 5.6.x users. No changelog entry needed.