Bug #59833 materialization=on/off leads to different result set when using IN
Submitted: 31 Jan 2011 8:06 Modified: 27 Feb 2011 2:43
Reporter: Roel Van de Paar Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:5.6.2-m5 OS:Any
Assigned to: Roy Lyseng CPU Architecture:Any
Tags: materialization, optimizer_switch, subquery

[31 Jan 2011 8:06] Roel Van de Paar
Description:
mysql> SET SESSION optimizer_switch='materialization=on';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t1 JOIN t2 USING ( `f1` ) WHERE t1.`f1` IN ( SELECT t1.`pk` FROM
 t1 ORDER BY t1.`f1` ) ;
+------+----+----+
| f1   | pk | pk |
+------+----+----+
|    0 | 10 | 10 |
|    0 | 10 | 11 |
+------+----+----+
2 rows in set (0.00 sec)

mysql> SET SESSION optimizer_switch='materialization=off';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t1 JOIN t2 USING ( `f1` ) WHERE t1.`f1` IN ( SELECT t1.`pk` FROM
 t1 ORDER BY t1.`f1` ) ;
Empty set (0.00 sec)

How to repeat:
DROP DATABASE test2;
CREATE DATABASE test2;
USE test2;

CREATE TABLE `t1` (
  `pk` int(11) NOT NULL,
  `f1` int(11) DEFAULT NULL,
  PRIMARY KEY (`pk`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

CREATE TABLE `t2` (
  `pk` int(11) NOT NULL,
  `f1` int(11) DEFAULT NULL,
  PRIMARY KEY (`pk`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO t1 VALUES (10,0);
INSERT INTO t2 VALUES (10,0),(11,0);

SET SESSION optimizer_switch='materialization=on';

SELECT * FROM t1 JOIN t2 USING ( `f1` ) WHERE t1.`f1` IN ( SELECT t1.`pk` FROM t1 ORDER BY t1.`f1` ) ;

SET SESSION optimizer_switch='materialization=off';

SELECT * FROM t1 JOIN t2 USING ( `f1` ) WHERE t1.`f1` IN ( SELECT t1.`pk` FROM t1 ORDER BY t1.`f1` ) ;
[31 Jan 2011 8:15] Roel Van de Paar
See bug #46548, bug #50019, bug #52329, bug #37115, bug #54281
[31 Jan 2011 9:03] Roel Van de Paar
Happens for MyISAM, MEMORY but not InnoDB.
[4 Feb 2011 0:35] Roel Van de Paar
Another testcase

DROP DATABASE IF EXISTS test2; CREATE DATABASE test2; USE test2; /* Test setup */

CREATE TABLE C (id int);
INSERT INTO C VALUES (5),(9),(9);

CREATE TABLE t2 SELECT VAR_POP(`id`) AS X FROM C;

SET @@SESSION.optimizer_switch="materialization=off";

SELECT * FROM t2 WHERE (X) IN (SELECT VAR_POP(`id`) AS X FROM C);  /* 3.5556 */

SET @@SESSION.optimizer_switch="materialization=on";

SELECT * FROM t2 WHERE (X) IN (SELECT VAR_POP(`id`) AS X FROM C);  /* No results */
[4 Feb 2011 0:41] Roel Van de Paar
Note that the second test case fails against InnoDB, MyISAM and Memory.

It looks like it would be a good idea to include both testcases (or similar) as MTR testcases.
[4 Feb 2011 10:38] Roy Lyseng
Roel, the second bug case is actually a different problem. Please log as another bug.
[4 Feb 2011 11:13] Roel Van de Paar
Separate bug logged as bug #59943
[4 Feb 2011 11:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130393

3338 Roy Lyseng	2011-02-04
      Bug#59833: materialization=on/off leads to different result set when using IN
      
      The offending query is subject to subquery materialization, and the
      subquery predicate is attached to the join object representing
      "t1 JOIN t2". Table t1 is subject to const table optimization,
      so it is not part of the join order. It also makes the subquery
      predicate become "const". Unlike a regular subquery, this
      materialized subquery is considered "expensive", ie is_expensive()
      returns true.
      
      Inside make_join_select(), we first consider const conditions, but
      since the subquery predicate is expensive, it is assumed to be
      handled in the execution phase. make_cond_for_table_from_pred() has
      a test to make sure that such const and expensive predicates are
      handled with the first table in the join order, in:
        !((used_table & 1) && cond->is_expensive())),
      but as can be seen by the comment just above the test, this test is
      already considered to be dubious. As the first table here is number 2
      (table number 1 is const), the test will never be true, and the
      subquery will thus never be evaluated.
      
      The approach taken to fix the problem relies on some refactoring.
      We want to add such expressions to the first table in the join order.
      We also know that argument 'used_table' of make_cond_for_table()
      represents the current table, and 'tables' represents the aggregation
      of handled tables. Thus, an easy fix is to check that
      tables=used_table. But it is not that easy. The list of tables also
      comprise the set of const tables (const_table_map), and the
      RAND_TABLE_BIT and OUTER_REF_TABLE_BIT bits. We add const_table_map
      and OUTER_REF_TABLE_BIT together with the first table (we know the
      values of the const tables and outer references are available),
      and add RAND_TABLE_BIT with the last table (so random expressions are
      evaluated for each resulting row), and we make sure that
      tables=used_table for the first table.
      
      mysql-test/include/subquery_mat.inc
        Added test case for bug#59833.
      
      mysql-test/r/innodb_icp.result
      mysql-test/r/innodb_icp_all.result
      mysql-test/r/innodb_icp_none.result
      mysql-test/r/subquery_all.result
      mysql-test/r/subquery_all_jcl6.result
      mysql-test/r/subquery_nomat_nosj.result
      mysql-test/r/subquery_nomat_nosj_jcl6.result
      mysql-test/r/subquery_none.result
      mysql-test/r/subquery_none_jcl6.result
      mysql-test/r/subquery_sj_all.result
      mysql-test/r/subquery_sj_all_jcl6.result
      mysql-test/r/subquery_sj_all_jcl7.result
      mysql-test/r/subquery_sj_dupsweed.result
      mysql-test/r/subquery_sj_dupsweed_jcl6.result
      mysql-test/r/subquery_sj_dupsweed_jcl7.result
      mysql-test/r/subquery_sj_firstmatch.result
      mysql-test/r/subquery_sj_firstmatch_jcl6.result
      mysql-test/r/subquery_sj_firstmatch_jcl7.result
      mysql-test/r/subquery_sj_innodb_none.result
      mysql-test/r/subquery_sj_innodb_none_jcl6.result
      mysql-test/r/subquery_sj_innodb_none_jcl7.result
      mysql-test/r/subquery_sj_loosescan.result
      mysql-test/r/subquery_sj_loosescan_jcl6.result
      mysql-test/r/subquery_sj_loosescan_jcl7.result
      mysql-test/r/subquery_sj_mat.result
      mysql-test/r/subquery_sj_mat_jcl6.result
      mysql-test/r/subquery_sj_mat_jcl7.result
      mysql-test/r/subquery_sj_mat_nosj.result
      mysql-test/r/subquery_sj_none.result
      mysql-test/r/subquery_sj_none_jcl6.result
      mysql-test/r/subquery_sj_none_jcl7.result
        Some plans are changed. They are always improved, as the predicates
        are now often evaluated on an earlier table in the join order.
        Some explanations are given below:
      
      mysql_test/r/subquery_all.result
        Inside a scalar subquery execution, the table conditions were evaluated
        both for the first and the second table in the join order. The reason
        is that the test !(cond->used_tables() & used_table) is false for
        both tables (the condition has used_tables=OUTER+t1, first table has
        used_table=t1, second table has used_table=OUTER+t2).
        With the new solution, the first table has used_table=OUTER+t1,
        so the condition will no longer be applied to the second table.
      
        Inside a derived table execution, SQL_SELECT::test_quick_select() decided
        upon the plan "Range checked for each record (index map: 0x2)" because
        the argument prev_tables was different because of this change.
      
      mysql-test/r/subquery_sj_mat_nosj.result
        The IN subquery predicate is moved from the last table in the outer
        select to the first table, which is good for performance.
      
      mysql-test/r/innodb_icp_none.result
        The predicate containing the inner subquery is moved from the
        second table to the first table of the outer subquery.
      
      mysql-test/r/subquery_mat.result
      mysql-test/r/subquery_mat_all.result
      mysql-test/r/subquery_mat_none.result
        Added test case results for bug#59833.
      
      sql/sql_select.cc
        In make_join_select(), make sure that 'used_tables' is always a
        strict aggregation of the previously handled 'current_map'.
        Also reflect better that outer references and const tables are
        available when processing the first table in the join order.
        In make_cond_for_table_from_pred(), change the test that attempts
        to add a const but expensive predicate to the first table in the
        join order.
        In pushdown_on_conditions(), update 'current_map' and 'used_tables'
        the way it's done in make_join_select().
[27 Feb 2011 2:43] Paul DuBois
Noted in 5.6.2 changelog.

For q query with an IN subquery, different results could be obtained
with subquery materialization enabled and disabled.

CHANGESET - http://lists.mysql.com/commits/131178
[8 Apr 2011 14:50] Paul DuBois
Correction. Bug does not affect 5.6.x users. No changelog entry needed.