Bug #57431 subquery returns wrong results (semijoin=on) with predicate AND
Submitted: 13 Oct 2010 13:57 Modified: 3 Mar 2011 2:13
Reporter: SaiKumar V Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.6.99 OS:Linux (Red Hat)
Assigned to: Roy Lyseng CPU Architecture:Any
Tags: materialization, optimizer_switch, semijoin, subquery

[13 Oct 2010 13:57] SaiKumar V
Description:
The Query results in worng results (non-empty set), when semijoin is on.

SELECT `col_varchar_key`
FROM C
WHERE `col_int_nokey`  IN (
SELECT INNR .`col_int_nokey`
FROM BB  LEFT  JOIN BB INNR  ON INNR .`col_varchar_nokey`  )  AND `col_int_nokey`  =  2   ;
+-----------------+
| col_varchar_key |
+-----------------+
| t               |
| n               |
+-----------------+
2 rows in set (0.00 sec)

set session optimizer_switch='semijoin=off';

SELECT `col_varchar_key` FROM C WHERE `col_int_nokey`  IN ( SELECT INNR .`col_int_nokey` FROM BB  LEFT  JOIN BB INNR  ON INNR .`col_varchar_nokey`  )  AND `col_int_nokey`  =  2;

Empty set (0.00 sec)

How to repeat:
Description:
Bug found in:-
mysql-next-mr-opt-backporting
revno: 3259

RQG test failure, Optimizer settings used:
SET GLOBAL OPTIMIZER_SWITCH = 'semijoin=on';
SET GLOBAL optimizer_join_cache_level = 4 ;

-------------------------------------------------------------------------------
DROP TABLE /*! IF EXISTS */ C;
DROP TABLE /*! IF EXISTS */ BB;

CREATE TABLE `C` (
  `col_int_nokey` int(11) DEFAULT NULL,
  `col_int_key` int(11) DEFAULT NULL,
  `col_varchar_key` varchar(1) DEFAULT NULL,
  `col_varchar_nokey` varchar(1) DEFAULT NULL,
  KEY `col_int_key` (`col_int_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `C` VALUES (NULL,2,'w','w');
INSERT INTO `C` VALUES (7,9,'m','m');
INSERT INTO `C` VALUES (9,3,'m','m');
INSERT INTO `C` VALUES (7,9,'k','k');
INSERT INTO `C` VALUES (4,NULL,'r','r');
INSERT INTO `C` VALUES (2,9,'t','t');
INSERT INTO `C` VALUES (6,3,'j','j');
INSERT INTO `C` VALUES (8,8,'u','u');
INSERT INTO `C` VALUES (NULL,8,'h','h');
INSERT INTO `C` VALUES (5,53,'o','o');
INSERT INTO `C` VALUES (NULL,0,NULL,NULL);
INSERT INTO `C` VALUES (6,5,'k','k');
INSERT INTO `C` VALUES (188,166,'e','e');
INSERT INTO `C` VALUES (2,3,'n','n');
INSERT INTO `C` VALUES (1,0,'t','t');
INSERT INTO `C` VALUES (1,1,'c','c');
INSERT INTO `C` VALUES (0,9,'m','m');
INSERT INTO `C` VALUES (9,5,'y','y');
INSERT INTO `C` VALUES (NULL,6,'f','f');
INSERT INTO `C` VALUES (4,2,'d','d');
CREATE TABLE `BB` (
  `col_int_nokey` int(11) DEFAULT NULL,
  `col_int_key` int(11) DEFAULT NULL,
  `col_varchar_key` varchar(1) DEFAULT NULL,
  `col_varchar_nokey` varchar(1) DEFAULT NULL,
  KEY `col_int_key` (`col_int_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `BB` VALUES (8,8,NULL,NULL);

SELECT `col_varchar_key`
FROM C
WHERE `col_int_nokey`  IN (
SELECT INNR .`col_int_nokey`
FROM BB  LEFT  JOIN BB INNR  ON INNR .`col_varchar_nokey`  )  AND `col_int_nokey`  =  2   ;
[13 Oct 2010 14:27] Roy Lyseng
Turning off join buffering does not make a difference.

The selected strategy is MaterializeScan.

Turning off materialization makes the optimizer use DuplicateWeedout strategy, and with this strategy the query returns correct results.

Running this query on the branch where I have some outer-join-related fixes does not make a difference.

Applying the fix for bug#45174 does not make a difference either.

Hence, it seems that this problem is related to the MaterializeScan strategy.
[14 Oct 2010 7:35] Roy Lyseng
May be a duplicate of bug#46555
[11 Nov 2010 13:47] SaiKumar V
Found one more testcase for this bug report, check with Roy Lyseng about this query.

CREATE TABLE `CC` (
  `col_int_nokey` int(11) DEFAULT NULL,
  `col_int_key` int(11) DEFAULT NULL,
  `col_time_key` time DEFAULT NULL,
  `col_time_nokey` time DEFAULT NULL,
  `col_varchar_key` varchar(1) DEFAULT NULL,
  KEY `col_int_key` (`col_int_key`),
  KEY `col_time_key` (`col_time_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES (7,8,'01:27:35','01:27:35','v');
INSERT INTO `CC` VALUES (1,9,'19:48:31','19:48:31','r');
INSERT INTO `CC` VALUES (5,9,'00:00:00','00:00:00','a');
INSERT INTO `CC` VALUES (3,186,'19:53:05','19:53:05','m');
INSERT INTO `CC` VALUES (6,NULL,'19:18:56','19:18:56','y');
INSERT INTO `CC` VALUES (92,2,'10:55:12','10:55:12','j');
INSERT INTO `CC` VALUES (7,3,'00:25:00','00:25:00','d');
INSERT INTO `CC` VALUES (NULL,0,'12:35:47','12:35:47','z');
INSERT INTO `CC` VALUES (3,133,'19:53:03','19:53:03','e');
INSERT INTO `CC` VALUES (5,1,'17:53:30','17:53:30','h');
INSERT INTO `CC` VALUES (1,8,'11:35:49','11:35:49','b');
INSERT INTO `CC` VALUES (2,5,NULL,NULL,'s');
INSERT INTO `CC` VALUES (NULL,5,'06:01:40','06:01:40','e');
INSERT INTO `CC` VALUES (1,8,'05:45:11','05:45:11','j');
INSERT INTO `CC` VALUES (0,6,'00:00:00','00:00:00','e');
INSERT INTO `CC` VALUES (210,51,'00:00:00','00:00:00','f');
INSERT INTO `CC` VALUES (8,4,'06:11:01','06:11:01','v');
INSERT INTO `CC` VALUES (7,7,'13:02:46','13:02:46','x');
INSERT INTO `CC` VALUES (5,6,'21:44:25','21:44:25','m');
INSERT INTO `CC` VALUES (NULL,4,'22:43:58','22:43:58','c');
CREATE TABLE `BB` (
  `col_int_nokey` int(11) DEFAULT NULL,
  `col_int_key` int(11) DEFAULT NULL,
  `col_time_key` time DEFAULT NULL,
  `col_time_nokey` time DEFAULT NULL,
  `col_varchar_key` varchar(1) DEFAULT NULL,
  KEY `col_int_key` (`col_int_key`),
  KEY `col_time_key` (`col_time_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `BB` VALUES (8,8,'18:27:58','18:27:58',NULL);

select col_int_nokey from CC where col_int_nokey in (   SELECT PARENT1 .`col_int_nokey`   FROM BB PARENT1  LEFT  JOIN BB PARENT2  ON PARENT2 .`col_int_key`   WHERE PARENT1 .`col_int_key`  < PARENT1 .`col_int_nokey`  );
[24 Nov 2010 7:54] Roy Lyseng
The second test case is verified to trigger the same kind of error as the first one.
[24 Nov 2010 8:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/124810

3287 Roy Lyseng	2010-11-24
      Bug#57431: subquery returns wrong result (semijoin=on) with pred AND
      
      The test case contains an outer query with a single table and an
      IN subquery with two outerjoined tables. The MaterializeScan
      semijoin strategy is selected, meaning that an outerjoin operation
      is first performed over join_tabs 0 and 1. After the outerjoin,
      sub_select_sjm() is called to perform a semijoin between the
      result of the outerjoin in join_tab 1 and the outer table in
      join_tab 2. However, join_tab 1 still contains some reminiscent data
      from the outerjoin operation, hence this operation also behaves like
      an outerjoin.
      
      Fixed by temporarily deleting the outerjoin information from the
      join_tab representing the result of the outerjoin operation.
      
      mysql-test/include/subquery_sj_innodb.inc
        Test case for Bug#57431. The problem can only be reproduced with
        1 row in the inner table, hence Innodb is needed to prevent
        'const table' optimization.
      
      mysql-test/r/subquery_sj_innodb_all.result mysql-test/r/subquery_sj_innodb_all_jcl6.result
      mysql-test/r/subquery_sj_innodb_all_jcl7.result
      mysql-test/r/subquery_sj_innodb_none.result
      mysql-test/r/subquery_sj_innodb_none_jcl6.result
      mysql-test/r/subquery_sj_innodb_none_jcl7.result
        Updated test results for Bug#57431.
      
      sql/sql_select.cc
        In sub_select_sjm(), clear the last_inner field of the first
        join_tab to take part in the semijoin operation.
[21 Jan 2011 16:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/129354

3327 Roy Lyseng	2011-01-21
      Bug#57431: subquery returns wrong result (semijoin=on) with pred AND
      
      The test case contains an outer query with a single table and an
      IN subquery with two outerjoined tables. The MaterializeScan
      semijoin strategy is selected, meaning that an outerjoin operation
      is first performed over join_tabs 0 and 1. After the outerjoin,
      sub_select_sjm() is called to perform a semijoin between the
      result of the outerjoin in join_tab 1 and the outer table in
      join_tab 2. However, join_tab 1 still contains some reminiscent data
      from the outerjoin operation, hence this operation also behaves like
      an outerjoin.
      
      Fixed by temporarily deleting the outerjoin information from the
      join_tab representing the result of the outerjoin operation.
      
      mysql-test/include/subquery_sj_innodb.inc
        Test case for Bug#57431. The problem can only be reproduced with
        1 row in the inner table, hence Innodb is needed to prevent
        'const table' optimization.
      
      mysql-test/r/subquery_sj_innodb_all.result mysql-test/r/subquery_sj_innodb_all_jcl6.result
      mysql-test/r/subquery_sj_innodb_all_jcl7.result
      mysql-test/r/subquery_sj_innodb_none.result
      mysql-test/r/subquery_sj_innodb_none_jcl6.result
      mysql-test/r/subquery_sj_innodb_none_jcl7.result
        Updated test results for Bug#57431.
      
      sql/sql_select.cc
        In sub_select_sjm(), clear the last_inner field of the first
        join_tab to take part in the semijoin operation.
[24 Jan 2011 11:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/129434

3327 Roy Lyseng	2011-01-24
      Bug#57431: subquery returns wrong result (semijoin=on) with pred AND
      
      The test case contains an outer query with a single table and an
      IN subquery with two outerjoined tables. The MaterializeScan
      semijoin strategy is selected, meaning that an outerjoin operation
      is first performed over join_tabs 0 and 1. After the outerjoin,
      sub_select_sjm() is called to perform a semijoin between the
      result of the outerjoin in join_tab 1 and the outer table in
      join_tab 2. However, join_tab 1 still contains some reminiscent data
      from the outerjoin operation, hence this operation also behaves like
      an outerjoin.
      
      Fixed by clearing the outerjoin information from the join_tab that
      represents the result of the outerjoin operation.
      
      mysql-test/include/subquery_sj_innodb.inc
        Test case for Bug#57431. The problem can only be reproduced with
        1 row in the inner table, hence Innodb is needed to prevent
        'const table' optimization.
      
      mysql-test/r/subquery_sj_innodb_all.result mysql-test/r/subquery_sj_innodb_all_jcl6.result
      mysql-test/r/subquery_sj_innodb_all_jcl7.result
      mysql-test/r/subquery_sj_innodb_none.result
      mysql-test/r/subquery_sj_innodb_none_jcl6.result
      mysql-test/r/subquery_sj_innodb_none_jcl7.result
        Updated test results for Bug#57431.
      
      sql/sql_select.cc
        In sub_select_sjm(), clear the last_inner field of the first
        join_tab to take part in the semijoin operation.
[2 Feb 2011 13:24] Bugs System
Pushed into mysql-trunk 5.6.2 (revid:jorgen.loland@oracle.com-20110202132358-khrjqzdcs3jrda3i) (version source revid:jorgen.loland@oracle.com-20110202132358-khrjqzdcs3jrda3i) (merge vers: 5.6.2) (pib:24)
[3 Mar 2011 2:13] Paul DuBois
Noted in 5.6.2 changelog.

With semi-join optimization enabled, a subquery combined with an AND
predicate could produce incorrect results.

CHANGESET - http://lists.mysql.com/commits/129434
[8 Apr 2011 14:56] Paul DuBois
Correction. Bug does not affect 5.6.x users. No changelog entry needed.