Bug #57421 subquery returns wrong results when semijoin=on
Submitted: 13 Oct 2010 6:16 Modified: 13 Oct 2010 11:14
Reporter: SaiKumar V Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:mysql-next-mr-opt-backporting OS:Linux (Red Hat)
Assigned to: CPU Architecture:Any
Tags: materialization, optimizer_switch, semijoin, subquery

[13 Oct 2010 6:16] SaiKumar V
Description:
Bug found in:-
mysql-next-mr-opt-backporting
revno: 3259

RQG test failure, Optimizer settings used:
SET GLOBAL OPTIMIZER_SWITCH = 'semijoin=on';
SET GLOBAL optimizer_join_cache_level = 4 ;

The main query returns rows that are not matched with the subquery which fails to return rows. And when semijoin is turned off the correct rows are returned.

SELECT `col_date_key`
FROM C
WHERE `col_varchar_nokey`  IN (
SELECT `col_varchar_key`
FROM CC
WHERE `col_varchar_nokey`  < `col_varchar_key`  )   ;

+--------------+
| col_date_key |
+--------------+
| 2001-09-19   |
| 2004-09-12   |
| 2002-07-19   |
| 2006-02-08   |
| 2001-04-14   |
| 2002-11-27   |
| 2005-05-03   |
| 2001-04-18   |
| 2005-12-27   |
| 2004-08-20   |
| 1900-01-01   |
+--------------+
11 rows in set (0.00 sec)

set session optimizer_switch='semijoin=off';                                                

SELECT `col_date_key` FROM C WHERE `col_varchar_nokey`  IN ( SELECT `col_varchar_key` FROM CC WHERE `col_varchar_nokey`  < `col_varchar_key`  );
Empty set (0.00 sec)

How to repeat:
SET SESSION optimizer_switch = 'index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,materialization=on,semijoin=on,loosescan=on,firstmatch=on,mrr=on,mrr_cost_based=off,index_condition_pushdown=on';

SET SESSION optimizer_join_cache_level = 4;

--disable_warnings
DROP TABLE /*! IF EXISTS */ CC;
DROP TABLE /*! IF EXISTS */ C;
--enable_warnings

CREATE TABLE `C` (
  `col_int_key` int(11) DEFAULT NULL,
  `col_date_key` date DEFAULT NULL,
  `col_varchar_key` varchar(1) DEFAULT NULL,
  `col_varchar_nokey` varchar(1) DEFAULT NULL,
  KEY `col_int_key` (`col_int_key`),
  KEY `col_date_key` (`col_date_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `C` VALUES (2,NULL,'w','w');
INSERT INTO `C` VALUES (9,'2001-09-19','m','m');
INSERT INTO `C` VALUES (3,'2004-09-12','m','m');
INSERT INTO `C` VALUES (9,NULL,'k','k');
INSERT INTO `C` VALUES (NULL,'2002-07-19','r','r');
INSERT INTO `C` VALUES (9,'2002-12-16','t','t');
INSERT INTO `C` VALUES (3,'2006-02-08','j','j');
INSERT INTO `C` VALUES (8,'2006-08-28','u','u');
INSERT INTO `C` VALUES (8,'2001-04-14','h','h');
INSERT INTO `C` VALUES (53,'2000-01-05','o','o');
INSERT INTO `C` VALUES (0,'2003-12-06',NULL,NULL);
INSERT INTO `C` VALUES (5,'1900-01-01','k','k');
INSERT INTO `C` VALUES (166,'2002-11-27','e','e');
INSERT INTO `C` VALUES (3,NULL,'n','n');
INSERT INTO `C` VALUES (0,'2003-05-27','t','t');
INSERT INTO `C` VALUES (1,'2005-05-03','c','c');
INSERT INTO `C` VALUES (9,'2001-04-18','m','m');
INSERT INTO `C` VALUES (5,'2005-12-27','y','y');
INSERT INTO `C` VALUES (6,'2004-08-20','f','f');
INSERT INTO `C` VALUES (2,'1900-01-01','d','d');
CREATE TABLE `CC` (
  `col_int_key` int(11) DEFAULT NULL,
  `col_date_key` date DEFAULT NULL,
  `col_varchar_key` varchar(1) DEFAULT NULL,
  `col_varchar_nokey` varchar(1) DEFAULT NULL,
  KEY `col_int_key` (`col_int_key`),
  KEY `col_date_key` (`col_date_key`),
  KEY `col_varchar_key` (`col_varchar_key`,`col_int_key`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES (8,NULL,'v','v');
INSERT INTO `CC` VALUES (9,'2006-06-14','r','r');
INSERT INTO `CC` VALUES (9,'2002-09-12','a','a');
INSERT INTO `CC` VALUES (186,'2005-02-15','m','m');
INSERT INTO `CC` VALUES (NULL,NULL,'y','y');
INSERT INTO `CC` VALUES (2,'2008-11-04','j','j');
INSERT INTO `CC` VALUES (3,'2004-09-04','d','d');
INSERT INTO `CC` VALUES (0,'2006-06-05','z','z');
INSERT INTO `CC` VALUES (133,'1900-01-01','e','e');
INSERT INTO `CC` VALUES (1,'1900-01-01','h','h');
INSERT INTO `CC` VALUES (8,'1900-01-01','b','b');
INSERT INTO `CC` VALUES (5,'2005-01-13','s','s');
INSERT INTO `CC` VALUES (5,'2006-05-21','e','e');
INSERT INTO `CC` VALUES (8,'2003-09-08','j','j');
INSERT INTO `CC` VALUES (6,'2006-12-23','e','e');
INSERT INTO `CC` VALUES (51,'2006-10-15','f','f');
INSERT INTO `CC` VALUES (4,'2005-04-06','v','v');
INSERT INTO `CC` VALUES (7,'2008-04-07','x','x');
INSERT INTO `CC` VALUES (6,'2006-10-10','m','m');
INSERT INTO `CC` VALUES (4,'1900-01-01','c','c');

SELECT `col_date_key`
FROM C
WHERE `col_varchar_nokey`  IN (
SELECT `col_varchar_key`
FROM CC
WHERE `col_varchar_nokey`  < `col_varchar_key`  )   ;
[13 Oct 2010 7:27] Øystein Grøvlen
This is an issue with semijoin materilization.  Turning off materialization makes the query give correct result.  (Changing join_cache_level to default value, did not change anything.)

Query plan for failing query:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: C
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: CC
         type: ALL
possible_keys: col_varchar_key
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20
        Extra: Materialize
[13 Oct 2010 11:14] Øystein Grøvlen
This is a duplicate of Bug#45174.  (Verified by observing that the patch for Bug#45174 also fixes this issue.)