Bug #45221 Query "SELECT pk FROM C WHERE pk IN (SELECT int_key)" failing
Submitted: 31 May 2009 19:22 Modified: 23 Nov 2010 2:52
Reporter: Patrick Crews Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:6.0 OS:Any
Assigned to: Jørgen Løland CPU Architecture:Any
Tags: 5.4, index_condition_pushdown, optimizer_switch, random query generator, subquery

[31 May 2009 19:22] Patrick Crews
Description:
The query SELECT `pk` FROM C WHERE `pk` IN (SELECT `int_key`) is producing different results when run against 5.1-bugteam and 5.4 via the Random Query generator

# 12:04:38 Query: SELECT  OUTR . `pk` AS X FROM C AS OUTR WHERE OUTR . `pk` IN ( SELECT  INNR . `int_key` AS Y FROM CC AS INNR WHERE INNR . `date_nokey` < INNR . `datetime_nokey` XOR OUTR . `date_nokey` BETWEEN '2004-07-10' AND '2009-11-25' ORDER BY INNR . `int_nokey` ) AND ( OUTR . `datetime_key` BETWEEN '2000-05-25' AND '2004-08-07' OR OUTR . `datetime_nokey` = '2007-10-24' )  ORDER BY OUTR . `int_key` , OUTR . `pk` failed: result length mismatch between servers (2 vs. 1)
--- /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen1711-server0.dump        2009-05-31 12:04:38.000000000 -0400
+++ /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen1711-server1.dump        2009-05-31 12:04:38.000000000 -0400
@@ -1,2 +1 @@
 2
-4
# 12:04:40 Simplified query:  SELECT `pk`  FROM C  WHERE `pk`  IN (  SELECT `int_key`  )

How to repeat:
Run the Random Query Generator:
./runall.pl \
  --basedir1=<path>/mysql-5.1-bugteam \
  --basedir2=<path>/mysql-azalea \
  --grammar=conf/subquery_semijoin.yy \
  --threads=1 \
  --queries=1000 \
  --mysqld2=--init-file=<path>/mysql-test-extra-6.0/mysql-test/gentest/init/no_subquery.sql \
  --validator=ResultsetComparatorSimplify \
  --queries=100000

It should also be possible to create a simple grammar file that focuses solely on this query pattern.

Suggested fix:
Ensure 5.4 processes queries correctly.
[2 Jul 2009 21:45] Patrick Crews
.test file for the bug:

/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `C` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_key` int(11) NOT NULL,
  PRIMARY KEY (`pk`),
  KEY `int_key` (`int_key`)
) ENGINE=MyISAM AUTO_INCREMENT=21 DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;
INSERT INTO `C` VALUES (1,9),(2,0),(3,6),(4,6),(5,6),(6,4),(7,7),(8,4),(9,8),(10,4),(11,7),(12,5),(13,0),(14,5),(15,7),(16,2),(17,8),(18,0),(19,9),(20,5);

/*

OPTIMIZER SETTINGS:

*/

SET SESSION optimizer_switch = 'firstmatch=on,index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,loosescan=on,materialization=off,semijoin=off';
SET SESSION optimizer_use_mrr = 'force';
SET SESSION engine_condition_pushdown = '1';
SET SESSION join_cache_level = '1';
SET GLOBAL optimizer_switch = 'firstmatch=on,index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,loosescan=on,materialization=off,semijoin=off';
SET GLOBAL optimizer_use_mrr = 'force';
SET GLOBAL engine_condition_pushdown = '1';
SET GLOBAL join_cache_level = '1';

/*

ORIGINAL QUERY:

SELECT OUTR . `pk` AS X FROM C AS OUTR WHERE OUTR . `pk` IN ( SELECT INNR . `int_key` AS Y FROM CC AS INNR WHERE INNR . `date_nokey` < INNR . `datetime_nokey` XOR OUTR . `date_nokey` BETWEEN '2004-07-10' AND '2009-11-25' ORDER BY INNR . `int_nokey` ) AND ( OUTR . `datetime_key` BETWEEN '2000-05-25' AND '2004-08-07' OR OUTR . `datetime_nokey` = '2007-10-24' ) ORDER BY OUTR . `int_key` , OUTR . `pk`;

ORIGINAL DIFF:

--- /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen67228-server0.dump	2009-07-02 17:41:56.000000000 -0400
+++ /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen67228-server1.dump	2009-07-02 17:41:56.000000000 -0400
@@ -1 +0,0 @@
-2

SIMPLIFIED QUERY:

 SELECT `pk`  FROM C  WHERE `pk`  IN (  SELECT `int_key`  )   ;

SIMPLIFIED DIFF:

*/
[20 Jul 2009 23:40] Patrick Crews
Clarification:
This bug is showing an error in 5.4, using 5.1 as the reference db for comparison.

The assumption is that 5.4 is returning incorrect results.
[8 Sep 2009 13:44] Patrick Crews
Revised MTR test case attached as separate file
Please comment / uncomment the appropriate Server0/Server1 lines here depending on which version of the server you are using.
[8 Sep 2009 13:44] Patrick Crews
MTR test case with original and simplified queries

Attachment: Bug45221_test.txt (text/plain), 8.01 KiB.

[8 Sep 2009 14:31] Guilhem Bichot
Using Patrick's test, comparing between
5.1.38 jperkin@sun.com-20090824091334-6ktgrhq218vl7zq1
next-bugfixing (6.0 code base) jon.hauglid@sun.com-20090826115923-d6516e54fmyxq3nn
I get those diffs:
@@ -28,10 +28,11 @@
 FROM CC  
 WHERE `date_nokey`  < `datetime_nokey`  XOR OUTR .`date_nokey`  )   ;
 pk
+9
 2
 5
 6
-9
+4
 DROP TABLE CC;
 DROP TABLE C;
 DROP TABLE /*! IF EXISTS */ CC;
@@ -71,6 +72,5 @@
 ORDER BY INNR . `int_nokey` ) AND ( OUTR . `datetime_key` BETWEEN '2000-05-25' AND '2004-08-07' OR OUTR . `datetime_nokey` = '2007-10-24' ) 
 ORDER BY OUTR . `int_key` , OUTR . `pk`;
 X
-9
 DROP TABLE CC;
 DROP TABLE C;

The second diff is like in Patrick's testcase, but the first is different.
The "4" in the first diff should not be there: for the row with pk=4, date_nokey is NULL, so when looking at row with pk=4, if I run:
SELECT `int_key`  
FROM CC  
WHERE `date_nokey`  < `datetime_nokey`  XOR NULL)
I get no rows which is correct. So, the IN() should be empty in
SELECT `pk`  
FROM C OUTR  
WHERE `pk`  IN (  
SELECT `int_key`  
FROM CC  
WHERE `date_nokey`  < `datetime_nokey`  XOR OUTR .`date_nokey`  )   ;
so "4" should not be returned.
So it's a regression compared to 5.1.
It goes away when setting
SET SESSION engine_condition_pushdown = Off ;
[8 Sep 2009 19:57] Patrick Crews
Sorry about the confusion, the test case I attached previously left optimizer_switch's materialization and semijoin values set to 'on'.

I have attached a new test case with these switches set to off (as things were in the original bug).  Using this test produces the same diff between 5.1 and azalea as originally reported.
[8 Sep 2009 19:58] Patrick Crews
Correct MTR test file

Attachment: bug45221_test.txt (text/plain), 8.00 KiB.

[8 Sep 2009 19:59] Patrick Crews
Additional test case caveat - 

If you are using 5.1, please do not uncomment the 'SET' lines for optimizer_use_mrr or join_cache_level - 5.1 doesn't have these and these are dummy rows produced by the RQG since these variables *were* used in the azalea server that produced this test.
[9 Sep 2009 8:38] Guilhem Bichot
Among Patrick's tests, the latest uses
XOR '2009-11-25'
whereas the previous use
XOR OUTR .`date_nokey`.
The latter seems to generate more diffs between 5.1 and 5.4, so I'm attaching a combined test which issues two SELECTs (one with each XOR).
Excluding warnings like "Warning	1292	Truncated incorrect INTEGER value: '2009-11-25'", the differences between result of 5.1 and mysql-next-bugfixing are:

@@ -27,10 +29,11 @@
 FROM CC  
 WHERE `date_nokey`  < `datetime_nokey`  XOR OUTR .`date_nokey`  )   ;
 pk
+9
 2
 5
 6
-9
+4
@@ -90,6 +100,5 @@
 ORDER BY INNR . `int_nokey` ) AND ( OUTR . `datetime_key` BETWEEN '2000-05-25' AND '2004-08-07' OR OUTR . `datetime_nokey` = '2007-10-24' ) 
 ORDER BY OUTR . `int_key` , OUTR . `pk`;
 X
-9
 DROP TABLE CC;

From there, if I do materialization=off, then the XOR diffs change; they also change (differently) if I do semijoin=off, and also (differently again) if I do
materialization=off,semijoin=off (I then find Patrick's latest results).
Results always get good with engine_condition_pushdown=off.
[9 Sep 2009 8:39] Guilhem Bichot
single test which can be run against 5.1 and next-bugfixing without commenting lines

Attachment: bug45221.test (application/octet-stream, text), 6.87 KiB.

[8 Oct 2009 13:15] Guilhem Bichot
I verified that it is related to ICP not engine condition pushdown.
[13 Nov 2009 9:59] Jørgen Løland
Simplified test case:

--------------
CREATE TABLE t1 (
  i1_key INT,
  i2 INT,
  i3 INT,
  KEY i2_key (i1_key)
) ENGINE=MyISAM;

INSERT INTO t1 VALUES 
  (9,10,11),
  (9,11,10);

CREATE TABLE t2 (
  pk INT NOT NULL,
  i1 INT,
  PRIMARY KEY (pk)
) ENGINE=MyISAM;

INSERT INTO t2 VALUES (9,1);

SELECT pk
FROM t2
WHERE 
   pk IN ( 
     SELECT i1_key
     FROM t1
     WHERE t1.i2 < t1.i3 XOR 
           t2.i1 > 1
     ORDER BY t1.i2 );

EXPLAIN
SELECT pk
FROM t2
WHERE 
   pk IN ( 
     SELECT i1_key
     FROM t1
     WHERE t1.i2 < t1.i3 XOR 
           t2.i1 > 1
     ORDER BY t1.i2 );
--------------

For engine_condition_pushdown='on':
Query result: empty
Explain:
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	NULL	NULL	NULL	NULL	NULL	NULL	NULL	Impossible WHERE noticed after reading const tables
2	DEPENDENT SUBQUERY	t1	index_subquery	i2_key	i2_key	5	func	2	Using where

For engine_condition_pushdown='on':
Query result: 9
Explain:
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	t2	system	NULL	NULL	NULL	NULL	1	
2	DEPENDENT SUBQUERY	t1	index_subquery	i2_key	i2_key	5	func	2	Using where
[13 Nov 2009 10:00] Jørgen Løland
Typo in previous comment. The last block of results is for engine_condition_pushdown off.
[16 Nov 2009 15:10] Jørgen Løland
EXPLAIN extended
SELECT pk
FROM t2
WHERE 
pk IN ( 
SELECT i1_key
FROM t1
WHERE t1.i2 < t1.i3 XOR 
t2.i1 > 1
ORDER BY t1.i2 );
id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	NULL	NULL	NULL	NULL	NULL	NULL	NULL	NULL	Impossible WHERE noticed after reading const tables
2	DEPENDENT SUBQUERY	t1	index_subquery	i1_index	i1_index	5	func	2	100.00	Using where
Warnings:
Note	1276	Field or reference 'test.t2.i1' of SELECT #2 was resolved in SELECT #1
Note	1003	select '9' AS `pk` from dual where <in_optimizer>('9',<exists>(<index_lookup>(<cache>('9') in t1 on i1_index where (((`test`.`t1`.`i2` < `test`.`t1`.`i3`) xor ('1' > 1)) and (<cache>('9') = `test`.`t1`.`i1_key`)))))

------

The XOR condition in the subselect is pushed to the index. However, the index is for column i1, while the XOR conditions are for columns i2 and i3.

This happens because in make_cond_for_index (sql_select.cc):
  - cond->type() for an XOR is FUNC_ITEM! [1]
  - uses_index_fields_only does not find that this XOR in fact 
    compares two columns that is not the index
  -> make_cond_for_index believes that it is safe to push this
     condition to the index.    

[1] item_cmpfunc.h around line 1730:
  /* TODO: remove the next line when implementing XOR optimization */
  enum Type type() const { return FUNC_ITEM; }
[24 Nov 2009 11:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/91407

3728 Jorgen Loland	2009-11-24
      Bug#45221: Query "SELECT pk FROM C WHERE pk IN (SELECT int_key)"
                 failing
      
      XOR conditions are not optimized, and Item_cond_xor therefore 
      acts like type Func_item even though it inherits from Item_cond.
      A subtle difference between Item_func and Item_cond is that
      you can get the children Items from the former by calling 
      arguments(), and from the latter by calling argument_list(). 
      However, since Item_cond_xor inherits from Item_cond, 
      arguments() did not return any Items.
      
      The fact that Item_cond_xor::arguments() did not return it's 
      children items lead to a problem for make_cond_for_index(); 
      the method accepted that XOR items on unindexed columns were 
      pushed using ICP. ICP evaluation of non-indexed columns 
      does not (and should not) work.
      
      The fix for this bug is to make Item_cond_xor return it's 
      children items when the arguments() method is used. This makes
      Item_cond_xor behave more like Item_func and in turn allows 
      make_cond_for_index() to discover any conflicting children
      Items. 
      
      This is a temporary fix and should be removed when 
      Item_cond_xor is optimized.
     @ mysql-test/r/subselect4.result
        Added test for BUG#45221
     @ mysql-test/t/subselect4.test
        Added test for BUG#45221
     @ sql/item_cmpfunc.h
        Store children items of Item_cond_xor both in list (as is done by
        Item_cond) and args[] (as is done by Func_cond) as a temporary
        solution until Item_cond_xor is optimized.
[4 Dec 2009 7:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/92830

3753 Jorgen Loland	2009-12-04
      Bug#45221: Query "SELECT pk FROM C WHERE pk IN (SELECT int_key)"
                 failing
            
      XOR conditions are not optimized, and Item_cond_xor therefore 
      acts like type Func_item even though it inherits from Item_cond.
      A subtle difference between Item_func and Item_cond is that
      you can get the children Items from the former by calling 
      arguments(), and from the latter by calling argument_list(). 
      However, since Item_cond_xor inherits from Item_cond, 
      arguments() did not return any Items.
      
      The fact that Item_cond_xor::arguments() did not return it's 
      children items lead to a problem for make_cond_for_index(); 
      the method accepted that XOR items on unindexed columns were 
      pushed using ICP. ICP evaluation of non-indexed columns 
      does not (and should not) work.
      
      The fix for this bug is to make Item_cond_xor return it's 
      children items when the arguments() method is used. This makes
      Item_cond_xor behave more like Item_func and in turn allows 
      make_cond_for_index() to discover any conflicting children
      Items. 
      
      This is a temporary fix and should be removed when 
      Item_cond_xor is optimized.
     @ mysql-test/r/subselect4.result
        Added test for BUG#45221
     @ mysql-test/t/subselect4.test
        Added test for BUG#45221
     @ sql/item_cmpfunc.h
        Store children items of Item_cond_xor both in list (as is done by
        Item_cond) and args[] (as is done by Func_cond) as a temporary
        solution until Item_cond_xor is optimized.
[4 Dec 2009 7:49] Jørgen Løland
Pushed to 6.0-codebase-bugfixing
[11 Dec 2009 6:01] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091211055901-yp18b3c7xuhl87rf) (version source revid:alik@sun.com-20091211055401-43rjwq7gjed6ds83) (merge vers: 6.0.14-alpha) (pib:13)
[5 Jan 2010 20:01] Paul DuBois
Noted in 6.0.14 changelog.

The XOR logical operator did not work correctly with Index Condition
Pushdown. XOR conditions could be incorrectly pushed to an index that
could not handle it, something that lead to missing records in the
query result set.
[7 May 2010 10:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/107718

3142 oystein.grovlen@sun.com	2010-05-07
      Bug#45221: Query "SELECT pk FROM C WHERE pk IN (SELECT int_key)"
                 failing
      
      (Backporting of jorgen.loland@sun.com-20091204074805-4ia06wjrzuue3ob0)
            
      XOR conditions are not optimized, and Item_cond_xor therefore 
      acts like type Func_item even though it inherits from Item_cond.
      A subtle difference between Item_func and Item_cond is that
      you can get the children Items from the former by calling 
      arguments(), and from the latter by calling argument_list(). 
      However, since Item_cond_xor inherits from Item_cond, 
      arguments() did not return any Items.
      
      The fact that Item_cond_xor::arguments() did not return it's 
      children items lead to a problem for make_cond_for_index(); 
      the method accepted that XOR items on unindexed columns were 
      pushed using ICP. ICP evaluation of non-indexed columns 
      does not (and should not) work.
      
      The fix for this bug is to make Item_cond_xor return it's 
      children items when the arguments() method is used. This makes
      Item_cond_xor behave more like Item_func and in turn allows 
      make_cond_for_index() to discover any conflicting children
      Items. 
      
      This is a temporary fix and should be removed when 
      Item_cond_xor is optimized.
     @ mysql-test/r/subselect4.result
        Added test for BUG#45221
     @ mysql-test/t/subselect4.test
        Added test for BUG#45221
     @ sql/item_cmpfunc.h
        Store children items of Item_cond_xor both in list (as is done by
        Item_cond) and args[] (as is done by Func_cond) as a temporary
        solution until Item_cond_xor is optimized.
[16 Aug 2010 6:41] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:09] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 2:52] Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.