Bug #46556 Returning incorrect, empty results for some IN subqueries w/ semijoin=on
Submitted: 4 Aug 2009 20:59 Modified: 23 Nov 2010 2:55
Reporter: Patrick Crews Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:6.0 OS:Any
Assigned to: Øystein Grøvlen CPU Architecture:Any
Tags: firstmatch, optimizer_switch, semijoin, subquery

[4 Aug 2009 20:59] Patrick Crews
Description:
Certain queries using an IN subquery are returning incorrect, empty result sets in azalea when semijoin=on

From the attached test case:
SELECT `varchar_key`  
FROM C  
WHERE `pk`  IN (  
SELECT subquery_t1 .`pk`  
FROM C subquery_t1  JOIN C subquery_t2  ON subquery_t2 .`varchar_key`  = subquery_t1 .`varchar_nokey`  )   ;

Returns no rows (incorrectly), but properly returns all varchar_key values when semijoin=off

This is interesting as it is the opposite of (no rows vs. extra rows) yet both are tied to the semijoin switch
Bug#46550	Azalea returning duplicate results for some IN subqueries w/ semijoin=on

How to repeat:
Uncomment the appropriate Server0/Server1 lines depending on which version of the server is being used.  

Toggle the semijoin switch from on to off and observe the change in result sets.

# Server0: version = 5.1.38-debug-log
# Server1: version = 5.4.4-alpha-debug-log

# The value of optimizer_switch is distinct between the two servers:
# Server 0 : SET SESSION optimizer_switch = 'index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on';
# Server 1 : SET SESSION optimizer_switch = 'firstmatch=on,index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,loosescan=on,materialization=on,semijoin=on';

# The value of optimizer_use_mrr is distinct between the two servers:
# Server 0 : SET SESSION optimizer_use_mrr = '';
# Server 1 : SET SESSION optimizer_use_mrr = 'force';

# The value of engine_condition_pushdown is common between the two servers:
/*!50400 SET SESSION engine_condition_pushdown = ON */;

# The value of join_cache_level is distinct between the two servers:
# Server 0 : SET SESSION join_cache_level = '';
# Server 1 : SET SESSION join_cache_level = 1;

# Begin test case for query 0

--disable_warnings
DROP TABLE IF EXISTS C;
--enable_warnings

CREATE TABLE `C` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  PRIMARY KEY (`pk`),
  KEY `varchar_key` (`varchar_key`)
) ENGINE=MyISAM AUTO_INCREMENT=21 DEFAULT CHARSET=latin1;
INSERT INTO `C` VALUES (1,'g','g'),(2,'v','v'),(3,'t','t'),(4,'u','u'),(5,'n','n'),(6,'p','p'),(7,NULL,NULL),(8,'u','u'),(9,'n','n'),(10,NULL,NULL),(11,'k','k'),(12,'e','e'),(13,'d','d'),(14,'t','t'),(15,'o','o'),(16,'e','e'),(17,'s','s'),(18,NULL,NULL),(19,'d','d'),(20,'z','z');

 
SELECT `varchar_key`  
FROM C  
WHERE `pk`  IN (  
SELECT subquery_t1 .`pk`  
FROM C subquery_t1  JOIN C subquery_t2  ON subquery_t2 .`varchar_key`  = subquery_t1 .`varchar_nokey`  )   ;

# Diff:

# --- /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen73904-1249407669-server0.dump	2009-08-04 13:41:09.000000000 -0400
# +++ /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen73904-1249407669-server1.dump	2009-08-04 13:41:09.000000000 -0400
# @@ -1,17 +0,0 @@
# -d
# -d
# -e
# -e
# -g
# -k
# -n
# -n
# -o
# -p
# -s
# -t
# -t
# -u
# -u
# -v
# -z

DROP TABLE C;

# End of test case for query 0

# Begin test case for query 1

--disable_warnings
DROP TABLE IF EXISTS C;
--enable_warnings

CREATE TABLE `C` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `int_nokey` int(11) DEFAULT NULL,
  `int_key` int(11) DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  PRIMARY KEY (`pk`),
  KEY `int_key` (`int_key`),
  KEY `varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MyISAM AUTO_INCREMENT=21 DEFAULT CHARSET=latin1;
INSERT INTO `C` VALUES (1,NULL,1,'g','g'),(2,0,1,'v','v'),(3,199,247,'t','t'),(4,3,7,'u','u'),(5,7,5,'n','n'),(6,3,4,'p','p'),(7,8,1,NULL,NULL),(8,9,5,'u','u'),(9,9,4,'n','n'),(10,0,9,NULL,NULL),(11,0,1,'k','k'),(12,188,166,'e','e'),(13,2,3,'d','d'),(14,5,6,'t','t'),(15,5,5,'o','o'),(16,6,9,'e','e'),(17,7,5,'s','s'),(18,2,6,NULL,NULL),(19,2,4,'d','d'),(20,6,NULL,'z','z');

 
SELECT  MIN( `int_key` ) AS field1, MAX( `int_key` ) AS field2, `varchar_key` AS field3 
FROM C AS table1 
WHERE ( ( table1 . `varchar_key` = table1 . `varchar_key` OR table1 . `varchar_key` = table1 . `varchar_key` ) AND ( table1 . `pk`  IN ( 
SELECT subquery_t1 . `pk` 
FROM C AS subquery_t1 INNER JOIN C AS subquery_t2 ON ( subquery_t2 . `varchar_key` = subquery_t1 . `varchar_nokey` ) 
WHERE subquery_t1 . `pk` >= 'd' )) AND table1 . `int_nokey` = 0 )  
GROUP BY field3;

# Diff:

# --- /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen73904-1249407670-server0.dump	2009-08-04 13:41:10.000000000 -0400
# +++ /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen73904-1249407670-server1.dump	2009-08-04 13:41:10.000000000 -0400
# @@ -1,2 +0,0 @@
# -1	1	k
# -1	1	v

DROP TABLE C;

# End of test case for query 1

Suggested fix:
Ensure correct query processing regardless of optimizer settings.
[12 Nov 2009 12:44] Tor Didriksen
The query below, originally from
http://bugs.mysql.com/bug.php?id=46550
Azalea returning duplicate results for some IN subqueries w/ semijoin=on

may be a similar problem?

# Begin test case for query 1

--disable_warnings
DROP TABLE IF EXISTS CC, B;
--enable_warnings

CREATE TABLE `CC` (
  `int_nokey` int(11) DEFAULT NULL,
  `int_key` int(11) DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  KEY `int_key` (`int_key`),
  KEY `varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES
(3,1,'m','m'),(196,40,'h','h'),(4,1,'r','r'),(0,1,'h','h'),(4,9,'x','x'),(3,NULL,'q','q'),
(3,NULL,'k','k'),(NULL,7,'l','l'),(35,182,'k','k'),(246,202,'a','a'),(1,7,'x','x'),(9,6,'j
','j'),(8,119,'z','z'),(5,4,'d','d'),(2,5,'h','h'),(4,1,'u','u'),(8,3,'q','q'),(3,7,'a','a
'),(7,3,'e','e'),(2,6,'l','l');
CREATE TABLE `B` (
  `int_nokey` int(11) DEFAULT NULL,
  `int_key` int(11) DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  KEY `int_key` (`int_key`),
  KEY `varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `B` VALUES (NULL,7,NULL,NULL),(9,4,'x','x');

 
SELECT DISTINCT MIN( `varchar_key`) AS field1 , `int_nokey` AS field2 , `int_key` AS
field3 
FROM CC AS table1 
WHERE (table1 . `varchar_nokey`  IN ( 
SELECT subquery_t1 . `varchar_key` 
FROM B AS subquery_t1 CROSS JOIN B AS subquery_t2 ON ( subquery_t2 . `int_key` >=
subquery_t1 . `int_key` )  ) )  
GROUP BY field2, field3;

# Diff:

# ---
/var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen69966-1249399469-server0.dump	2
009-08-04 11:24:29.000000000 -0400
# +++
/var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen69966-1249399469-server1.dump	2
009-08-04 11:24:29.000000000 -0400
# @@ -1,2 +0,0 @@
# -x	1	7
# -x	4	9

DROP TABLE CC, B;

# End of test case for query 1
[24 Nov 2009 8:18] Øystein Grøvlen
Query plan for query 0 above:

EXPLAIN EXTENDED SELECT vkey FROM C
WHERE pk IN
(SELECT t1.pk FROM C t1 JOIN C t2 ON t2.vkey = t1.vnokey);
id  select_type  table  type    possible_keys  key     key_len ref       rows  filtered  Extra
1   PRIMARY      C      ALL     PRIMARY        NULL    NULL    NULL      20    100.00
1   PRIMARY      t2     index   vkey           vkey    4       NULL      20    100.00    Using index; FirstMatch(C)
1   PRIMARY      t1     eq_ref  PRIMARY        PRIMARY 4       test.C.pk 1     100.00    Using where
Warnings:
Note   1003    select `test`.`C`.`vkey` AS `vkey` from `test`.`C` `t1` semi join (`test`.`C` `t2`) join `test`.`C` where ((`test`.`t1`.`vnokey` = `test`.`t2`.`vkey`) and (`test`.`t1`.`pk` = `test`.`C`.`pk`))
[24 Nov 2009 10:22] Øystein Grøvlen
If first match is off, the result is correct.

Otherwise, result seem to be very dependent on the data.  If I for test case 0 above, remove rows with VARCHAR fields that are NULL,  I get two 'd' rows as the result.  This is the first two rows when sorted alphabetically.  This is probably because the varchar_key index is used to scan t2.  If I remove the 'd' rows too, I get the two 'e' rows.

If table contains 3 or less rows another plan is selected and the result will be correct.

For 4 rows, only rows with the "smallest" letter are returned.  The query plan for for rows are not quite the same as above.  Join buffer is used on t1 instead of primary-key lookup. For 5 rows and above, the exact same query plan is used.
[24 Nov 2009 11:25] Øystein Grøvlen
It seems the scan on t2 stops when the first row does match anything in t1 for the current primary key of C.  This happens because the return_tab of t2 refers to C.  I suspect that it is t1 which should refer to C, and that it is the reordering of t1 and t2 that creates trouble.
[24 Nov 2009 14:56] Øystein Grøvlen
Table t1 will be pulled out in this case because of the primary-key correlation.
In that case, I would assume that t1 becomes an outer correlated table to t2.
If so, the chosen join sequence should not be a candidate for FirstMatch since 
it requires that all outer correlated tables comes before the inner tables.
[25 Nov 2009 8:51] Øystein Grøvlen
advance_sj_state consider a table to be an outer correlated table if it is contained in sj_nest->nested_join->sj_corr_tables or sj_nest->nested_join->sj_depends_on.  Neither of these sets contains t1 in this case.  Maybe these needs to be updated when a table is pulled out of sj_nest?
[26 Nov 2009 11:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/91773

3734 oystein.grovlen@sun.com	2009-11-26
      Bug#46556 Returning incorrect, empty results for some IN subqueries w/ semijoin=on
      
      Problem: When a table is pulled out of a semijoin, it is not 
      considered whether this makes it a correlated outer table. If 
      it really is a correlated outer table, FirstMatch strategy
      requires that it comes before the semijoin-nest in the join 
      sequence.  Since this information is not recorded, this 
      restriction is not enforced.
      
      Solution: If a table is pulled out of semijoin nest due to 
      eq_ref, check whether it is used in the on_expr for the 
      semijoin nest.  If so, add it to the set of tables that are
      regarded as correlated to the semijoin nest.
     @ mysql-test/r/subselect_sj.result
        Result for test case for Bug#46566
     @ mysql-test/r/subselect_sj2.result
        Changed query plan due to the fix for Bug#46566.
        Changed some data in that test case to show that the previous query
        plan was wrong.
     @ mysql-test/r/subselect_sj2_jcl6.result
        Changed query plan due to the fix for Bug#46566.
        Changed some data to that test case to show that the previous query
        plan was wrong.
        Due to Bug#49129, the query will now return wrong result when
        setting join_cache_level=6
     @ mysql-test/r/subselect_sj_jcl6.result
        Result for test case for Bug#46566.
     @ mysql-test/t/subselect_sj.test
        Added test case for Bug#46556.
     @ mysql-test/t/subselect_sj2.test
        The fix for Bug#46566 changes the query plan for a query.
        Changed some data in that test case to show that the previous query
        plan wass wrong.
        Due to Bug#49129, the query will now return wrong result when
        setting join_cache_level=6
     @ sql/sql_select.cc
        If a table is pulled out of semijoin nest due to 
        eq_ref, include it in sj_nest->nested_join->sj_corr_tables
        if it is used in sj_nest->sj_on_expr.
        This makes sure it is considered an outer correlated table to
        this semijoin nest.
[1 Dec 2009 11:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/92258

3744 oystein.grovlen@sun.com	2009-12-01
      Bug#46556 Returning incorrect, empty results for some IN subqueries w/ semijoin=on
      
      Problem: When a table is pulled out of a semijoin, it is not 
      considered whether this makes it a correlated outer table. If 
      it really is a correlated outer table, FirstMatch strategy
      requires that it comes before the semijoin-nest in the join 
      sequence.  Since this information is not recorded, this 
      restriction is not enforced.
      
      Solution: If a table is pulled out of semijoin nest due to 
      eq_ref, add it to the set of tables that are regarded as 
      correlated to the semijoin nest.
     @ mysql-test/r/subselect_sj.result
        Result for test case for Bug#46566
     @ mysql-test/r/subselect_sj2.result
        Changed query plan due to the fix for Bug#46566.
        Changed some data in that test case to show that the previous query
        plan was wrong.
     @ mysql-test/r/subselect_sj2_jcl6.result
        Changed query plan due to the fix for Bug#46566.
        Changed some data to that test case to show that the previous query
        plan was wrong.
        Due to Bug#49129, the query will now return wrong result when
        setting join_cache_level=6
     @ mysql-test/r/subselect_sj_jcl6.result
        Result for test case for Bug#46566.
     @ mysql-test/t/subselect_sj.test
        Added test case for Bug#46556.
     @ mysql-test/t/subselect_sj2.test
        The fix for Bug#46566 changes the query plan for a query.
        Changed some data in that test case to show that the previous query
        plan wass wrong.
        Due to Bug#49129, the query will now return wrong result when
        setting join_cache_level=6
     @ sql/sql_select.cc
        If a table is pulled out of semijoin nest due to eq_ref,
        include it in sj_nest->nested_join->sj_corr_tables and
        sj_nest->nested_join->sj_depends_on.  (It has been verified that
        this equivalent to what will happen if the table is pulled
        out manually by rewriting the query.)
        This makes sure the table is considered an outer correlated table 
        to this semijoin nest.
[11 Dec 2009 14:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/93698

3768 oystein.grovlen@sun.com	2009-12-11
      Bug#46556 Returning incorrect, empty results for some IN subqueries w/ semijoin=on
      
      Problem: When a table is pulled out of a semijoin, it is not 
      considered whether this makes it a correlated outer table. If 
      it really is a correlated outer table, FirstMatch strategy
      requires that it comes before the semijoin-nest in the join 
      sequence.  Since this information is not recorded, this 
      restriction is not enforced.
      
      Solution: If a table is pulled out of semijoin nest due to 
      eq_ref, add it to the set of tables that are regarded as 
      correlated to the semijoin nest.
     @ mysql-test/r/subselect_sj.result
        Result for test case for Bug#46566
     @ mysql-test/r/subselect_sj2.result
        Changed query plan due to the fix for Bug#46556.
        Changed some data in that test case to show that the previous query
        plan was wrong.
     @ mysql-test/r/subselect_sj2_jcl6.result
        Changed query plan due to the fix for Bug#46556.
        Changed some data to that test case to show that the previous query
        plan was wrong.
        Due to Bug#49129, the query will now return wrong result when
        setting join_cache_level=6
     @ mysql-test/r/subselect_sj_jcl6.result
        Result for test case for Bug#46556.
     @ mysql-test/t/subselect_sj.test
        Added test case for Bug#46556.
     @ mysql-test/t/subselect_sj2.test
        The fix for Bug#46556 changes the query plan for a query.
        Changed some data in that test case to show that the previous query
        plan wass wrong.
        Due to Bug#49129, the query will now return wrong result when
        setting join_cache_level=6
     @ sql/sql_select.cc
        If a table is pulled out of semijoin nest due to eq_ref,
        include it in sj_nest->nested_join->sj_corr_tables and
        sj_nest->nested_join->sj_depends_on.  (It has been verified that
        this equivalent to what will happen if the table is pulled
        out manually by rewriting the query.)
        This makes sure the table is considered an outer correlated table 
        to this semijoin nest.
[11 Dec 2009 14:26] Øystein Grøvlen
Patch pushed as revid:oystein.grovlen@sun.com-20091211140923-q0uu1djdd5de4ce0
[21 Dec 2009 15:39] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091221153807-80nxoli1tw1z9bxn) (version source revid:oystein.grovlen@sun.com-20091211140923-q0uu1djdd5de4ce0) (merge vers: 6.0.14-alpha) (pib:15)
[8 Jan 2010 21:11] Paul DuBois
Noted in 6.0.14 changelog.

With semijoin optimization enabled, some queries using IN subqueries
incorrectly returned empty results.
[7 Apr 2010 9:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105134

3066 oystein.grovlen@sun.com	2010-04-07
      Bug#46556 Returning incorrect, empty results for some IN subqueries w/ semijoin=on
      
      (Backporting of revid:oystein.grovlen@sun.com-20091211140923-q0uu1djdd5de4ce0)
      
      Problem: When a table is pulled out of a semijoin, it is not 
      considered whether this makes it a correlated outer table. If 
      it really is a correlated outer table, FirstMatch strategy
      requires that it comes before the semijoin-nest in the join 
      sequence.  Since this information is not recorded, this 
      restriction is not enforced.
      
      Solution: If a table is pulled out of semijoin nest due to 
      eq_ref, add it to the set of tables that are regarded as 
      correlated to the semijoin nest.
     @ mysql-test/r/subselect_sj.result
        Result for test case for Bug#46566
     @ mysql-test/r/subselect_sj2.result
        Changed query plan due to the fix for Bug#46556.
        Changed some data in that test case to show that the previous query
        plan was wrong.
     @ mysql-test/t/subselect_sj.test
        Added test case for Bug#46556.
     @ mysql-test/t/subselect_sj2.test
        The fix for Bug#46556 changes the query plan for a query.
        Changed some data in that test case to show that the previous query
        plan wass wrong.
     @ sql/sql_select.cc
        If a table is pulled out of semijoin nest due to eq_ref,
        include it in sj_nest->nested_join->sj_corr_tables and
        sj_nest->nested_join->sj_depends_on.  (It has been verified that
        this equivalent to what will happen if the table is pulled
        out manually by rewriting the query.)
        This makes sure the table is considered an outer correlated table 
        to this semijoin nest.
[16 Aug 2010 6:34] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:26] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 2:55] Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.