Bug #46550 Azalea returning duplicate results for some IN subqueries w/ semijoin=on
Submitted: 4 Aug 2009 16:54 Modified: 23 Nov 2010 2:54
Reporter: Patrick Crews Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:6.0 OS:Any
Assigned to: Tor Didriksen CPU Architecture:Any
Tags: IN, optimizer_switch, semijoin, subquery

[4 Aug 2009 16:54] Patrick Crews
Description:
Azalea is returning duplicate results (correct results are repeated unnecessarily) for some IN subqueries with optimizer_switch's semijoin setting = on

From the attached test case:
SELECT `int_key`  
FROM CC  
WHERE `varchar_nokey`  IN (  
SELECT subquery_t1 .`varchar_key`  
FROM B subquery_t1  JOIN B  ON subquery_t1 .`int_key`  )   ;

5.1 results:
int_key
9
7

Azalea results (w/ semijoin on):
int_key
9
9
7
7

How to repeat:
Run the following test case.
Use the appropriate Server0 / Server1 settings according to which version of the server is being tested.

Toggle the optimizer_switch semijoin setting for Server1 (azalea) and observe the change in result sets:

# Server0: version = 5.1.38-debug-log
# Server1: version = 5.4.4-alpha-debug-log

# The value of optimizer_switch is distinct between the two servers:
# Server 0 : SET SESSION optimizer_switch = 'index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on';
# Server 1 : SET SESSION optimizer_switch = 'firstmatch=on,index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,loosescan=on,materialization=on,semijoin=on';

# The value of optimizer_use_mrr is distinct between the two servers:
# Server 0 : SET SESSION optimizer_use_mrr = '';
# Server 1 : SET SESSION optimizer_use_mrr = 'force';

# The value of engine_condition_pushdown is common between the two servers:
SET SESSION engine_condition_pushdown = ON ;

# The value of join_cache_level is distinct between the two servers:
# Server 0 : SET SESSION join_cache_level = '';
# Server 1 : SET SESSION join_cache_level = 1;

# Begin test case for query 0

--disable_warnings
DROP TABLE IF EXISTS CC, B;
--enable_warnings

CREATE TABLE `CC` (
  `int_key` int(11) DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  KEY `int_key` (`int_key`),
  KEY `varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES (1,'m','m'),(40,'h','h'),(1,'r','r'),(1,'h','h'),(9,'x','x'),(NULL,'q','q'),(NULL,'k','k'),(7,'l','l'),(182,'k','k'),(202,'a','a'),(7,'x','x'),(6,'j','j'),(119,'z','z'),(4,'d','d'),(5,'h','h'),(1,'u','u'),(3,'q','q'),(7,'a','a'),(3,'e','e'),(6,'l','l');
CREATE TABLE `B` (
  `int_key` int(11) DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  KEY `int_key` (`int_key`),
  KEY `varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `B` VALUES (7,NULL,NULL),(4,'x','x');

 
SELECT `int_key`  
FROM CC  
WHERE `varchar_nokey`  IN (  
SELECT subquery_t1 .`varchar_key`  
FROM B subquery_t1  JOIN B  ON subquery_t1 .`int_key`  )   ;

# Diff:

# --- /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen69966-1249399467-server0.dump	2009-08-04 11:24:27.000000000 -0400
# +++ /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen69966-1249399467-server1.dump	2009-08-04 11:24:27.000000000 -0400
# @@ -1,2 +1,4 @@
#  7
# +7
# +9
#  9

DROP TABLE CC, B;

# End of test case for query 0

# Begin test case for query 1

--disable_warnings
DROP TABLE IF EXISTS CC, B;
--enable_warnings

CREATE TABLE `CC` (
  `int_nokey` int(11) DEFAULT NULL,
  `int_key` int(11) DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  KEY `int_key` (`int_key`),
  KEY `varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `CC` VALUES (3,1,'m','m'),(196,40,'h','h'),(4,1,'r','r'),(0,1,'h','h'),(4,9,'x','x'),(3,NULL,'q','q'),(3,NULL,'k','k'),(NULL,7,'l','l'),(35,182,'k','k'),(246,202,'a','a'),(1,7,'x','x'),(9,6,'j','j'),(8,119,'z','z'),(5,4,'d','d'),(2,5,'h','h'),(4,1,'u','u'),(8,3,'q','q'),(3,7,'a','a'),(7,3,'e','e'),(2,6,'l','l');
CREATE TABLE `B` (
  `int_nokey` int(11) DEFAULT NULL,
  `int_key` int(11) DEFAULT NULL,
  `varchar_key` varchar(1) DEFAULT NULL,
  `varchar_nokey` varchar(1) DEFAULT NULL,
  KEY `int_key` (`int_key`),
  KEY `varchar_key` (`varchar_key`,`int_key`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `B` VALUES (NULL,7,NULL,NULL),(9,4,'x','x');

 
SELECT DISTINCT MIN( `varchar_key`) AS field1 , `int_nokey` AS field2 , `int_key` AS field3 
FROM CC AS table1 
WHERE (table1 . `varchar_nokey`  IN ( 
SELECT subquery_t1 . `varchar_key` 
FROM B AS subquery_t1 CROSS JOIN B AS subquery_t2 ON ( subquery_t2 . `int_key` >= subquery_t1 . `int_key` )  ) )  
GROUP BY field2, field3;

# Diff:

# --- /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen69966-1249399469-server0.dump	2009-08-04 11:24:29.000000000 -0400
# +++ /var/folders/Pt/PtJd7NDTGgyOk3+iDGXrQk+++TI/-Tmp-///randgen69966-1249399469-server1.dump	2009-08-04 11:24:29.000000000 -0400
# @@ -1,2 +0,0 @@
# -x	1	7
# -x	4	9

DROP TABLE CC, B;

# End of test case for query 1

Suggested fix:
Ensure correct processing of queries regardless of optimizer_switch settings.
[10 Nov 2009 10:18] Tor Didriksen
The second query has a different symptom, and is not the same bug as the first one. The first query generates duplicates, because we use a wrong execution plan (one which does not remove duplicates). The second query generates an empty result. Splitting this into two separate bugs.
[10 Nov 2009 14:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/89969

3703 Tor Didriksen	2009-11-10
      Bug#46550 Azalea returning duplicate results for some IN subqueries w/ semijoin=on
      
      The optimizer was choosing an illegal plan for FirstMatch.
      Added an explicit check that the join prefix satisfies this pattern:
      (ot|nt)*  [ it ((it|nt)* it) ]  (nt)*
     @ mysql-test/r/subselect_sj.result
        Add test case.
     @ mysql-test/r/subselect_sj_jcl6.result
        Add test case.
     @ mysql-test/t/subselect_sj.test
        Add test case.
     @ sql/sql_select.cc
        Added an explicit check that the join prefix satisfies this pattern:
        (ot|nt)*  [ it ((it|nt)* it) ]  (nt)*
        
        Some cosmetics for readability and coding style compliance.
     @ sql/sql_select.h
        Make sj_strategy private, with getter and setter.
[12 Nov 2009 12:41] Tor Didriksen
Actually, the second query here looks similar to
http://bugs.mysql.com/bug.php?id=46556
Azalea returning incorrect, empty results for some IN subqueries w/ semijoin=on
[13 Nov 2009 11:41] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/90338

3711 Tor Didriksen	2009-11-13
      Bug#46550 Azalea returning duplicate results for some IN subqueries w/ semijoin=on
      
      The optimizer was choosing an illegal plan for FirstMatch.
      Added an explicit check that the join prefix satisfies this pattern:
      (ot|nt)*  [ it ((it|nt)* it) ]  (nt)*
     @ mysql-test/r/subselect_sj.result
        Add test case.
     @ mysql-test/r/subselect_sj_jcl6.result
        Add test case.
     @ mysql-test/t/subselect_sj.test
        Add test case.
     @ sql/sql_select.cc
        Added an explicit check that the join prefix satisfies this pattern:
        (ot|nt)*  [ it ((it|nt)* it) ]  (nt)*
        
        Some cosmetics for readability and coding style compliance.
[13 Nov 2009 14:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/90361

3713 Tor Didriksen	2009-11-13
      Bug#46550 Azalea returning duplicate results for some IN subqueries w/ semijoin=on
      
      The optimizer was choosing an illegal plan for FirstMatch.
      Added an explicit check that the join prefix satisfies this pattern:
      (ot|nt)*  [ it ((it|nt)* it) ]  (nt)*
     @ mysql-test/r/subselect_sj.result
        Add test case.
     @ mysql-test/r/subselect_sj_jcl6.result
        Add test case.
     @ mysql-test/t/subselect_sj.test
        Add test case.
     @ sql/sql_select.cc
        Added an explicit check that the join prefix satisfies this pattern:
        (ot|nt)*  [ it ((it|nt)* it) ]  (nt)*
        
        Some cosmetics for readability and coding style compliance.
[17 Nov 2009 10:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/90635

3712 Tor Didriksen	2009-11-17
      Bug#46550 Azalea returning duplicate results for some IN subqueries w/ semijoin=on
      
      The optimizer was choosing an illegal plan for FirstMatch.
      Add an explicit check that all inner tables are part of 'remaining_tables'
      when we initiate the FirstMatch algorithm.
     @ mysql-test/r/subselect_sj.result
        Add test case.
     @ mysql-test/r/subselect_sj_jcl6.result
        Add test case.
     @ mysql-test/t/subselect_sj.test
        Add test case.
     @ sql/sql_select.cc
        Add an explicit check that all inner tables are part of 'remaining_tables'
        when we initiate the FirstMatch algorithm.
        
        Some cosmetics for readability and coding style compliance.
[19 Nov 2009 14:38] Tor Didriksen
Pushed to:
bzr+ssh://bk-internal.mysql.com/bzrroot/server/mysql-6.0-codebase-bugfixing/
revision-id: tor.didriksen@sun.com-20091117101207-y3779rzw50cm2emc
[20 Nov 2009 12:56] Bugs System
Pushed into 6.0.14-alpha (revid:kostja@sun.com-20091120124947-yi6h2jbgw0kbciwm) (version source revid:tor.didriksen@sun.com-20091117101207-y3779rzw50cm2emc) (merge vers: 6.0.14-alpha) (pib:13)
[23 Nov 2009 20:14] Paul DuBois
Noted in 6.0.14 changelog.

With semijoin optimization enabled, the optimizer sometimes chose an
illegal execution plan for the FirstMatch strategy, resulting in
duplicate values from IN subqueries.
[13 Apr 2010 14:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105520

3072 Tor Didriksen	2010-04-13
      Bug#46550 Azalea returning duplicate results for some IN subqueries w/ semijoin=on
      
      Backport of tor.didriksen@sun.com-20091117101207-y3779rzw50cm2emc
      
      The optimizer was choosing an illegal plan for FirstMatch.
      Add an explicit check that all inner tables are part of 'remaining_tables'
      when we initiate the FirstMatch algorithm.
     @ mysql-test/r/subselect_sj.result
        Add test case.
     @ mysql-test/t/subselect_sj.test
        Add test case.
     @ sql/sql_select.cc
        Add an explicit check that all inner tables are part of 'remaining_tables'
        when we initiate the FirstMatch algorithm.
        .
        Some cosmetics for readability and coding style compliance.
[16 Aug 2010 6:34] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:23] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 2:54] Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.