Bug #54641 semijoin loosescan gives duplicate rows
Submitted: 20 Jun 2010 7:48 Modified: 11 Dec 2010 17:49
Reporter: Guilhem Bichot Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:next-mr-opt-backporting OS:Any
Assigned to: Jørgen Løland CPU Architecture:Any
Tags: join_cache_level, LooseScan, optimizer_switch, semijoin

[20 Jun 2010 7:48] Guilhem Bichot
Description:
The testcase of closed BUG#46550, when forced to pick semijoin Loosescan, gives wrong results. This was first observed in the diff of subquery_sj_none.result (correct) vs subquery_sj_loosescan.result (wrong) in next-mr-opt-backporting.

How to repeat:
# this is the testcase of
# Bug#46550 Azalea returning duplicate results for some IN subqueries
# w/ semijoin=on
# but here we use it to find a loosescan, different bug

DROP TABLE IF EXISTS t0, t1, t2;
CREATE TABLE t0 (
int_key int(11) DEFAULT NULL,
varchar_key varchar(1) DEFAULT NULL,
varchar_nokey varchar(1) DEFAULT NULL,
KEY int_key (int_key),
KEY varchar_key (varchar_key,int_key)
);
INSERT INTO t0 VALUES
(1,'m','m'),
(40,'h','h'),
(1,'r','r'),
(1,'h','h'),
(9,'x','x'),
(NULL,'q','q'),
(NULL,'k','k'),
(7,'l','l'),
(182,'k','k'),
(202,'a','a'),
(7,'x','x'),
(6,'j','j'),
(119,'z','z'),
(4,'d','d'),
(5,'h','h'),
(1,'u','u'),
(3,'q','q'),
(7,'a','a'),
(3,'e','e'),
(6,'l','l');
CREATE TABLE t1 (
int_key int(11) DEFAULT NULL,
varchar_key varchar(1) DEFAULT NULL,
varchar_nokey varchar(1) DEFAULT NULL,
KEY int_key (int_key),
KEY varchar_key (varchar_key,int_key)
);
INSERT INTO t1 VALUES (7,NULL,NULL),(4,'x','x');
CREATE TABLE t2 (
int_key int(11) DEFAULT NULL,
varchar_key varchar(1) DEFAULT NULL,
varchar_nokey varchar(1) DEFAULT NULL,
KEY int_key (int_key),
KEY varchar_key (varchar_key,int_key)
);
INSERT INTO t2 VALUES (123,NULL,NULL);
explain SELECT t0.int_key  
FROM t0
WHERE t0.varchar_nokey  IN (  
SELECT t1_1 .varchar_key  
FROM t1 AS t1_1  JOIN t1 AS t1_2 ON t1_1 .int_key
);
SELECT t0.int_key  
FROM t0
WHERE t0.varchar_nokey  IN (  
SELECT t1_1 .varchar_key  
FROM t1 AS t1_1  JOIN t1 AS t1_2 ON t1_1 .int_key
);
explain SELECT t0.int_key  
FROM t0, t2
WHERE t0.varchar_nokey  IN (  
SELECT t1_1 .varchar_key  
FROM t1 AS t1_1  JOIN t1 AS t1_2 ON t1_1 .int_key  
);
SELECT t0.int_key  
FROM t0, t2
WHERE t0.varchar_nokey  IN (  
SELECT t1_1 .varchar_key  
FROM t1 AS t1_1  JOIN t1 AS t1_2 ON t1_1 .int_key  
);
DROP TABLE t0, t1, t2;

When run with no specific options, semijoin materialization is used and both queries return (9),(7). When run with
--mysqld=--optimizer-switch=materialization=off
semijoin loosescan is used and results are wrong for both queries:
@@ -54,8 +54,8 @@
 FROM t1 AS t1_1  JOIN t1 AS t1_2 ON t1_1 .int_key
 );
 id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
-1	PRIMARY	t1_1	index	varchar_key	varchar_key	9	NULL	2	Using where; Using index; Start materialize; Scan
-1	PRIMARY	t1_2	index	NULL	int_key	5	NULL	2	Using index; End materialize
+1	PRIMARY	t1_1	index	varchar_key	varchar_key	9	NULL	2	Using where; Using index; LooseScan
+1	PRIMARY	t1_2	index	NULL	int_key	5	NULL	2	Using index; FirstMatch(t1_1); Using join buffer (BNL, regular buffers)
 1	PRIMARY	t0	ALL	NULL	NULL	NULL	NULL	20	Using where; Using join buffer (BNL, regular buffers)
 SELECT t0.int_key  
 FROM t0
@@ -65,6 +65,8 @@
 );
 int_key
 9
+9
+7
 7
 explain SELECT t0.int_key  
 FROM t0, t2

(9),(7) is correct because the subquery should return 'x' (possibly several times but it doesn't matter for IN()), and then there should be one result row per row of t0 having its varchar_nokey equal to 'x': 9 and 7.
A comment marking this wrong result was put in subquery_sj.inc.
[20 Jun 2010 10:01] Valeriy Kravchuk
Verified just as described:

valeriy-kravchuks-macbook-pro:opt-bakporting openxs$ bin/mysql -uroot testReading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.6.99-m4-debug Source distribution

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show variables like 'optimizer_switch'\G
*************************** 1. row ***************************
Variable_name: optimizer_switch
        Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,materialization=off,semijoin=on,loosescan=on,firstmatch=on,mrr=on,mrr_cost_based=off,index_condition_pushdown=on
1 row in set (0.00 sec)

mysql> SELECT t0.int_key  
    -> FROM t0
    -> WHERE t0.varchar_nokey  IN (  
    -> SELECT t1_1 .varchar_key  
    -> FROM t1 AS t1_1  JOIN t1 AS t1_2 ON t1_1 .int_key
    -> );
+---------+
| int_key |
+---------+
|       9 |
|       9 |
|       7 |
|       7 |
+---------+
4 rows in set (0.00 sec)

mysql> SELECT t0.int_key  
    -> FROM t0, t2
    -> WHERE t0.varchar_nokey  IN (  
    -> SELECT t1_1 .varchar_key  
    -> FROM t1 AS t1_1  JOIN t1 AS t1_2 ON t1_1 .int_key  
    -> );
+---------+
| int_key |
+---------+
|       9 |
|       9 |
|       7 |
|       7 |
+---------+
4 rows in set (0.01 sec)
[13 Jul 2010 13:21] Roy Lyseng
The result is correct when join caching is disabled. Changing tags accordingly.
[4 Nov 2010 9:14] Jørgen Løland
Simplified test case that makes inner/outer tables explicit: 

CREATE TABLE ot (
oti int(11) NOT NULL,
otv char(1) NOT NULL
);
INSERT INTO ot VALUES
(1,'x'),
(2,'x'),
(3,'a');
CREATE TABLE it1 (
it1i int(11) NOT NULL,
it1v char(1) NOT NULL,
KEY char_key (it1v,it1i)
);
INSERT INTO it1 VALUES (4,'b'),(5,'x');
CREATE TABLE it2 (
it2i int(11) NOT NULL,
it2v char(1) NOT NULL,
KEY char_key (it2v,it2i)
);
INSERT INTO it2 VALUES (4,'b'),(5,'x');

set optimizer_switch='materialization=off';

explain SELECT oti
FROM ot
WHERE otv IN (  
SELECT it1.it1v  
FROM it1 JOIN it2 ON it1.it1i
);
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	it1	index	char_key	char_key	5	NULL	#	Using where; Using index; LooseScan
1	PRIMARY	it2	index	NULL	char_key	5	NULL	#	Using index; FirstMatch(it1); Using join buffer (BNL, regular buffers)
1	PRIMARY	ot	ALL	NULL	NULL	NULL	NULL	#	Using where; Using join buffer (BNL, regular buffers)
[19 Nov 2010 12:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/124431

3287 Jorgen Loland	2010-11-19
      Bug#54641: semijoin loosescan gives duplicate rows
      
      The loosescan semijoin strategy does not handle join buffering 
      on any of the tables in the loosescan range. Prior to this bug
      fix, check_join_cache_usage() would not do join buffering on a 
      join table if the loosescan_match_tab pointer was set for it. 
      
      However, loosescan_match_tab is a pointer from the first to the 
      last join table handled by the loosescan strategy. Join 
      buffering could therefore be used on all join tables except 
      the first in the loosescan range.
      
      The fix is to skip join buffering if the join table is in 
      the loosescan range, i.e., if
      join_tab->get_sj_strategy() == SJ_OPT_LOOSE_SCAN,
      instead of checking loosescan_match_tab.
     @ mysql-test/r/subquery_sj_loosescan.result
        Updated test result after fixing BUG#54641
     @ mysql-test/r/subquery_sj_loosescan_jcl6.result
        Updated test result after fixing BUG#54641
     @ mysql-test/r/subquery_sj_loosescan_jcl7.result
        Updated test result after fixing BUG#54641
     @ sql/sql_select.cc
        In check_join_cache_usage: skip join buffering if the join
        table is in the loosescan range. This was previously done by 
        checking if the join table's loosescan_match_tab was set, but
        is now done by checking if it's semijoin strategy is
        SJ_OPT_LOOSE_SCAN.
[24 Nov 2010 17:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/124879

3287 Jorgen Loland	2010-11-24
      Bug#54641: semijoin loosescan gives duplicate rows
      
      The loosescan semijoin strategy does not handle join buffering 
      on any of the tables in the loosescan range. Prior to this bug
      fix, check_join_cache_usage() would not do join buffering on a 
      join table if the loosescan_match_tab pointer was set for it. 
      
      However, loosescan_match_tab is a pointer from the first to the 
      last join table handled by the loosescan strategy. Join 
      buffering could therefore be used on all join tables except 
      the first in the loosescan range.
      
      The fix is to skip join buffering if the join table is in 
      the loosescan range, i.e., if
      join_tab->get_sj_strategy() == SJ_OPT_LOOSE_SCAN,
      instead of checking loosescan_match_tab.
     @ mysql-test/include/subquery_sj.inc
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_all.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_all_jcl6.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_all_jcl7.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_dupsweed.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_dupsweed_jcl6.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_dupsweed_jcl7.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_firstmatch.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_firstmatch_jcl6.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_firstmatch_jcl7.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_loosescan.result
        Updated test result after fixing BUG#54641
     @ mysql-test/r/subquery_sj_loosescan_jcl6.result
        Updated test result after fixing BUG#54641
     @ mysql-test/r/subquery_sj_loosescan_jcl7.result
        Updated test result after fixing BUG#54641
     @ mysql-test/r/subquery_sj_mat.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_mat_jcl6.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_mat_jcl7.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_mat_nosj.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_none.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_none_jcl6.result
        Add EXPLAIN to query that failed in bug#54641
     @ mysql-test/r/subquery_sj_none_jcl7.result
        Add EXPLAIN to query that failed in bug#54641
     @ sql/sql_select.cc
        In check_join_cache_usage: skip join buffering if the join
        table is in the loosescan range. This was previously done by 
        checking if the join table's loosescan_match_tab was set, but
        is now done by checking if it's semijoin strategy is
        SJ_OPT_LOOSE_SCAN.
[5 Dec 2010 12:40] Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (version source revid:alexander.nozdrin@oracle.com-20101205122447-6x94l4fmslpbttxj) (merge vers: 5.6.1) (pib:23)
[11 Dec 2010 17:49] Paul DuBois
Bug does not appear in any released 5.6.x version. No changelog entry needed.