Bug #106710 Semijoin may give wrong result
Submitted: 11 Mar 2022 14:54 Modified: 24 Mar 2022 22:16
Reporter: Shanshan Ying Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:8.0, 8.0.28 OS:Any
Assigned to: CPU Architecture:Any
Tags: Optimizer, semijoin

[11 Mar 2022 14:54] Shanshan Ying
Description:
Semijoin gives wrong reuslts. (with or without semijoin will give different results)

How to repeat:
======================================
create table
======================================
drop table if exists t1;
CREATE TABLE IF NOT EXISTS t1(c0 int ) ;
INSERT  INTO t1 VALUES(NULL), (1);
INSERT  INTO t1 VALUES(NULL), (2);
INSERT  INTO t1 VALUES(NULL), (3);

======================================
set semijoin off
======================================

SET optimizer_switch='semijoin=off';
SELECT  t1.c0 AS ref0
FROM t1
WHERE t1.c0 IN (
SELECT  t2.c0 AS ref1
FROM t1 as t2
WHERE (t2.c0 NOT IN (
SELECT  t3.c0 AS ref2
FROM t1 as t3
WHERE t3.c0 )) = (t2.c0));
-- empty result 

======================================
set semijoin on
======================================
SET optimizer_switch='semijoin=on';
SELECT  t1.c0 AS ref0
FROM t1
WHERE t1.c0 IN (
SELECT  t2.c0 AS ref1
FROM t1 as t2
WHERE (t2.c0 NOT IN (
SELECT  t3.c0 AS ref2
FROM t1 as t3
WHERE t3.c0 )) = (t2.c0));

-- return 3 rows
+------+
| ref0 |
+------+
|    1 |
|    2 |
|    3 |
+------+
3 rows in set (0.00 sec)

======================================
check explain plan
======================================
Besides, with semijoin=on, the plan explaiend is not completed

mysql> explain format=tree SELECT  t1.c0 AS ref0 FROM t1 WHERE t1.c0 IN ( SELECT  t2.c0 AS ref1 FROM t1 as t2 WHERE (t2.c0 NOT IN ( SELECT  t3.c0 AS ref2 FROM t1 as t3 WHERE t3.c0 )) =
(t2.c0));
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                            |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Inner hash join (t1.c0 = `<subquery2>`.ref1)  (cost=7.38 rows=6)
    -> Table scan on t1  (cost=0.35 rows=6)
    -> Hash
        -> Table scan on <subquery2>  (cost=0.43..2.58 rows=6)
            -> Materialize with deduplication  (cost=1.38..3.53 rows=6)
                -> Filter: (t2.c0 is not null)  (cost=0.35 rows=6)
                    -> Table scan on t2  (cost=0.35 rows=6)
 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

Besides, plan for subquery involvoing `t3` can be found nowhere.
[11 Mar 2022 15:08] MySQL Verification Team
Hello Shanshan Ying,

Thank you for the report and test case.

regards,
Umesh
[24 Mar 2022 13:21] huahua xu
Hi, Shanshan Ying,

By analyzing the preparation and optimization stage of this query statement, I find that:

1. after the subquery is executed using a semi-join materialization, the where condition is transformed several injected semi-join equalities:
     ->Item_func_eq
         ->left: Item_in_optimizer(t2.c0 NOT IN ( SELECT  t3.c0 AS ref2 FROM t1 as t3 WHERE t3.c0 ))
	 ->right: Item_field(t2.c0)
     ->...

2. For each injected semi-join equality, replace the reference to the expression selected from the subquery with the corresponding column in the temporary table. so, the right argument will be replaced with the corresponding column from the materialized temporary table. The first equation above is converted to: 
      Item_func_eq
         ->left: Item_in_optimizer(t2.c0 NOT IN ( SELECT  t3.c0 AS ref2 FROM t1 as t3 WHERE t3.c0 ))
	 ->right: Item_field(`<subquery2>`.c0)

3. When attaching WHERE clause and join conditions to the tables, The table t2 missed the condition((t2.c0 NOT IN ( SELECT  t3.c0 AS ref2 FROM t1 as t3 WHERE t3.c0 )) = (t2.c0)). because the condition is not the most appropriate for table t2, but `<subquery2>`.

You can refer to the function Item_func_eq::equality_substitution_transformer and make_join_query_block.

for example, the query should be correct: SELECT  t1.c0 AS ref0 FROM t1 WHERE t1.c0 IN ( SELECT  t2.c0 AS ref1 FROM t1 as t2 WHERE (t2.c0)= (t2.c0 NOT IN ( SELECT  t3.c0 AS ref2 FROM t1 as t3 WHERE t3.c0 )));

mysql> explain format=tree SELECT  t1.c0 AS ref0 FROM t1 WHERE t1.c0 IN ( SELECT  t2.c0 AS ref1 FROM t1 as t2 WHERE (t2.c0)= (t2.c0 NOT IN ( SELECT  t3.c0 AS ref2 FROM t1 as t3 WHERE t3.c0 )));
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Inner hash join (t1.c0 = `<subquery2>`.ref1)  (cost=3.81 rows=1)
    -> Table scan on t1  (cost=0.35 rows=6)
    -> Hash
        -> Table scan on <subquery2>  (cost=2.51..2.51 rows=1)
            -> Materialize with deduplication  (cost=2.96..2.96 rows=1)
                -> Filter: (t2.c0 is not null)  (cost=0.35 rows=1)
                    -> Filter: (t2.c0 = <in_optimizer>(t2.c0,t2.c0 not in (select #3)))  (cost=0.35 rows=1)
                        -> Table scan on t2  (cost=0.35 rows=6)
                        -> Select #3 (subquery in condition; run only once)
                            -> Filter: ((t2.c0 = `<materialized_subquery>`.ref2))  (cost=1.35..1.35 rows=5)
                                -> Limit: 1 row(s)
                                    -> Index lookup on <materialized_subquery> using <auto_distinct_key> (ref2=t2.c0)
                                        -> Materialize with deduplication  (cost=1.35..1.35 rows=5)
                                            -> Filter: (0 <> t3.c0)  (cost=0.85 rows=5)
                                                -> Table scan on t3  (cost=0.85 rows=6)
 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
[24 Mar 2022 22:16] Jon Stephens
Documented fix as follows in the MySQL 8.0.30 changelog:

    In certain cases, incorrect results could result from execution
    of a semijoin with materialization, when the WHERE clause of the
    subquery contained an equality. In some cases, such as when one
    side of the equality was an IN or NOT IN subquery, the equality
    was neither pushed down to the materialized subquery, nor
    evaluated as part of the semijoin.

Closed.
[24 Mar 2022 22:24] Jon Stephens
See also BUG#106718.