Bug #106710 | Semijoin may give wrong result | ||
---|---|---|---|
Submitted: | 11 Mar 2022 14:54 | Modified: | 24 Mar 2022 22:16 |
Reporter: | Shanshan Ying | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S1 (Critical) |
Version: | 8.0, 8.0.28 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | Optimizer, semijoin |
[11 Mar 2022 14:54]
Shanshan Ying
[11 Mar 2022 15:08]
MySQL Verification Team
Hello Shanshan Ying, Thank you for the report and test case. regards, Umesh
[24 Mar 2022 13:21]
huahua xu
Hi, Shanshan Ying, By analyzing the preparation and optimization stage of this query statement, I find that: 1. after the subquery is executed using a semi-join materialization, the where condition is transformed several injected semi-join equalities: ->Item_func_eq ->left: Item_in_optimizer(t2.c0 NOT IN ( SELECT t3.c0 AS ref2 FROM t1 as t3 WHERE t3.c0 )) ->right: Item_field(t2.c0) ->... 2. For each injected semi-join equality, replace the reference to the expression selected from the subquery with the corresponding column in the temporary table. so, the right argument will be replaced with the corresponding column from the materialized temporary table. The first equation above is converted to: Item_func_eq ->left: Item_in_optimizer(t2.c0 NOT IN ( SELECT t3.c0 AS ref2 FROM t1 as t3 WHERE t3.c0 )) ->right: Item_field(`<subquery2>`.c0) 3. When attaching WHERE clause and join conditions to the tables, The table t2 missed the condition((t2.c0 NOT IN ( SELECT t3.c0 AS ref2 FROM t1 as t3 WHERE t3.c0 )) = (t2.c0)). because the condition is not the most appropriate for table t2, but `<subquery2>`. You can refer to the function Item_func_eq::equality_substitution_transformer and make_join_query_block. for example, the query should be correct: SELECT t1.c0 AS ref0 FROM t1 WHERE t1.c0 IN ( SELECT t2.c0 AS ref1 FROM t1 as t2 WHERE (t2.c0)= (t2.c0 NOT IN ( SELECT t3.c0 AS ref2 FROM t1 as t3 WHERE t3.c0 ))); mysql> explain format=tree SELECT t1.c0 AS ref0 FROM t1 WHERE t1.c0 IN ( SELECT t2.c0 AS ref1 FROM t1 as t2 WHERE (t2.c0)= (t2.c0 NOT IN ( SELECT t3.c0 AS ref2 FROM t1 as t3 WHERE t3.c0 ))); +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Inner hash join (t1.c0 = `<subquery2>`.ref1) (cost=3.81 rows=1) -> Table scan on t1 (cost=0.35 rows=6) -> Hash -> Table scan on <subquery2> (cost=2.51..2.51 rows=1) -> Materialize with deduplication (cost=2.96..2.96 rows=1) -> Filter: (t2.c0 is not null) (cost=0.35 rows=1) -> Filter: (t2.c0 = <in_optimizer>(t2.c0,t2.c0 not in (select #3))) (cost=0.35 rows=1) -> Table scan on t2 (cost=0.35 rows=6) -> Select #3 (subquery in condition; run only once) -> Filter: ((t2.c0 = `<materialized_subquery>`.ref2)) (cost=1.35..1.35 rows=5) -> Limit: 1 row(s) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (ref2=t2.c0) -> Materialize with deduplication (cost=1.35..1.35 rows=5) -> Filter: (0 <> t3.c0) (cost=0.85 rows=5) -> Table scan on t3 (cost=0.85 rows=6) | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)
[24 Mar 2022 22:16]
Jon Stephens
Documented fix as follows in the MySQL 8.0.30 changelog: In certain cases, incorrect results could result from execution of a semijoin with materialization, when the WHERE clause of the subquery contained an equality. In some cases, such as when one side of the equality was an IN or NOT IN subquery, the equality was neither pushed down to the materialized subquery, nor evaluated as part of the semijoin. Closed.
[24 Mar 2022 22:24]
Jon Stephens
See also BUG#106718.