Description:
It could be duplicate of something in:
https://bugs.mysql.com/search.php?search_for=materialization&status=Active&severity=&limit...
but I'm not sure because, contrary to the bugs above, my query has the subquery in the JOIN ON condition.
Verified in 8.0.29 and 8.0.18.
Please, run the mtr test in "how to repeat".
Results:
SELECT /*+ join_prefix(t1,t2) */ t1.id, t2.id
FROM t1 LEFT JOIN t2 ON t1.id = t2.id2 AND
t2.id IN (SELECT /*+ semijoin(materialization) */ id3 FROM t3 WHERE id4 = 100 AND val = 2) ;
id id
1 3
SELECT /*+ join_prefix(t1,t2) */ t1.id, t2.id
FROM t1 LEFT JOIN t2 ON t1.id = t2.id2 AND
t2.id IN (SELECT /*+ no_semijoin(materialization) */ id3 FROM t3 WHERE id4 = 100 AND val = 2) ;
id id
1 NULL
the first result is wrong.
How to repeat:
CREATE TABLE t1 (
id int NOT NULL
);
INSERT INTO t1 VALUES (1);
CREATE TABLE t2 (
id int NOT NULL,
id2 int NOT NULL
);
INSERT INTO t2 VALUES (3,1);
CREATE TABLE t3 (
id3 int NOT NULL,
id4 int NOT NULL,
val int NOT NULL
);
INSERT INTO t3 VALUES (3,100,4);
analyze table t1,t2,t3;
SELECT /*+ join_prefix(t1,t2) */ t1.id, t2.id
FROM t1 LEFT JOIN t2 ON t1.id = t2.id2 AND
t2.id IN (SELECT /*+ semijoin(materialization) */ id3 FROM t3 WHERE id4 = 100 AND val = 2) ;
SELECT /*+ join_prefix(t1,t2) */ t1.id, t2.id
FROM t1 LEFT JOIN t2 ON t1.id = t2.id2 AND
t2.id IN (SELECT /*+ no_semijoin(materialization) */ id3 FROM t3 WHERE id4 = 100 AND val = 2) ;
drop table t1,t2,t3;
Suggested fix:
Looking at EXPLAIN of the wrong query:
explain format=tree SELECT /*+ join_prefix(t1,t2) */ t1.id, t2.id
FROM t1 LEFT JOIN t2 ON t1.id = t2.id2 AND
t2.id IN (SELECT /*+ semijoin(materialization) */ id3 FROM t3 WHERE id4 = 100 AND val = 2) ;
EXPLAIN
-> Nested loop left join (cost=0.70 rows=1)
-> Table scan on t1 (cost=0.35 rows=1)
-> Nested loop inner join (cost=0.55 rows=1)
-> Filter: (t2.id2 = t1.id) (cost=0.35 rows=1)
-> Table scan on t2 (cost=0.35 rows=1)
-> Single-row index lookup on <subquery2> using <auto_distinct_key> (id3=t2.id)
-> Materialize with deduplication (cost=0.45..0.45 rows=1)
-> Filter: (t3.id3 is not null) (cost=0.35 rows=1)
-> Table scan on t3 (cost=0.35 rows=1)
you can see that "val=2" is nowhere in EXPLAIN. Apparently it has been lost, hence the false matches. There is actually no row with val=2 in t3.