Bug #115652 ANTI JOIN returns incorrect results
Submitted: 20 Jul 2024 4:04 Modified: 22 Jul 2024 12:52
Reporter: Chenglin Liang Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:9.0.0, 8.4.1, 8.0.38 OS:Ubuntu (Ubuntu 20.04.6 LTS)
Assigned to: CPU Architecture:x86 (5.15.0-113-generic)
Tags: anti join, innodb

[20 Jul 2024 4:04] Chenglin Liang
Description:
When I use ANTI JOIN to join two tables, I get the wrong result.

How to repeat:
CREATE TABLE t0(c0 CHAR(100)  NOT NULL, PRIMARY KEY(c0));
CREATE TABLE t1(c0 VARCHAR(100) );
INSERT INTO t1 VALUES ('B');
INSERT INTO t0 VALUES ('bྃ');
INSERT INTO t1 VALUES ('gIG');

SELECT * FROM t0; -- get [bྃ]
SELECT * FROM t1; -- get [B, gIG]

SELECT t1.c0 FROM t1  WHERE NOT EXISTS (SELECT 1 FROM t0 WHERE t1.c0 = t0.c0); -- get [gIG], expect [B, gIG]

Suggested fix:
explain format=tree  SELECT t1.c0 FROM t1  WHERE NOT EXISTS (SELECT 1 FROM t0 WHERE t1.c0 = t0.c0); 
EXPLAIN
-> Nested loop antijoin  (cost=1.15 rows=2)
    -> Table scan on t1  (cost=0.45 rows=2)
    -> Filter: (t1.c0 = t0.c0)  (cost=0.3 rows=1)
        -> Single-row covering index lookup on t0 using PRIMARY (c0=t1.c0)  (cost=0.3 rows=1)
[22 Jul 2024 7:25] MySQL Verification Team
Hello Liang Chenglin,

Thank you for the report and test case.

regards,
Umesh
[22 Jul 2024 12:52] Bernt Marius Johnsen
The character following the b in t0.c0 is an ignorable character.
U+0F83 TIBETAN SIGN SNA LDAN, the weight is zero. Thus:

mysql> select 'bྃ' = 'B';
+--------------+
| 'bྃ' = 'B'   |
+--------------+
|            1 |
+--------------+
1 row in set (0,00 sec)