Bug #117667 | Both INNER JOIN and SEMI JOIN are contradictory | ||
---|---|---|---|
Submitted: | 11 Mar 12:57 | Modified: | 11 Mar 16:05 |
Reporter: | jinhui lai | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
Version: | 9.2.0, 9.1.0, 8.0.41, 8.4.4 | OS: | Ubuntu (22.04) |
Assigned to: | CPU Architecture: | Any |
[11 Mar 12:57]
jinhui lai
[11 Mar 13:25]
MySQL Verification Team
Hello jinhui lai, Thank you for the report and test case. Verified as described. regards, Umesh
[11 Mar 15:07]
Roy Lyseng
Posted by developer: I am not sure what is claimed to be a bug here, but the default collation of MySQL is case insensitive. This means that e.g 'A' and 'a' compare as equal, and when selecting a DISTINCT value from a set of values including these, it is arbitrary what the result will be.
[11 Mar 16:05]
jinhui lai
Dear Roy, Thank you for your response. While I understand the default collation is case-insensitive, the issue lies in the logical consistency of the query results. For example: In this case, 'SELECT DISTINCT t0.c0 FROM t0 INNER JOIN t1 ON t0.c0 != t1.c0' returns 'a', but under case-insensitive rules, 'A' and 'a' are equal, so neither should satisfy '!= 'if t1 contains 'a'. In fact, according to what you said, this query should return an empty set. SELECT DISTINCT t0.c0 FROM t0 INNER JOIN t1 ON t0.c0 != t1.c0; +------+ | c0 | +------+ | a | +------+ Similarly, 'SELECT DISTINCT t0.c0 FROM t0 WHERE EXISTS (SELECT 1 FROM t1 WHERE t0.c0 != t1.c0);' returns 'A' instead of 'a', which also contradicts the expected logic. SELECT DISTINCT t0.c0 FROM t0 WHERE EXISTS (SELECT 1 FROM t1 WHERE t0.c0 != t1.c0); +------+ | c0 | +------+ | a | +------+ Furthermore, when i do not insert ‘b’, the result changes, which i consider to be too arbitrary. CREATE TABLE IF NOT EXISTS t0(c0 CHAR(1), c1 CHAR(1)) ; CREATE TABLE IF NOT EXISTS t1 LIKE t0; INSERT INTO t0(c1) VALUES(NULL); UPDATE t0 SET c0='A'; INSERT INTO t0(c0) VALUES('a'); INSERT INTO t1(c0) VALUES('a'); --not insert value 'b' SELECT DISTINCT t0.c0 FROM t0; +------+ | c0 | +------+ | A | +------+ SELECT DISTINCT t0.c0 FROM t0 INNER JOIN t1 ON t0.c0 != t1.c0; +------+ | c0 | +------+ | A | +------+ SELECT DISTINCT t0.c0 FROM t0 INNER JOIN t1 ON t0.c0 = t1.c0; +------+ | c0 | +------+ | A | +------+ SELECT DISTINCT t0.c0 FROM t0 WHERE EXISTS (SELECT 1 FROM t1 WHERE t0.c0 != t1.c0); +------+ | c0 | +------+ | A | +------+ SELECT DISTINCT t0.c0 FROM t0 WHERE EXISTS (SELECT 1 FROM t1 WHERE t0.c0 = t1.c0); +------+ | c0 | +------+ | A | +------+ Since the returned value is arbitrary, INNER JOIN and SEMI JOIN are contradictory. In fact, i executed the same case on MariaDB and PostgreSQL and no bug was generated. I think this is a valuable reference. All in all, the issue appears to be a logic inconsistency in handling `!=` and `DISTINCT`, not just case sensitivity. This can cause real trouble for MySQL users. Could you please further investigate it? Best regards, Jinhui Lai
[11 Mar 22:03]
Roy Lyseng
In the case SELECT DISTINCT t0.c0 FROM t0 INNER JOIN t1 ON t0.c0 != t1.c0; the inequality is satisfied for the 'b' value in t1.c0, thus it is reasonable to return a value. It is also arbitrary whether 'a' or 'A' is returned. As for the tests without the 'b' value, I am unable to reproduce your results on 9.x.