Description:
We have identified a severe inconsistency in query results between a NATURAL JOIN and its logically equivalent representation using EXCEPT ALL when implicit type casting occurs during the join. This suggests a potential bug in the query execution engine, particularly in how set operators (EXCEPT ALL) handle derived values from joins.
Steps to Reproduce:
1. Create tables with different data types for the join key:
-- Table t0 with a VARCHAR column
CREATE TABLE t0 (c0 VARCHAR(500) NOT NULL);
INSERT INTO t0 VALUES ('0.6952750358145634'), ('xgC1');
-- Table t1 with a DOUBLE column
CREATE TABLE t1 (c0 DOUBLE NOT NULL PRIMARY KEY);
INSERT INTO t1 VALUES (0);
2. Execute Query 1 (Standard NATURAL JOIN):
SELECT DISTINCTROW t1.c0 AS ref0, t0.c0 AS ref1
FROM t0 NATURAL JOIN t1
GROUP BY t1.c0, t0.c0;
3. Execute Query 2 (Logically equivalent form using EXCEPT ALL):
SELECT DISTINCTROW c0 AS ref0, c0 AS ref1
FROM (
SELECT * FROM (
SELECT * FROM t0 NATURAL LEFT JOIN t1
) AS LEFT_ALL
EXCEPT ALL
SELECT * FROM (
(SELECT * FROM t0 NATURAL LEFT JOIN t1) AS LEFT_ALL_INNER
EXCEPT ALL
(SELECT * FROM t0 NATURAL RIGHT JOIN t1) AS RIGHT_ALL_INNER
) AS LEFT_ONLY
) AS T_INNER
GROUP BY c0, c0;
Query 1 Output:
+------+------+
| ref0 | ref1 |
+------+------+
| 0 | xgC1 |
+------+------+
1 row in set
Query 2 Output:
Empty set
The two queries should be logically equivalent. The structure A - (A - B) is a standard relational algebra method to compute the intersection A ∩ B. While we acknowledge MySQL's special behavior in SELECT after a JOIN (as seen in Query 1), the EXCEPT ALL structure (Query 2) should also correctly simulate the INNER JOIN and produce a consistent result set.
Based on a step-by-step evaluation, Query 2 is expected to produce one row: (0.0, 0.0). The fact that it returns an empty set indicates a likely flaw in the execution of EXCEPT ALL when operating on derived, mixed-type data from preceding joins.
Regardless of the specific output, the two queries, which aim to compute the same logical join, should not produce such dramatically different results (one row vs. empty set).
How to repeat:
mysql> status;
--------------
mysql Ver 9.2.0 for Linux on x86_64 (MySQL Community Server - GPL)
Connection id: 1191
Current database:
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 9.2.0 MySQL Community Server - GPL
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8mb4
Db characterset: utf8mb4
Client characterset: latin1
Conn. characterset: latin1
UNIX socket: /var/run/mysqld/mysqld.sock
Binary data as: Hexadecimal
Uptime: 6 hours 53 min 21 sec
SQLcode:
------------------------------------------------------------------------------
DROP DATABASE IF EXISTS database98;
CREATE DATABASE database98;
USE database98;
CREATE TABLE IF NOT EXISTS t0(c0 VARCHAR(500) STORAGE DISK NOT NULL) ;
CREATE TABLE t1(c0 DOUBLE ZEROFILL COLUMN_FORMAT DEFAULT COMMENT 'asdf' UNIQUE KEY PRIMARY KEY STORAGE DISK) PACK_KEYS = DEFAULT, ENGINE = HEAP;
DELETE LOW_PRIORITY IGNORE FROM t1;
SET SESSION read_rnd_buffer_size = 1201856744;
SET SESSION preload_buffer_size = 604501183;
CHECKSUM TABLE t0;
INSERT IGNORE INTO t1(c0) VALUES(']M鯢b');
CREATE UNIQUE INDEX i0 ON t1(c0) VISIBLE;
REPLACE DELAYED INTO t0(c0) VALUES(0.6952750358145634);
INSERT DELAYED IGNORE INTO t0(c0) VALUES("xgC1");
SET SESSION join_buffer_size = 4734638815441487969;
select * from t0;
select * from t1;
SELECT DISTINCTROW t1.c0 AS ref0, t0.c0 AS ref1 FROM t0 NATURAL JOIN t1 GROUP BY t1.c0, t0.c0;
SELECT DISTINCTROW c0 AS ref0, c0 AS ref1 FROM ( SELECT * FROM (SELECT * FROM t0 NATURAL LEFT JOIN t1 ) AS LEFT_ALL EXCEPT ALL SELECT * FROM ( SELECT * FROM (SELECT * FROM t0 NATURAL LEFT JOIN t1 ) AS LEFT_ALL_INNER EXCEPT ALL SELECT * FROM (SELECT * FROM t0 NATURAL RIGHT JOIN t1 ) AS RIGHT_ALL_INNER ) AS LEFT_ONLY ) AS T_INNER GROUP BY c0, c0;