Description:
I have identified a logical inconsistency where the exact same data and query produce different results (Runtime Error vs Empty Set) depending on the table's DDL history.
The issue involves PERIOD_DIFF receiving invalid arguments (e.g., -9).
Table (t0): Created via CREATE then ALTER ADD PRIMARY KEY. The optimizer chooses a plan that evaluates PERIOD_DIFF eagerly, causing a crash/error before realizing the result set is empty.
Table (t0_shadow): Created via a single CREATE statement (simulating SHOW CREATE TABLE output). The optimizer chooses a plan with Covering index skip scan or detects the empty join earlier (Lazy evaluation), short-circuiting the invalid function call and successfully returning an empty set.
This violates the principle of query determinism: the database should consistently either fail or succeed regardless of trivial metadata differences.
How to repeat:
-- 1. Create Original Table (t0)
-- Simulated workflow: Create table first, then add Primary Key later.
CREATE TABLE t0 (c1 DOUBLE);
ALTER TABLE t0 ADD PRIMARY KEY (c1);
-- 2. Insert Data
-- Note: '-9' is an invalid argument for PERIOD_DIFF, which triggers the error.
INSERT INTO t0 (c1) VALUES (12), (1), (-9), (10);
-- 3. Create Shadow Table (t0_shadow)
-- Simulated workflow: Created directly using the definition from 'SHOW CREATE TABLE t0'.
-- This results in slightly different table statistics/metadata compared to t0.
CREATE TABLE t0_shadow (
c1 double NOT NULL,
PRIMARY KEY (c1)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO t0_shadow SELECT * FROM t0;
-- 4. Execute Query on t0 (Expect Error)
SELECT 'Testing t0 (Actual: Error)' AS status;
SELECT ta1.ca5
FROM (SELECT c1 AS ca5 FROM t0) AS ta1
JOIN (SELECT c1 FROM t0 WHERE (c1 & NULL) GROUP BY c1) AS ta2
ON (ta1.ca5 SOUNDS LIKE PERIOD_DIFF(ta1.ca5, 11));
-- 5. Execute Query on t0_shadow (Expect Empty Set)
SELECT 'Testing t0_shadow (Actual: Empty Set)' AS status;
SELECT ta1.ca5
FROM (SELECT c1 AS ca5 FROM t0_shadow) AS ta1
JOIN (SELECT c1 FROM t0_shadow WHERE (c1 & NULL) GROUP BY c1) AS ta2
ON (ta1.ca5 SOUNDS LIKE PERIOD_DIFF(ta1.ca5, 11));
Suggested fix:
The optimizer should handle potentially failing functions (like PERIOD_DIFF with invalid data) deterministically. If a branch of the execution plan can be proven to be empty (e.g., the right side of the JOIN is empty), the evaluation of the ON clause functions should be consistently skipped (short-circuited) to avoid runtime errors on data that will be discarded anyway.