Description:
We have identified a logic bug in the MySQL optimizer regarding the rewriting of ALL subqueries.
When comparing a numeric constant (derived from an EXISTS clause) with an ALL subquery on a String column (MEDIUMTEXT), the optimizer produces inconsistent results depending on the source of the constant:
Scenario A (Bug): When the EXISTS subquery references the same table as the ALL subquery (t64 vs t64), the optimizer aggressively rewrites the ALL predicate into a MAX() aggregate. This is unsafe because MAX() uses Lexicographical Order (String context), whereas the comparison requires Numeric Order (Implicit casting context).
Result: TRUE (1) [Incorrect]
Scenario B (Correct): When the EXISTS subquery references a different table (t63 vs t64), the optimizer treats the subquery as dependent/complex and falls back to a row-by-row table scan. This correctly respects the implicit numeric casting.
Result: FALSE (0) [Correct]
How to repeat:
CREATE TABLE t63 (c1 DOUBLE, c2 BIT(14));
CREATE TABLE t64 (c1 MEDIUMTEXT CHARACTER SET utf8 NOT NULL);
INSERT INTO t63 VALUES (1.0, b'111');
INSERT INTO t64 (c1) VALUES ('4r4d');
INSERT INTO t64 (c1) VALUES ('poe_test');
-- Query 1 (The Bug): Homogeneous Reference (t64 vs t64)
-- The optimizer sees EXISTS(t64) as a constant and rewrites ALL to MAX().
-- MAX('4r4d', 'poe_test') = 'poe_test' -> Casts to 0.
-- 1 >= 0 is TRUE.
SELECT 'Query 1 (Bug)', ((EXISTS (SELECT c1 FROM t64)) >= ALL (SELECT c1 FROM t64)) AS result;
-- ACTUAL RESULT: 1
-- EXPECTED RESULT: 0
-- Query 2 (Correct): Heterogeneous Reference (t63 vs t64)
-- The optimizer treats EXISTS(t63) as dependent/complex.
-- It performs a table scan. '4r4d' casts to 4.
-- 1 >= 4 is FALSE.
SELECT 'Query 2 (Correct)', ((EXISTS (SELECT c1 FROM t63)) >= ALL (SELECT c1 FROM t64)) AS result;
-- ACTUAL RESULT: 0
-- EXPECTED RESULT: 0
Description: We have identified a logic bug in the MySQL optimizer regarding the rewriting of ALL subqueries. When comparing a numeric constant (derived from an EXISTS clause) with an ALL subquery on a String column (MEDIUMTEXT), the optimizer produces inconsistent results depending on the source of the constant: Scenario A (Bug): When the EXISTS subquery references the same table as the ALL subquery (t64 vs t64), the optimizer aggressively rewrites the ALL predicate into a MAX() aggregate. This is unsafe because MAX() uses Lexicographical Order (String context), whereas the comparison requires Numeric Order (Implicit casting context). Result: TRUE (1) [Incorrect] Scenario B (Correct): When the EXISTS subquery references a different table (t63 vs t64), the optimizer treats the subquery as dependent/complex and falls back to a row-by-row table scan. This correctly respects the implicit numeric casting. Result: FALSE (0) [Correct] How to repeat: CREATE TABLE t63 (c1 DOUBLE, c2 BIT(14)); CREATE TABLE t64 (c1 MEDIUMTEXT CHARACTER SET utf8 NOT NULL); INSERT INTO t63 VALUES (1.0, b'111'); INSERT INTO t64 (c1) VALUES ('4r4d'); INSERT INTO t64 (c1) VALUES ('poe_test'); -- Query 1 (The Bug): Homogeneous Reference (t64 vs t64) -- The optimizer sees EXISTS(t64) as a constant and rewrites ALL to MAX(). -- MAX('4r4d', 'poe_test') = 'poe_test' -> Casts to 0. -- 1 >= 0 is TRUE. SELECT 'Query 1 (Bug)', ((EXISTS (SELECT c1 FROM t64)) >= ALL (SELECT c1 FROM t64)) AS result; -- ACTUAL RESULT: 1 -- EXPECTED RESULT: 0 -- Query 2 (Correct): Heterogeneous Reference (t63 vs t64) -- The optimizer treats EXISTS(t63) as dependent/complex. -- It performs a table scan. '4r4d' casts to 4. -- 1 >= 4 is FALSE. SELECT 'Query 2 (Correct)', ((EXISTS (SELECT c1 FROM t63)) >= ALL (SELECT c1 FROM t64)) AS result; -- ACTUAL RESULT: 0 -- EXPECTED RESULT: 0