Description:
When a query is rewritten using ROW_NUMBER() window functions to align tables into a derived table via CTEs, the LEAST function that correctly performed string collation‑based comparisons on the base table degenerates to numeric comparison in the rewritten query. As a result, strings that do not start with digits are converted to 0, causing all rows to return 0, which is completely incorrect.
In the original query, LEAST('z', '_ejEZpA(1', '2090248689', c0) has all four arguments as strings, so it uses string comparison (lexicographic order). The results are '0', '1521367719', and '2090248689' for the three rows.
In the rewritten query, c0 comes from a materialized CTE temporary table, where the column’s metadata (character set and collation) may have been lost or simplified. As a result, LEAST no longer recognizes all arguments as strings and falls back to numeric comparison, converting non‑numeric strings to 0, making all results 0.
Actual Results and Status
Query Actual Result Status
Original (FROM s) 0, 1521367719, 2090248689 Correct
CTE rewrite 0, 0, 0 Incorrect
EXPLAIN Analysis (Rewritten Query – Key Part)
Nested loop inner join
-> Sort: vp_rowid
-> Materialize CTE lq
-> Window aggregate: row_number() OVER (ORDER BY l.vp_rowid)
-> Index lookup on rq using <auto_key0> (rn = lq.rn)
-> Materialize CTE rq
-> Window aggregate: row_number() OVER (ORDER BY r.vp_rowid)
r.c0, a TEXT column, goes through CTE rq materialization to a temporary table, and then participates in the JOIN as d.c0.
The materialization likely simplifies or drops column metadata (character set and collation), causing the outer LEAST to lose the string context.
Root Cause Analysis
LEAST comparison mode selection: In MySQL, LEAST uses string collation‑based comparison if all arguments are string types with a character set/collation; otherwise, it converts all arguments to DOUBLE and performs numeric comparison.
Original query: s.c0 has explicit TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci, so LEAST uses string comparison.
Type degradation after CTE materialization: When a CTE is materialized to an internal temporary table, column metadata (character set, collation) may be stripped or reduced to a generic BLOB/TEXT representation without collation information.
Mis‑evaluation of LEAST: Since d.c0 is no longer considered a pure string, LEAST treats the arguments as mixed‑type and defaults to numeric comparison. All strings are converted to DOUBLE: 'z', '_ejEZpA(1', and 'f~O_1' become 0 (non‑numeric start), '2090248689' becomes 2090248689, and '0' becomes 0. The numeric minimum is 0, so all rows return 0.
How to repeat:
DROP DATABASE IF EXISTS codex_db1_min;
CREATE DATABASE codex_db1_min;
USE codex_db1_min;
CREATE TABLE s(
vp_rowid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
c0 TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL
) ENGINE=InnoDB;
INSERT INTO s(c0) VALUES
('0'),
('1521367719'),
('f~O_1');
CREATE TABLE l(
vp_rowid BIGINT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;
CREATE TABLE r(
vp_rowid BIGINT NOT NULL PRIMARY KEY,
c0 TEXT NULL
) ENGINE=InnoDB;
INSERT INTO l(vp_rowid)
SELECT vp_rowid FROM s;
INSERT INTO r(vp_rowid, c0)
SELECT vp_rowid, c0 FROM s;
-- Original query (correct): returns string‑comparison results
SELECT CAST(LEAST('z', '_ejEZpA(1', '2090248689', c0) AS SIGNED) AS ref0
FROM s
ORDER BY vp_rowid;
-- Rewritten query (incorrect): returns all 0 (numeric comparison)
SELECT CAST(LEAST('z', '_ejEZpA(1', '2090248689', d.c0) AS SIGNED) AS ref0
FROM (
WITH lq AS (
SELECT ROW_NUMBER() OVER (ORDER BY vp_rowid) AS rn, vp_rowid
FROM l
),
rq AS (
SELECT ROW_NUMBER() OVER (ORDER BY vp_rowid) AS rn, vp_rowid, c0
FROM r
)
SELECT lq.vp_rowid AS vp_rowid, rq.c0 AS c0
FROM lq
JOIN rq ON lq.rn = rq.rn
) d
ORDER BY d.vp_rowid;
Description: When a query is rewritten using ROW_NUMBER() window functions to align tables into a derived table via CTEs, the LEAST function that correctly performed string collation‑based comparisons on the base table degenerates to numeric comparison in the rewritten query. As a result, strings that do not start with digits are converted to 0, causing all rows to return 0, which is completely incorrect. In the original query, LEAST('z', '_ejEZpA(1', '2090248689', c0) has all four arguments as strings, so it uses string comparison (lexicographic order). The results are '0', '1521367719', and '2090248689' for the three rows. In the rewritten query, c0 comes from a materialized CTE temporary table, where the column’s metadata (character set and collation) may have been lost or simplified. As a result, LEAST no longer recognizes all arguments as strings and falls back to numeric comparison, converting non‑numeric strings to 0, making all results 0. Actual Results and Status Query Actual Result Status Original (FROM s) 0, 1521367719, 2090248689 Correct CTE rewrite 0, 0, 0 Incorrect EXPLAIN Analysis (Rewritten Query – Key Part) Nested loop inner join -> Sort: vp_rowid -> Materialize CTE lq -> Window aggregate: row_number() OVER (ORDER BY l.vp_rowid) -> Index lookup on rq using <auto_key0> (rn = lq.rn) -> Materialize CTE rq -> Window aggregate: row_number() OVER (ORDER BY r.vp_rowid) r.c0, a TEXT column, goes through CTE rq materialization to a temporary table, and then participates in the JOIN as d.c0. The materialization likely simplifies or drops column metadata (character set and collation), causing the outer LEAST to lose the string context. Root Cause Analysis LEAST comparison mode selection: In MySQL, LEAST uses string collation‑based comparison if all arguments are string types with a character set/collation; otherwise, it converts all arguments to DOUBLE and performs numeric comparison. Original query: s.c0 has explicit TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci, so LEAST uses string comparison. Type degradation after CTE materialization: When a CTE is materialized to an internal temporary table, column metadata (character set, collation) may be stripped or reduced to a generic BLOB/TEXT representation without collation information. Mis‑evaluation of LEAST: Since d.c0 is no longer considered a pure string, LEAST treats the arguments as mixed‑type and defaults to numeric comparison. All strings are converted to DOUBLE: 'z', '_ejEZpA(1', and 'f~O_1' become 0 (non‑numeric start), '2090248689' becomes 2090248689, and '0' becomes 0. The numeric minimum is 0, so all rows return 0. How to repeat: DROP DATABASE IF EXISTS codex_db1_min; CREATE DATABASE codex_db1_min; USE codex_db1_min; CREATE TABLE s( vp_rowid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, c0 TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL ) ENGINE=InnoDB; INSERT INTO s(c0) VALUES ('0'), ('1521367719'), ('f~O_1'); CREATE TABLE l( vp_rowid BIGINT NOT NULL PRIMARY KEY ) ENGINE=InnoDB; CREATE TABLE r( vp_rowid BIGINT NOT NULL PRIMARY KEY, c0 TEXT NULL ) ENGINE=InnoDB; INSERT INTO l(vp_rowid) SELECT vp_rowid FROM s; INSERT INTO r(vp_rowid, c0) SELECT vp_rowid, c0 FROM s; -- Original query (correct): returns string‑comparison results SELECT CAST(LEAST('z', '_ejEZpA(1', '2090248689', c0) AS SIGNED) AS ref0 FROM s ORDER BY vp_rowid; -- Rewritten query (incorrect): returns all 0 (numeric comparison) SELECT CAST(LEAST('z', '_ejEZpA(1', '2090248689', d.c0) AS SIGNED) AS ref0 FROM ( WITH lq AS ( SELECT ROW_NUMBER() OVER (ORDER BY vp_rowid) AS rn, vp_rowid FROM l ), rq AS ( SELECT ROW_NUMBER() OVER (ORDER BY vp_rowid) AS rn, vp_rowid, c0 FROM r ) SELECT lq.vp_rowid AS vp_rowid, rq.c0 AS c0 FROM lq JOIN rq ON lq.rn = rq.rn ) d ORDER BY d.vp_rowid;