Bug #120803 CTE Materialization Causes LEAST to Degenerate from String Comparison to Numeric Comparison
Submitted: 28 Jun 13:20 Modified: 29 Jun 2:51
Reporter: Annie liu Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:9.6 OS:Any
Assigned to: CPU Architecture:Any

[28 Jun 13:20] Annie liu
Description:
When a query is rewritten using ROW_NUMBER() window functions to align tables into a derived table via CTEs, the LEAST function that correctly performed string collation‑based comparisons on the base table degenerates to numeric comparison in the rewritten query. As a result, strings that do not start with digits are converted to 0, causing all rows to return 0, which is completely incorrect.

In the original query, LEAST('z', '_ejEZpA(1', '2090248689', c0) has all four arguments as strings, so it uses string comparison (lexicographic order). The results are '0', '1521367719', and '2090248689' for the three rows.

In the rewritten query, c0 comes from a materialized CTE temporary table, where the column’s metadata (character set and collation) may have been lost or simplified. As a result, LEAST no longer recognizes all arguments as strings and falls back to numeric comparison, converting non‑numeric strings to 0, making all results 0.

Actual Results and Status

Query	Actual Result	Status
Original (FROM s)	0, 1521367719, 2090248689	Correct
CTE rewrite	0, 0, 0	Incorrect

EXPLAIN Analysis (Rewritten Query – Key Part)

Nested loop inner join
  -> Sort: vp_rowid
    -> Materialize CTE lq
      -> Window aggregate: row_number() OVER (ORDER BY l.vp_rowid)
  -> Index lookup on rq using <auto_key0> (rn = lq.rn)
    -> Materialize CTE rq
      -> Window aggregate: row_number() OVER (ORDER BY r.vp_rowid)

r.c0, a TEXT column, goes through CTE rq materialization to a temporary table, and then participates in the JOIN as d.c0.

The materialization likely simplifies or drops column metadata (character set and collation), causing the outer LEAST to lose the string context.

Root Cause Analysis

LEAST comparison mode selection: In MySQL, LEAST uses string collation‑based comparison if all arguments are string types with a character set/collation; otherwise, it converts all arguments to DOUBLE and performs numeric comparison.

Original query: s.c0 has explicit TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci, so LEAST uses string comparison.

Type degradation after CTE materialization: When a CTE is materialized to an internal temporary table, column metadata (character set, collation) may be stripped or reduced to a generic BLOB/TEXT representation without collation information.

Mis‑evaluation of LEAST: Since d.c0 is no longer considered a pure string, LEAST treats the arguments as mixed‑type and defaults to numeric comparison. All strings are converted to DOUBLE: 'z', '_ejEZpA(1', and 'f~O_1' become 0 (non‑numeric start), '2090248689' becomes 2090248689, and '0' becomes 0. The numeric minimum is 0, so all rows return 0.

How to repeat:
DROP DATABASE IF EXISTS codex_db1_min;
CREATE DATABASE codex_db1_min;
USE codex_db1_min;

CREATE TABLE s(
  vp_rowid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  c0 TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL
) ENGINE=InnoDB;

INSERT INTO s(c0) VALUES
  ('0'),
  ('1521367719'),
  ('f~O_1');

CREATE TABLE l(
  vp_rowid BIGINT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE r(
  vp_rowid BIGINT NOT NULL PRIMARY KEY,
  c0 TEXT NULL
) ENGINE=InnoDB;

INSERT INTO l(vp_rowid)
SELECT vp_rowid FROM s;

INSERT INTO r(vp_rowid, c0)
SELECT vp_rowid, c0 FROM s;

-- Original query (correct): returns string‑comparison results
SELECT CAST(LEAST('z', '_ejEZpA(1', '2090248689', c0) AS SIGNED) AS ref0
FROM s
ORDER BY vp_rowid;

-- Rewritten query (incorrect): returns all 0 (numeric comparison)
SELECT CAST(LEAST('z', '_ejEZpA(1', '2090248689', d.c0) AS SIGNED) AS ref0
FROM (
  WITH lq AS (
    SELECT ROW_NUMBER() OVER (ORDER BY vp_rowid) AS rn, vp_rowid
    FROM l
  ),
  rq AS (
    SELECT ROW_NUMBER() OVER (ORDER BY vp_rowid) AS rn, vp_rowid, c0
    FROM r
  )
  SELECT lq.vp_rowid AS vp_rowid, rq.c0 AS c0
  FROM lq
  JOIN rq ON lq.rn = rq.rn
) d
ORDER BY d.vp_rowid;
[28 Jun 21:03] Roy Lyseng
Thank you for the bug report.
Verified as described.
[28 Jun 21:26] Roy Lyseng
I don't think your analysis is correct, though.

LEAST with at least one string argument produces a string result.
It seems that the two different queries uses different collations,
which may be the reason for the different results.
[29 Jun 2:51] Annie liu
Thank you for verifying the bug and for the correction on the root cause.It's good to know the issue is confirmed and will be addressed.