MySQL Bugs: #120253: Index Lookup Skips CAST for LONGTEXT-to-DATE Coercion in JOIN,Causing Plan-Dependent Result Divergence

Bug #120253	Index Lookup Skips CAST for LONGTEXT-to-DATE Coercion in JOIN,Causing Plan-Dependent Result Divergence
Submitted:	13 Apr 9:14	Modified:	13 Apr 10:18
Reporter:	Y F	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	8.0.45	OS:	Ubuntu (Ubuntu 20.04.4 LTS)
Assigned to:		CPU Architecture:	x86 (Intel(R) Xeon(R) Platinum 8358P @ 2.60GHz (64 Cores, 96MB L3 Cache))

Description:
This issue can be described using two temporal states of the same tables.

In the baseline state S1, after creating the tables and inserting a small
amount of data, the query returns 300 rows:

------------------------------------
mysql> SELECT 'S1' AS stage, COUNT(*) AS cnt
    -> FROM t_main m
    -> JOIN t_ref r ON m.d = r.lt;
+-------+-----+
| stage | cnt |
+-------+-----+
| S1    | 300 |
+-------+-----+
1 row in set (0.001 sec)
------------------------------------

Then I transform S1 into S2 by inserting 2000 additional rows into t_main,
followed by ANALYZE TABLE, and execute the same query again. In S2, the
query returns 0 rows.

------------------------------------
mysql> SELECT 'S2' AS stage, COUNT(*) AS cnt
    -> FROM t_main m
    -> JOIN t_ref r ON m.d = r.lt;
+-------+-----+
| stage | cnt |
+-------+-----+
| S2    | 0   |
+-------+-----+
1 row in set (0.001 sec)
------------------------------------

Because S2 is obtained from S1 by append-only inserts, the JOIN result set
in S2 should be a superset of the corresponding result set in S1. Therefore
the row count should satisfy the monotonicity invariant COUNT(S2) >= COUNT(S1).
Instead, the observed results are:

------------------------------------
COUNT(S1) = 300
COUNT(S2) = 0
------------------------------------

This violates the expected S1-to-S2 monotonicity relationship.

How to repeat:
DROP TABLE IF EXISTS t_main;
DROP TABLE IF EXISTS t_ref;

CREATE TABLE t_main (
    id INT PRIMARY KEY,
    d  DATE NOT NULL
);
CREATE INDEX idx_d ON t_main (d);

CREATE TABLE t_ref (
    id INT PRIMARY KEY,
    lt LONGTEXT
);

-- S1: baseline state
-- Invalid DATE strings are stored as '0000-00-00' via INSERT IGNORE
INSERT IGNORE INTO t_main VALUES (1, 'not-a-date'),
                                  (2, 'not-a-date'),
                                  (3, 'not-a-date');

-- 100 non-date LONGTEXT rows; hash join casts them to '0000-00-00', matching all t_main rows
INSERT INTO t_ref
WITH RECURSIVE seq(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 100
)
SELECT n, CONCAT('sample_', n) FROM seq;

ANALYZE TABLE t_main;
ANALYZE TABLE t_ref;

-- Query result on S1: optimizer chooses hash join, returns 300 rows (correct)
SELECT 'S1' AS stage, COUNT(*) AS cnt
FROM t_main m
JOIN t_ref  r ON m.d = r.lt;

-- S2: append-only expansion of S1
SET SESSION cte_max_recursion_depth = 10000;

-- Insert 2000 valid-date rows to shift optimizer statistics
INSERT INTO t_main
WITH RECURSIVE seq(n) AS (
    SELECT 4
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 2004
)
SELECT n, DATE_ADD('2000-01-01', INTERVAL n DAY) FROM seq;

-- Refresh statistics to trigger the plan switch
ANALYZE TABLE t_main;

-- Query result on S2: optimizer switches to nested loop index lookup, returns 0 rows (wrong)
SELECT 'S2' AS stage, COUNT(*) AS cnt
FROM t_main m
JOIN t_ref  r ON m.d = r.lt;

-- Optional diagnostics
EXPLAIN SELECT * FROM t_main m
        JOIN t_ref r ON m.d = r.lt;

EXPLAIN SELECT * FROM t_main m IGNORE INDEX (idx_d)
        JOIN t_ref r ON m.d = r.lt;

-- Forcing the old plan recovers 300 rows, confirming the data itself is correct
SELECT 'S2_force_no_index' AS stage, COUNT(*) AS cnt
FROM t_main m IGNORE INDEX (idx_d)
JOIN t_ref  r ON m.d = r.lt;

Hi Y F,

Thank you for the test case. Verified as described.