Description:
This issue can be described using two temporal states of the same tables.
In the baseline state S1, after creating the tables and inserting a small
amount of data, the query returns 300 rows:
------------------------------------
mysql> SELECT 'S1' AS stage, COUNT(*) AS cnt
-> FROM t_main m
-> JOIN t_ref r ON m.d = r.lt;
+-------+-----+
| stage | cnt |
+-------+-----+
| S1 | 300 |
+-------+-----+
1 row in set (0.001 sec)
------------------------------------
Then I transform S1 into S2 by inserting 2000 additional rows into t_main,
followed by ANALYZE TABLE, and execute the same query again. In S2, the
query returns 0 rows.
------------------------------------
mysql> SELECT 'S2' AS stage, COUNT(*) AS cnt
-> FROM t_main m
-> JOIN t_ref r ON m.d = r.lt;
+-------+-----+
| stage | cnt |
+-------+-----+
| S2 | 0 |
+-------+-----+
1 row in set (0.001 sec)
------------------------------------
Because S2 is obtained from S1 by append-only inserts, the JOIN result set
in S2 should be a superset of the corresponding result set in S1. Therefore
the row count should satisfy the monotonicity invariant COUNT(S2) >= COUNT(S1).
Instead, the observed results are:
------------------------------------
COUNT(S1) = 300
COUNT(S2) = 0
------------------------------------
This violates the expected S1-to-S2 monotonicity relationship.
How to repeat:
DROP TABLE IF EXISTS t_main;
DROP TABLE IF EXISTS t_ref;
CREATE TABLE t_main (
id INT PRIMARY KEY,
d DATE NOT NULL
);
CREATE INDEX idx_d ON t_main (d);
CREATE TABLE t_ref (
id INT PRIMARY KEY,
lt LONGTEXT
);
-- S1: baseline state
-- Invalid DATE strings are stored as '0000-00-00' via INSERT IGNORE
INSERT IGNORE INTO t_main VALUES (1, 'not-a-date'),
(2, 'not-a-date'),
(3, 'not-a-date');
-- 100 non-date LONGTEXT rows; hash join casts them to '0000-00-00', matching all t_main rows
INSERT INTO t_ref
WITH RECURSIVE seq(n) AS (
SELECT 1
UNION ALL
SELECT n + 1 FROM seq WHERE n < 100
)
SELECT n, CONCAT('sample_', n) FROM seq;
ANALYZE TABLE t_main;
ANALYZE TABLE t_ref;
-- Query result on S1: optimizer chooses hash join, returns 300 rows (correct)
SELECT 'S1' AS stage, COUNT(*) AS cnt
FROM t_main m
JOIN t_ref r ON m.d = r.lt;
-- S2: append-only expansion of S1
SET SESSION cte_max_recursion_depth = 10000;
-- Insert 2000 valid-date rows to shift optimizer statistics
INSERT INTO t_main
WITH RECURSIVE seq(n) AS (
SELECT 4
UNION ALL
SELECT n + 1 FROM seq WHERE n < 2004
)
SELECT n, DATE_ADD('2000-01-01', INTERVAL n DAY) FROM seq;
-- Refresh statistics to trigger the plan switch
ANALYZE TABLE t_main;
-- Query result on S2: optimizer switches to nested loop index lookup, returns 0 rows (wrong)
SELECT 'S2' AS stage, COUNT(*) AS cnt
FROM t_main m
JOIN t_ref r ON m.d = r.lt;
-- Optional diagnostics
EXPLAIN SELECT * FROM t_main m
JOIN t_ref r ON m.d = r.lt;
EXPLAIN SELECT * FROM t_main m IGNORE INDEX (idx_d)
JOIN t_ref r ON m.d = r.lt;
-- Forcing the old plan recovers 300 rows, confirming the data itself is correct
SELECT 'S2_force_no_index' AS stage, COUNT(*) AS cnt
FROM t_main m IGNORE INDEX (idx_d)
JOIN t_ref r ON m.d = r.lt;
Description: This issue can be described using two temporal states of the same tables. In the baseline state S1, after creating the tables and inserting a small amount of data, the query returns 300 rows: ------------------------------------ mysql> SELECT 'S1' AS stage, COUNT(*) AS cnt -> FROM t_main m -> JOIN t_ref r ON m.d = r.lt; +-------+-----+ | stage | cnt | +-------+-----+ | S1 | 300 | +-------+-----+ 1 row in set (0.001 sec) ------------------------------------ Then I transform S1 into S2 by inserting 2000 additional rows into t_main, followed by ANALYZE TABLE, and execute the same query again. In S2, the query returns 0 rows. ------------------------------------ mysql> SELECT 'S2' AS stage, COUNT(*) AS cnt -> FROM t_main m -> JOIN t_ref r ON m.d = r.lt; +-------+-----+ | stage | cnt | +-------+-----+ | S2 | 0 | +-------+-----+ 1 row in set (0.001 sec) ------------------------------------ Because S2 is obtained from S1 by append-only inserts, the JOIN result set in S2 should be a superset of the corresponding result set in S1. Therefore the row count should satisfy the monotonicity invariant COUNT(S2) >= COUNT(S1). Instead, the observed results are: ------------------------------------ COUNT(S1) = 300 COUNT(S2) = 0 ------------------------------------ This violates the expected S1-to-S2 monotonicity relationship. How to repeat: DROP TABLE IF EXISTS t_main; DROP TABLE IF EXISTS t_ref; CREATE TABLE t_main ( id INT PRIMARY KEY, d DATE NOT NULL ); CREATE INDEX idx_d ON t_main (d); CREATE TABLE t_ref ( id INT PRIMARY KEY, lt LONGTEXT ); -- S1: baseline state -- Invalid DATE strings are stored as '0000-00-00' via INSERT IGNORE INSERT IGNORE INTO t_main VALUES (1, 'not-a-date'), (2, 'not-a-date'), (3, 'not-a-date'); -- 100 non-date LONGTEXT rows; hash join casts them to '0000-00-00', matching all t_main rows INSERT INTO t_ref WITH RECURSIVE seq(n) AS ( SELECT 1 UNION ALL SELECT n + 1 FROM seq WHERE n < 100 ) SELECT n, CONCAT('sample_', n) FROM seq; ANALYZE TABLE t_main; ANALYZE TABLE t_ref; -- Query result on S1: optimizer chooses hash join, returns 300 rows (correct) SELECT 'S1' AS stage, COUNT(*) AS cnt FROM t_main m JOIN t_ref r ON m.d = r.lt; -- S2: append-only expansion of S1 SET SESSION cte_max_recursion_depth = 10000; -- Insert 2000 valid-date rows to shift optimizer statistics INSERT INTO t_main WITH RECURSIVE seq(n) AS ( SELECT 4 UNION ALL SELECT n + 1 FROM seq WHERE n < 2004 ) SELECT n, DATE_ADD('2000-01-01', INTERVAL n DAY) FROM seq; -- Refresh statistics to trigger the plan switch ANALYZE TABLE t_main; -- Query result on S2: optimizer switches to nested loop index lookup, returns 0 rows (wrong) SELECT 'S2' AS stage, COUNT(*) AS cnt FROM t_main m JOIN t_ref r ON m.d = r.lt; -- Optional diagnostics EXPLAIN SELECT * FROM t_main m JOIN t_ref r ON m.d = r.lt; EXPLAIN SELECT * FROM t_main m IGNORE INDEX (idx_d) JOIN t_ref r ON m.d = r.lt; -- Forcing the old plan recovers 300 rows, confirming the data itself is correct SELECT 'S2_force_no_index' AS stage, COUNT(*) AS cnt FROM t_main m IGNORE INDEX (idx_d) JOIN t_ref r ON m.d = r.lt;