Description:
This issue can be described using two execution states over the same tables
and the same data.
In the baseline execution state S1, after creating the tables, inserting a
small amount of data, and forcing the FirstMatch semijoin path, the query
returns 4 rows:
------------------------------------
mysql> SELECT 'S1_count' AS stage, COUNT(*) AS cnt
-> FROM t_main m
-> WHERE m.c5 IS NULL
-> AND m.c4 = 8
-> AND LENGTH(COALESCE(m.c3, '')) >= 7
-> AND EXISTS (
-> SELECT 1
-> FROM t_ref r
-> WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL)
-> AND r.c5 IS NOT NULL
-> AND m.c2 = r.c4
-> );
+----------+-----+
| stage | cnt |
+----------+-----+
| S1_count | 4 |
+----------+-----+
1 row in set
------------------------------------
The 4 returned rows are:
------------------------------------
mysql> SELECT 'S1_rows' AS stage, m.c1, m.c2, m.c4
-> FROM t_main m
-> WHERE m.c5 IS NULL
-> AND m.c4 = 8
-> AND LENGTH(COALESCE(m.c3, '')) >= 7
-> AND EXISTS (
-> SELECT 1
-> FROM t_ref r
-> WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL)
-> AND r.c5 IS NOT NULL
-> AND m.c2 = r.c4
-> )
-> ORDER BY m.c1;
+---------+----+-------------------+----+
| stage | c1 | c2 | c4 |
+---------+----+-------------------+----+
| S1_rows | 1 | not-a-date | 8 |
| S1_rows | 2 | hv_820 | 8 |
| S1_rows | 3 | k9zs-031qxa86z954 | 8 |
| S1_rows | 4 | dldk4w-7o6 | 8 |
+---------+----+-------------------+----+
4 rows in set
------------------------------------
Then I keep the table contents unchanged, but switch the optimizer to disable
FirstMatch and force the materialized semijoin path. In this execution state
S2, the same query returns 0 rows:
------------------------------------
mysql> SELECT 'S2_count' AS stage, COUNT(*) AS cnt
-> FROM t_main m
-> WHERE m.c5 IS NULL
-> AND m.c4 = 8
-> AND LENGTH(COALESCE(m.c3, '')) >= 7
-> AND EXISTS (
-> SELECT 1
-> FROM t_ref r
-> WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL)
-> AND r.c5 IS NOT NULL
-> AND m.c2 = r.c4
-> );
+----------+-----+
| stage | cnt |
+----------+-----+
| S2_count | 0 |
+----------+-----+
1 row in set
------------------------------------
Because S1 and S2 use the same tables and the same data, they are logically
equivalent execution states. Therefore the result sets must be identical.
Instead, the observed results are:
------------------------------------
COUNT(S1) = 4
COUNT(S2) = 0
------------------------------------
This violates the expected S1-to-S2 consistency relationship.
The root cause is an optimizer bug in implicit coercion for correlated
`EXISTS`. The outer operand is `VARCHAR`, while the correlated subquery
compares it to a `YEAR` column. Under the FirstMatch semijoin path in S1,
MySQL applies the coercion correctly, so strings such as `'not-a-date'`,
`'hv_820'`, and other non-numeric values compare equal to `YEAR 0000`.
However, after switching to the materialized semijoin path in S2, those same
rows disappear entirely, showing that the implicit VARCHAR-to-YEAR coercion is
not handled correctly in that execution path.
This is confirmed by the control query:
------------------------------------
mysql> SELECT 'control' AS stage, c1, c4, c5,
-> ('not-a-date' = c4) AS eq_bad_str,
-> ('0' = c4) AS eq_zero_str,
-> (CAST('not-a-date' AS SIGNED) = c4) AS eq_cast_bad_str
-> FROM t_ref
-> ORDER BY c1;
+---------+----+------+----------+------------+-------------+-----------------+
| stage | c1 | c4 | c5 | eq_bad_str | eq_zero_str | eq_cast_bad_str |
+---------+----+------+----------+------------+-------------+-----------------+
| control | 1 | 0 | 15:32:26 | 1 | 0 | 1 |
| control | 2 | 2008 | 2:57:30 | 0 | 0 | 0 |
| control | 3 | 2017 | 14:12:33 | 0 | 0 | 0 |
+---------+----+------+----------+------------+-------------+-----------------+
3 rows in set
------------------------------------
And under the same S2 setting, making the coercion explicit restores all 4
rows:
------------------------------------
mysql> SELECT 'S2_cast_count' AS stage, COUNT(*) AS cnt
-> FROM t_main m
-> WHERE m.c5 IS NULL
-> AND m.c4 = 8
-> AND LENGTH(COALESCE(m.c3, '')) >= 7
-> AND EXISTS (
-> SELECT 1
-> FROM t_ref r
-> WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL)
-> AND r.c5 IS NOT NULL
-> AND CAST(m.c2 AS SIGNED) = r.c4
-> );
+---------------+-----+
| stage | cnt |
+---------------+-----+
| S2_cast_count | 4 |
+---------------+-----+
1 row in set
------------------------------------
So the wrong result is specific to the implicit `VARCHAR = YEAR` comparison in
the materialized correlated `EXISTS` path, not to the underlying data.
How to repeat:
SET SESSION sql_mode = '';
DROP DATABASE IF EXISTS repro_exists_year_materialized;
CREATE DATABASE repro_exists_year_materialized;
USE repro_exists_year_materialized;
CREATE TABLE t_main (
c1 INT NOT NULL PRIMARY KEY,
c2 VARCHAR(64) NOT NULL,
c3 VARCHAR(64) NULL,
c4 INT NULL,
c5 DATE NULL
);
CREATE TABLE t_ref (
c1 INT NOT NULL PRIMARY KEY,
c4 YEAR NOT NULL,
c5 TIME NULL,
c11 GEOMETRY NULL,
c13 TINYBLOB NULL
);
CREATE INDEX idx_main_c2 ON t_main (c2);
CREATE INDEX idx_main_c4 ON t_main (c4);
INSERT INTO t_main (c1, c2, c3, c4, c5) VALUES
(1, 'not-a-date', 'hv_4144', 8, NULL),
(2, 'hv_820', 'not-a-date', 8, NULL),
(3, 'k9zs-031qxa86z954', 'hv_4144', 8, NULL),
(4, 'dldk4w-7o6', 'hv_4144', 8, NULL),
(5, '123', 'hv_4144', 8, NULL),
(6, 'not-a-date', 'short', 8, NULL),
(7, 'not-a-date', 'hv_4144', 12, NULL),
(8, 'other', 'hv_4144', 8, '2023-01-01');
-- INSERT IGNORE is important: NULL becomes YEAR 0000
INSERT IGNORE INTO t_ref (c1, c4, c5, c11, c13) VALUES
(1, NULL, '15:32:26', NULL, NULL),
(2, 2008, '02:57:30', NULL, X'01'),
(3, 2017, '14:12:33', NULL, X'02');
ANALYZE TABLE t_main, t_ref;
-- Optional control: coercion semantics are valid
SELECT 'control' AS stage, c1, c4, c5,
('not-a-date' = c4) AS eq_bad_str,
('0' = c4) AS eq_zero_str,
(CAST('not-a-date' AS SIGNED) = c4) AS eq_cast_bad_str
FROM t_ref
ORDER BY c1;
-- S1: FirstMatch semijoin path
SET SESSION optimizer_switch =
'semijoin=on,firstmatch=on,materialization=off,loosescan=off,duplicateweedout=off';
EXPLAIN FORMAT=TRADITIONAL
SELECT m.c1, m.c2
FROM t_main m
WHERE m.c5 IS NULL
AND m.c4 = 8
AND LENGTH(COALESCE(m.c3, '')) >= 7
AND EXISTS (
SELECT 1
FROM t_ref r
WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL)
AND r.c5 IS NOT NULL
AND m.c2 = r.c4
);
SELECT 'S1_count' AS stage, COUNT(*) AS cnt
FROM t_main m
WHERE m.c5 IS NULL
AND m.c4 = 8
AND LENGTH(COALESCE(m.c3, '')) >= 7
AND EXISTS (
SELECT 1
FROM t_ref r
WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL)
AND r.c5 IS NOT NULL
AND m.c2 = r.c4
);
SELECT 'S1_rows' AS stage, m.c1, m.c2, m.c4
FROM t_main m
WHERE m.c5 IS NULL
AND m.c4 = 8
AND LENGTH(COALESCE(m.c3, '')) >= 7
AND EXISTS (
SELECT 1
FROM t_ref r
WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL)
AND r.c5 IS NOT NULL
AND m.c2 = r.c4
)
ORDER BY m.c1;
-- S2: same data, but force the materialized semijoin path
SET SESSION optimizer_switch =
'semijoin=on,firstmatch=off,materialization=on,loosescan=off,duplicateweedout=off';
EXPLAIN FORMAT=TRADITIONAL
SELECT m.c1, m.c2
FROM t_main m
WHERE m.c5 IS NULL
AND m.c4 = 8
AND LENGTH(COALESCE(m.c3, '')) >= 7
AND EXISTS (
SELECT 1
FROM t_ref r
WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL)
AND r.c5 IS NOT NULL
AND m.c2 = r.c4
);
SELECT 'S2_count' AS stage, COUNT(*) AS cnt
FROM t_main m
WHERE m.c5 IS NULL
AND m.c4 = 8
AND LENGTH(COALESCE(m.c3, '')) >= 7
AND EXISTS (
SELECT 1
FROM t_ref r
WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL)
AND r.c5 IS NOT NULL
AND m.c2 = r.c4
);
SELECT 'S2_rows' AS stage, m.c1, m.c2, m.c4
FROM t_main m
WHERE m.c5 IS NULL
AND m.c4 = 8
AND LENGTH(COALESCE(m.c3, '')) >= 7
AND EXISTS (
SELECT 1
FROM t_ref r
WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL)
AND r.c5 IS NOT NULL
AND m.c2 = r.c4
)
ORDER BY m.c1;
-- Cross-check: explicit coercion restores the rows under S2
SELECT 'S2_cast_count' AS stage, COUNT(*) AS cnt
FROM t_main m
WHERE m.c5 IS NULL
AND m.c4 = 8
AND LENGTH(COALESCE(m.c3, '')) >= 7
AND EXISTS (
SELECT 1
FROM t_ref r
WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL)
AND r.c5 IS NOT NULL
AND CAST(m.c2 AS SIGNED) = r.c4
);
SELECT 'S2_cast_rows' AS stage, m.c1, m.c2, m.c4
FROM t_main m
WHERE m.c5 IS NULL
AND m.c4 = 8
AND LENGTH(COALESCE(m.c3, '')) >= 7
AND EXISTS (
SELECT 1
FROM t_ref r
WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL)
AND r.c5 IS NOT NULL
AND CAST(m.c2 AS SIGNED) = r.c4
)
ORDER BY m.c1;
Description: This issue can be described using two execution states over the same tables and the same data. In the baseline execution state S1, after creating the tables, inserting a small amount of data, and forcing the FirstMatch semijoin path, the query returns 4 rows: ------------------------------------ mysql> SELECT 'S1_count' AS stage, COUNT(*) AS cnt -> FROM t_main m -> WHERE m.c5 IS NULL -> AND m.c4 = 8 -> AND LENGTH(COALESCE(m.c3, '')) >= 7 -> AND EXISTS ( -> SELECT 1 -> FROM t_ref r -> WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL) -> AND r.c5 IS NOT NULL -> AND m.c2 = r.c4 -> ); +----------+-----+ | stage | cnt | +----------+-----+ | S1_count | 4 | +----------+-----+ 1 row in set ------------------------------------ The 4 returned rows are: ------------------------------------ mysql> SELECT 'S1_rows' AS stage, m.c1, m.c2, m.c4 -> FROM t_main m -> WHERE m.c5 IS NULL -> AND m.c4 = 8 -> AND LENGTH(COALESCE(m.c3, '')) >= 7 -> AND EXISTS ( -> SELECT 1 -> FROM t_ref r -> WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL) -> AND r.c5 IS NOT NULL -> AND m.c2 = r.c4 -> ) -> ORDER BY m.c1; +---------+----+-------------------+----+ | stage | c1 | c2 | c4 | +---------+----+-------------------+----+ | S1_rows | 1 | not-a-date | 8 | | S1_rows | 2 | hv_820 | 8 | | S1_rows | 3 | k9zs-031qxa86z954 | 8 | | S1_rows | 4 | dldk4w-7o6 | 8 | +---------+----+-------------------+----+ 4 rows in set ------------------------------------ Then I keep the table contents unchanged, but switch the optimizer to disable FirstMatch and force the materialized semijoin path. In this execution state S2, the same query returns 0 rows: ------------------------------------ mysql> SELECT 'S2_count' AS stage, COUNT(*) AS cnt -> FROM t_main m -> WHERE m.c5 IS NULL -> AND m.c4 = 8 -> AND LENGTH(COALESCE(m.c3, '')) >= 7 -> AND EXISTS ( -> SELECT 1 -> FROM t_ref r -> WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL) -> AND r.c5 IS NOT NULL -> AND m.c2 = r.c4 -> ); +----------+-----+ | stage | cnt | +----------+-----+ | S2_count | 0 | +----------+-----+ 1 row in set ------------------------------------ Because S1 and S2 use the same tables and the same data, they are logically equivalent execution states. Therefore the result sets must be identical. Instead, the observed results are: ------------------------------------ COUNT(S1) = 4 COUNT(S2) = 0 ------------------------------------ This violates the expected S1-to-S2 consistency relationship. The root cause is an optimizer bug in implicit coercion for correlated `EXISTS`. The outer operand is `VARCHAR`, while the correlated subquery compares it to a `YEAR` column. Under the FirstMatch semijoin path in S1, MySQL applies the coercion correctly, so strings such as `'not-a-date'`, `'hv_820'`, and other non-numeric values compare equal to `YEAR 0000`. However, after switching to the materialized semijoin path in S2, those same rows disappear entirely, showing that the implicit VARCHAR-to-YEAR coercion is not handled correctly in that execution path. This is confirmed by the control query: ------------------------------------ mysql> SELECT 'control' AS stage, c1, c4, c5, -> ('not-a-date' = c4) AS eq_bad_str, -> ('0' = c4) AS eq_zero_str, -> (CAST('not-a-date' AS SIGNED) = c4) AS eq_cast_bad_str -> FROM t_ref -> ORDER BY c1; +---------+----+------+----------+------------+-------------+-----------------+ | stage | c1 | c4 | c5 | eq_bad_str | eq_zero_str | eq_cast_bad_str | +---------+----+------+----------+------------+-------------+-----------------+ | control | 1 | 0 | 15:32:26 | 1 | 0 | 1 | | control | 2 | 2008 | 2:57:30 | 0 | 0 | 0 | | control | 3 | 2017 | 14:12:33 | 0 | 0 | 0 | +---------+----+------+----------+------------+-------------+-----------------+ 3 rows in set ------------------------------------ And under the same S2 setting, making the coercion explicit restores all 4 rows: ------------------------------------ mysql> SELECT 'S2_cast_count' AS stage, COUNT(*) AS cnt -> FROM t_main m -> WHERE m.c5 IS NULL -> AND m.c4 = 8 -> AND LENGTH(COALESCE(m.c3, '')) >= 7 -> AND EXISTS ( -> SELECT 1 -> FROM t_ref r -> WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL) -> AND r.c5 IS NOT NULL -> AND CAST(m.c2 AS SIGNED) = r.c4 -> ); +---------------+-----+ | stage | cnt | +---------------+-----+ | S2_cast_count | 4 | +---------------+-----+ 1 row in set ------------------------------------ So the wrong result is specific to the implicit `VARCHAR = YEAR` comparison in the materialized correlated `EXISTS` path, not to the underlying data. How to repeat: SET SESSION sql_mode = ''; DROP DATABASE IF EXISTS repro_exists_year_materialized; CREATE DATABASE repro_exists_year_materialized; USE repro_exists_year_materialized; CREATE TABLE t_main ( c1 INT NOT NULL PRIMARY KEY, c2 VARCHAR(64) NOT NULL, c3 VARCHAR(64) NULL, c4 INT NULL, c5 DATE NULL ); CREATE TABLE t_ref ( c1 INT NOT NULL PRIMARY KEY, c4 YEAR NOT NULL, c5 TIME NULL, c11 GEOMETRY NULL, c13 TINYBLOB NULL ); CREATE INDEX idx_main_c2 ON t_main (c2); CREATE INDEX idx_main_c4 ON t_main (c4); INSERT INTO t_main (c1, c2, c3, c4, c5) VALUES (1, 'not-a-date', 'hv_4144', 8, NULL), (2, 'hv_820', 'not-a-date', 8, NULL), (3, 'k9zs-031qxa86z954', 'hv_4144', 8, NULL), (4, 'dldk4w-7o6', 'hv_4144', 8, NULL), (5, '123', 'hv_4144', 8, NULL), (6, 'not-a-date', 'short', 8, NULL), (7, 'not-a-date', 'hv_4144', 12, NULL), (8, 'other', 'hv_4144', 8, '2023-01-01'); -- INSERT IGNORE is important: NULL becomes YEAR 0000 INSERT IGNORE INTO t_ref (c1, c4, c5, c11, c13) VALUES (1, NULL, '15:32:26', NULL, NULL), (2, 2008, '02:57:30', NULL, X'01'), (3, 2017, '14:12:33', NULL, X'02'); ANALYZE TABLE t_main, t_ref; -- Optional control: coercion semantics are valid SELECT 'control' AS stage, c1, c4, c5, ('not-a-date' = c4) AS eq_bad_str, ('0' = c4) AS eq_zero_str, (CAST('not-a-date' AS SIGNED) = c4) AS eq_cast_bad_str FROM t_ref ORDER BY c1; -- S1: FirstMatch semijoin path SET SESSION optimizer_switch = 'semijoin=on,firstmatch=on,materialization=off,loosescan=off,duplicateweedout=off'; EXPLAIN FORMAT=TRADITIONAL SELECT m.c1, m.c2 FROM t_main m WHERE m.c5 IS NULL AND m.c4 = 8 AND LENGTH(COALESCE(m.c3, '')) >= 7 AND EXISTS ( SELECT 1 FROM t_ref r WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL) AND r.c5 IS NOT NULL AND m.c2 = r.c4 ); SELECT 'S1_count' AS stage, COUNT(*) AS cnt FROM t_main m WHERE m.c5 IS NULL AND m.c4 = 8 AND LENGTH(COALESCE(m.c3, '')) >= 7 AND EXISTS ( SELECT 1 FROM t_ref r WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL) AND r.c5 IS NOT NULL AND m.c2 = r.c4 ); SELECT 'S1_rows' AS stage, m.c1, m.c2, m.c4 FROM t_main m WHERE m.c5 IS NULL AND m.c4 = 8 AND LENGTH(COALESCE(m.c3, '')) >= 7 AND EXISTS ( SELECT 1 FROM t_ref r WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL) AND r.c5 IS NOT NULL AND m.c2 = r.c4 ) ORDER BY m.c1; -- S2: same data, but force the materialized semijoin path SET SESSION optimizer_switch = 'semijoin=on,firstmatch=off,materialization=on,loosescan=off,duplicateweedout=off'; EXPLAIN FORMAT=TRADITIONAL SELECT m.c1, m.c2 FROM t_main m WHERE m.c5 IS NULL AND m.c4 = 8 AND LENGTH(COALESCE(m.c3, '')) >= 7 AND EXISTS ( SELECT 1 FROM t_ref r WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL) AND r.c5 IS NOT NULL AND m.c2 = r.c4 ); SELECT 'S2_count' AS stage, COUNT(*) AS cnt FROM t_main m WHERE m.c5 IS NULL AND m.c4 = 8 AND LENGTH(COALESCE(m.c3, '')) >= 7 AND EXISTS ( SELECT 1 FROM t_ref r WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL) AND r.c5 IS NOT NULL AND m.c2 = r.c4 ); SELECT 'S2_rows' AS stage, m.c1, m.c2, m.c4 FROM t_main m WHERE m.c5 IS NULL AND m.c4 = 8 AND LENGTH(COALESCE(m.c3, '')) >= 7 AND EXISTS ( SELECT 1 FROM t_ref r WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL) AND r.c5 IS NOT NULL AND m.c2 = r.c4 ) ORDER BY m.c1; -- Cross-check: explicit coercion restores the rows under S2 SELECT 'S2_cast_count' AS stage, COUNT(*) AS cnt FROM t_main m WHERE m.c5 IS NULL AND m.c4 = 8 AND LENGTH(COALESCE(m.c3, '')) >= 7 AND EXISTS ( SELECT 1 FROM t_ref r WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL) AND r.c5 IS NOT NULL AND CAST(m.c2 AS SIGNED) = r.c4 ); SELECT 'S2_cast_rows' AS stage, m.c1, m.c2, m.c4 FROM t_main m WHERE m.c5 IS NULL AND m.c4 = 8 AND LENGTH(COALESCE(m.c3, '')) >= 7 AND EXISTS ( SELECT 1 FROM t_ref r WHERE (r.c13 IS NULL OR r.c11 IS NOT NULL) AND r.c5 IS NOT NULL AND CAST(m.c2 AS SIGNED) = r.c4 ) ORDER BY m.c1;