Bug #120524 Materialized Correlated EXISTS Returns Positive COUNT(*) But No Rows for Implicit VARCHAR-to-YEAR Comparison
Submitted: 22 May 9:40 Modified: 26 May 6:27
Reporter: Y F Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.45 OS:Ubuntu (Ubuntu 20.04.4 LTS)
Assigned to: CPU Architecture:x86 (Intel(R) Xeon(R) Platinum 8358P @ 2.60GHz (64 Cores, 96MB L3 Cache))

[22 May 9:40] Y F
Description:
This issue can be described using a single execution state over fixed tables
and fixed data.

After creating the tables, loading the replay-derived rows, running
`ANALYZE TABLE`, and forcing the materialized semijoin path, the following
query reports that 17 rows satisfy the predicate:

------------------------------------
mysql> SELECT 'S2_implicit_count' AS stage, COUNT(*) AS cnt
    -> FROM t_main m FORCE INDEX (idx_c4)
    -> WHERE m.c5 IS NOT NULL
    ->   AND m.c4 = 4
    ->   AND m.c6 IN ('vhv_163', '01e0')
    ->   AND EXISTS (
    ->       SELECT 1
    ->       FROM t_ref r
    ->       WHERE m.c6 = r.c4
    ->   );
+-------------------+-----+
| stage             | cnt |
+-------------------+-----+
| S2_implicit_count | 17  |
+-------------------+-----+
1 row in set
------------------------------------

However, in the same execution state, over the same tables, with the same
predicate, the corresponding row-producing query returns no rows at all:

------------------------------------
mysql> SELECT 'S2_implicit_rows' AS stage, m.c1, m.c6, m.c4, m.c5
    -> FROM t_main m FORCE INDEX (idx_c4)
    -> WHERE m.c5 IS NOT NULL
    ->   AND m.c4 = 4
    ->   AND m.c6 IN ('vhv_163', '01e0')
    ->   AND EXISTS (
    ->       SELECT 1
    ->       FROM t_ref r
    ->       WHERE m.c6 = r.c4
    ->   )
    -> ORDER BY m.c1;
Empty set
------------------------------------

Because these two statements use the same tables, the same data, and the same
filter predicate in the same execution state, they are logically equivalent
with respect to qualifying rows. Therefore the row-producing query must return
exactly the rows counted by the `COUNT(*)` query. Instead, the observed
results are:

------------------------------------
COUNT(predicate) = 17
ROWS(predicate)  = 0
------------------------------------

This is an internal wrong-result contradiction.

The EXPLAIN output shows that the query is executed through a materialized
semijoin shape:

------------------------------------
mysql> EXPLAIN FORMAT=TRADITIONAL SELECT ... ;
Plan summary:
    1. m: ref on idx_c4, key_len=5, ref=const, rows=34,
       Extra=Using where
    2. <subquery2>: eq_ref on <auto_distinct_key>, key_len=1,
       ref=repro_exists_year_empty_rows.m.c6, rows=1,
       Extra=Using where
    3. r: MATERIALIZED subquery scan, type=ALL, rows=6
------------------------------------

The issue is not explained by the underlying comparison semantics. The scalar
control query confirms that implicit comparison against `YEAR 0` is valid for
`'vhv_163'` and invalid for `'01e0'`:

------------------------------------
mysql> SELECT c1, c4, c5,
    ->        ('vhv_163' = c4) AS eq_vhv163,
    ->        ('01e0'    = c4) AS eq_01e0,
    ->        (CAST('vhv_163' AS SIGNED) = c4) AS cast_eq_vhv163,
    ->        (CAST('01e0'    AS SIGNED) = c4) AS cast_eq_01e0
    -> FROM t_ref
    -> WHERE c4 = 0 OR c4 IS NULL
    -> ORDER BY c1;
+---------+----+----------+-----------+---------+----------------+--------------+
| c1      | c4 | c5       | eq_vhv163 | eq_01e0 | cast_eq_vhv163 | cast_eq_01e0 |
+---------+----+----------+-----------+---------+----------------+--------------+
| 9000001 | 0  | NULL     | 1         | 0       | 1              | 0            |
| 9000003 | 0  | 19:24:43 | 1         | 0       | 1              | 0            |
| 9000004 | 0  | 2:50:36  | 1         | 0       | 1              | 0            |
| 9000007 | 0  | 19:40:58 | 1         | 0       | 1              | 0            |
+---------+----+----------+-----------+---------+----------------+--------------+
4 rows in set
------------------------------------

Under the same optimizer setting, making the coercion explicit restores all 17
matching rows:

------------------------------------
mysql> SELECT 'S2_cast_count' AS stage, COUNT(*) AS cnt
    -> FROM t_main m FORCE INDEX (idx_c4)
    -> WHERE m.c5 IS NOT NULL
    ->   AND m.c4 = 4
    ->   AND m.c6 IN ('vhv_163', '01e0')
    ->   AND EXISTS (
    ->       SELECT 1
    ->       FROM t_ref r
    ->       WHERE CAST(m.c6 AS SIGNED) = r.c4
    ->   );
+---------------+-----+
| stage         | cnt |
+---------------+-----+
| S2_cast_count | 17  |
+---------------+-----+
1 row in set
------------------------------------

So the wrong result is specific to the implicit `VARCHAR = YEAR` comparison in
this materialized correlated `EXISTS` execution path. The exact executor
interaction that makes `COUNT(*)` succeed while the row-producing query becomes
empty may require further investigation, but this reproducer clearly shows that
MySQL is returning contradictory results for the same predicate.

How to repeat:
DROP DATABASE IF EXISTS repro_exists_year_empty_rows;
CREATE DATABASE repro_exists_year_empty_rows;
USE repro_exists_year_empty_rows;
SELECT @@version AS mysql_version, @@sql_mode AS sql_mode;

SET SESSION optimizer_switch =
'semijoin=on,firstmatch=off,materialization=on,loosescan=off,duplicateweedout=off';

CREATE TABLE t_main (
    c1 INT NOT NULL PRIMARY KEY,
    c4 INT NULL,
    c5 DATE NULL,
    c6 VARCHAR(10) NULL,
    KEY idx_c4 (c4),
    KEY idx_c5 (c5),
    KEY idx_c6 (c6),
    KEY idx_c5_c4 (c5, c4)
);

CREATE TABLE t_ref (
    c1 INT NOT NULL PRIMARY KEY,
    c4 YEAR NOT NULL,
    c5 TIME NULL
);

-- YEAR 0 witness rows created via INSERT IGNORE + NULL.
INSERT IGNORE INTO t_ref (c1, c4, c5) VALUES
(9000001, NULL, NULL),
(9000003, NULL, '19:24:43'),
(9000004, NULL, '02:50:36'),
(9000007, NULL, '19:40:58'),
(9000002, 2019, NULL),
(9000006, 2010, '10:52:44');

-- Replay-derived outer rows only.
-- All rows have c4 = 4 and c5 IS NOT NULL.
-- Rows with c6='vhv_163' should match YEAR 0 via implicit coercion.
-- Rows with c6='01e0' are distractors.
INSERT INTO t_main (c1, c4, c5, c6) VALUES
(1000027, 4, '2013-06-06', 'vhv_163'),
(1000055, 4, '9999-12-31', '01e0'),
(1000086, 4, '2011-08-12', 'vhv_163'),
(1000100, 4, '9999-12-31', '01e0'),
(1000127, 4, '9999-12-31', 'vhv_163'),
(1000141, 4, '1000-01-01', '01e0'),
(1000144, 4, '1000-01-01', 'vhv_163'),
(1000155, 4, '9999-12-31', '01e0'),
(1000183, 4, '9999-12-31', '01e0'),
(1000204, 4, '9999-12-31', 'vhv_163'),
(1000206, 4, '2024-02-29', '01e0'),
(1000216, 4, '1000-01-01', 'vhv_163'),
(1000227, 4, '2013-02-14', '01e0'),
(1000248, 4, '1000-01-01', '01e0'),
(1000256, 4, '2014-11-27', '01e0'),
(1000271, 4, '2024-02-29', 'vhv_163'),
(1000288, 4, '1000-01-01', 'vhv_163'),
(1000291, 4, '1000-01-01', 'vhv_163'),
(1000299, 4, '1000-01-01', '01e0'),
(1000312, 4, '1000-01-01', '01e0'),
(1000323, 4, '2000-09-09', 'vhv_163'),
(1000335, 4, '1000-01-01', '01e0'),
(1000365, 4, '2010-07-23', 'vhv_163'),
(1000385, 4, '1000-01-01', '01e0'),
(1000407, 4, '1000-01-01', '01e0'),
(1000411, 4, '1000-01-01', 'vhv_163'),
(1000426, 4, '1000-01-01', 'vhv_163'),
(1000434, 4, '1000-01-01', '01e0'),
(1000437, 4, '2004-09-25', 'vhv_163'),
(1000461, 4, '2024-02-29', 'vhv_163'),
(1000476, 4, '9999-12-31', 'vhv_163'),
(1000480, 4, '2017-02-10', 'vhv_163'),
(1000489, 4, '1000-01-01', '01e0'),
(1000496, 4, '1000-01-01', '01e0');

ANALYZE TABLE t_main, t_ref;

-- Control: scalar comparison semantics are valid.
SELECT
    c1,
    c4,
    c5,
    ('vhv_163' = c4) AS eq_vhv163,
    ('01e0'    = c4) AS eq_01e0,
    (CAST('vhv_163' AS SIGNED) = c4) AS cast_eq_vhv163,
    (CAST('01e0'    AS SIGNED) = c4) AS cast_eq_01e0
FROM t_ref
WHERE c4 = 0 OR c4 IS NULL
ORDER BY c1;

EXPLAIN FORMAT=TRADITIONAL
SELECT m.c1, m.c6, m.c4, m.c5
FROM t_main m FORCE INDEX (idx_c4)
WHERE m.c5 IS NOT NULL
  AND m.c4 = 4
  AND m.c6 IN ('vhv_163', '01e0')
  AND EXISTS (
      SELECT 1
      FROM t_ref r
      WHERE m.c6 = r.c4
  )
ORDER BY m.c1;

-- Bug check 1: count query.
SELECT 'S2_implicit_count' AS stage, COUNT(*) AS cnt
FROM t_main m FORCE INDEX (idx_c4)
WHERE m.c5 IS NOT NULL
  AND m.c4 = 4
  AND m.c6 IN ('vhv_163', '01e0')
  AND EXISTS (
      SELECT 1
      FROM t_ref r
      WHERE m.c6 = r.c4
  );

-- Bug check 2: same predicate, but row-producing query.
SELECT 'S2_implicit_rows' AS stage, m.c1, m.c6, m.c4, m.c5
FROM t_main m FORCE INDEX (idx_c4)
WHERE m.c5 IS NOT NULL
  AND m.c4 = 4
  AND m.c6 IN ('vhv_163', '01e0')
  AND EXISTS (
      SELECT 1
      FROM t_ref r
      WHERE m.c6 = r.c4
  )
ORDER BY m.c1;

-- Control: explicit cast should return the matching rows.
SELECT 'S2_cast_count' AS stage, COUNT(*) AS cnt
FROM t_main m FORCE INDEX (idx_c4)
WHERE m.c5 IS NOT NULL
  AND m.c4 = 4
  AND m.c6 IN ('vhv_163', '01e0')
  AND EXISTS (
      SELECT 1
      FROM t_ref r
      WHERE CAST(m.c6 AS SIGNED) = r.c4
  );

SELECT 'S2_cast_rows' AS stage, m.c1, m.c6, m.c4, m.c5
FROM t_main m FORCE INDEX (idx_c4)
WHERE m.c5 IS NOT NULL
  AND m.c4 = 4
  AND m.c6 IN ('vhv_163', '01e0')
  AND EXISTS (
      SELECT 1
      FROM t_ref r
      WHERE CAST(m.c6 AS SIGNED) = r.c4
  )
ORDER BY m.c1;
[26 May 6:27] Roy Lyseng
Thank you for the bug report.
Verified as described.