Bug #120735 Index Range Scan Incorrectly Shrinks Range for IN Predicate with EXISTS and CAST(NULL AS SIGNED), Missing Rows
Submitted: 21 Jun 14:10 Modified: 22 Jun 10:19
Reporter: Annie liu Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:MySQL 9.6 OS:Any
Assigned to: CPU Architecture:Any

[21 Jun 14:10] Annie liu
Description:
When the WHERE clause uses an IN expression whose right‑hand side includes an EXISTS subquery and -(CAST(NULL AS SIGNED)), the optimizer’s range scan on an indexed DECIMAL column incorrectly restricts the range to c0 = 1. In reality, row‑by‑row evaluation shows that both c0 = 0 and c0 = 1 satisfy the predicate.

This miscalculation causes the single‑table query to return only 4 rows, while the derived‑table (CTE materialized then table‑scanned) returns 6 rows, matching the row‑wise verification.

Actual vs Expected Results

Query Type	            Returned Rows	            	  Status
Single‑table               4 rows (all c0=1)       	          ✗
Derived table 	            6 rows (c0=0 and c0=1)		  ✓
Row‑wise predicate check   6 rows (c0=0 & c0=1)	          ✓

EXPLAIN Analysis

Single‑table query plan (incorrect)

-> Limit: 6 row(s)
    -> Sort: s.vp_rowid
        -> Filter: (s.c0 in (exists(select #3), <cache>(-(cast(NULL as signed)))))
            -> Covering index range scan on s using idx_c0 over (c0 = 1.000...)
The optimizer generates a range scan limited to c0 = 1.000..., only matching rows with c0=1.

Derived table query plan (correct)

-> Limit: 6 row(s)
    -> Nested loop inner join
        -> Sort: vp_rowid
            -> Filter: (lcte.c0 in (exists(select #6), <cache>(-(cast(NULL as signed)))))
                -> Table scan on lcte
                    -> Materialize CTE lcte
...
The CTE is materialized and then a full table scan applies the filter, correctly including c0=0 rows.

Root Cause Analysis

IN predicate evaluation: c0 IN (EXISTS(SELECT 1), -(CAST(NULL AS SIGNED)))

EXISTS(SELECT 1) is always TRUE (i.e., 1).

-(CAST(NULL AS SIGNED)) evaluates to NULL (since CAST(NULL AS SIGNED) is NULL, negation is still NULL).

So the list becomes (1, NULL).

For DECIMAL column c0, c0 IN (1, NULL) logically means c0 = 1 OR c0 = NULL. Since c0 = NULL is NULL (three‑valued logic), the effective condition should be c0 = 1. However, row‑wise verification shows that c0=0 also returns TRUE for 0 IN (1, NULL). This indicates that MySQL treats NULL in the IN list specially.

Index range miscalculation: When building the range, the optimizer concludes that only c0 = 1 is possible, ignoring c0=0. This suggests that the optimizer's logic for handling NULL in IN lists is flawed: it may treat NULL as not contributing to the range, but fails to account for the actual row‑wise truth (which includes 0).

How to repeat:
DROP DATABASE IF EXISTS repro_mysql620_17;
CREATE DATABASE repro_mysql620_17;
USE repro_mysql620_17;

SET SESSION sql_mode =
  'ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION';

SET SESSION time_zone = '+00:00';

CREATE TABLE s (
  vp_rowid BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  c0 DECIMAL(65,30) NULL,
  KEY idx_c0(c0 DESC)
) ENGINE=InnoDB;

INSERT INTO s(c0) VALUES
  (0), (0),
  (1), (1), (1), (1),
  (2), (NULL), (3);

CREATE TABLE l (
  vp_rowid BIGINT NOT NULL PRIMARY KEY,
  c0 DECIMAL(65,30) NULL
) ENGINE=InnoDB;

CREATE TABLE r (
  vp_rowid BIGINT NOT NULL PRIMARY KEY,
  c0 DECIMAL(65,30) NULL
) ENGINE=InnoDB;

INSERT INTO l SELECT vp_rowid, c0 FROM s;
INSERT INTO r SELECT vp_rowid, c0 FROM s;

ALTER TABLE r
  ADD CONSTRAINT fk_r_l
  FOREIGN KEY (vp_rowid) REFERENCES l(vp_rowid);

-- Single‑table query (range scan): returns 4 rows (incorrect, misses the two c0=0 rows)
SELECT
  GREATEST(
    EXISTS (SELECT 1 WHERE FALSE),
    CASE 'T\t'
      WHEN 0.48091024131969884 THEN s.vp_rowid
      ELSE NULL
    END
  ) AS ref0
FROM s
WHERE c0 IN (EXISTS (SELECT 1), -(CAST(NULL AS SIGNED)))
ORDER BY vp_rowid
LIMIT 6;

-- Derived table (CTE materialized then table scan): returns 6 rows (correct)
SELECT
  GREATEST(
    EXISTS (SELECT 1 WHERE FALSE),
    CASE 'T\t'
      WHEN 0.48091024131969884 THEN d.vp_rowid
      ELSE NULL
    END
  ) AS ref0
FROM (
  WITH
  lcte AS (
    SELECT
      ROW_NUMBER() OVER (ORDER BY vp_rowid) AS rn,
      vp_rowid,
      c0
    FROM l
  ),
  rcte AS (
    SELECT
      ROW_NUMBER() OVER (ORDER BY vp_rowid) AS rn,
      vp_rowid,
      c0
    FROM r
  )
  SELECT
    lcte.vp_rowid,
    lcte.c0
  FROM lcte
  JOIN rcte ON lcte.rn = rcte.rn
) AS d
WHERE c0 IN (EXISTS (SELECT 1), -(CAST(NULL AS SIGNED)))
ORDER BY vp_rowid
LIMIT 6;

-- Row‑wise verification of the predicate
SELECT
  vp_rowid,
  c0,
  c0 IN (EXISTS (SELECT 1), -(CAST(NULL AS SIGNED))) AS pred
FROM s
ORDER BY vp_rowid;
[22 Jun 10:19] Roy Lyseng
Thank you for the bug report.
However, my analysis is that the correct result is 4 rows, which is also what the latest 9.7 release is producing. Hence, closing as not a bug.