MySQL Bugs: #120475: Index Merge Intersection incorrectly drops matching rows in SELECT with OR predicate, returning wrong result count

Bug #120475	Index Merge Intersection incorrectly drops matching rows in SELECT with OR predicate, returning wrong result count
Submitted:	14 May 9:09	Modified:	18 May 10:20
Reporter:	Li Zeyan	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S2 (Serious)
Version:	8.0.46	OS:	Linux
Assigned to:		CPU Architecture:	Any

Description:
In MySQL 9.6.0, when the optimizer chooses an Index Merge Intersection plan for a SELECT query with an OR predicate, the "Intersect rows sorted by row ID" operation incorrectly reduces the number of matching rows, causing valid result rows to be dropped.

Specifically, a query of the form:

  SELECT COUNT(*) FROM t WHERE c2 = 0.7559466140838738 OR c7 = 287496496;

returns 2 rows when index_merge_intersection=on (the default), but correctly returns 3 rows when index_merge_intersection=off.

The missing row is: (c2=0.7559466140838738, c3=1776068260, c7=2084714166).

This row should match the "c2 = 0.7559466140838738" condition. There are two rows in the table with c2 = 0.7559466140838738: one with (c3=939902875, c7=281834758) and the missing one with (c3=1776068260, c7=2084714166). The index merge intersection algorithm appears to incorrectly deduplicate or merge the row ID sets from the two index scans, treating two distinct matching rows as one.

The optimizer produces a semantically wrong query result under default settings. Since index_merge_intersection is enabled by default, any workload that triggers index merge intersection on tables with multiple secondary indexes and OR predicates may silently return incorrect results.

How to repeat:
DROP DATABASE IF EXISTS db1_min;
CREATE DATABASE db1_min;
USE db1_min;

CREATE TABLE `t` (
  `c2` double NOT NULL,
  `c3` int NOT NULL,
  `c7` int NOT NULL,
  UNIQUE KEY `i30` (`c3` DESC,`c2`),
  UNIQUE KEY `i64` (`c2` DESC,`c3`),
  UNIQUE KEY `i95` (`c7`,`c3`) USING BTREE,
  UNIQUE KEY `i61` (`c2`,`c7`) USING BTREE,
  UNIQUE KEY `i107` (`c3` DESC,`c7` DESC),
  UNIQUE KEY `i109` (`c3`,`c7`),
  UNIQUE KEY `i20` (`c7` DESC,`c2`),
  UNIQUE KEY `i114` (`c2`,`c7`,`c3`),
  UNIQUE KEY `i36` (`c7`,`c3`,`c2`),
  UNIQUE KEY `i67` (`c7`,`c2`),
  UNIQUE KEY `i83` (`c2`,`c3`),
  UNIQUE KEY `i68` (`c3` DESC,`c2` DESC),
  KEY `imerge_b_3119` (`c2`),
  KEY `imerge_a_3124` (`c7`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO `t` VALUES
  (0.7559466140838738,939902875,281834758),
  (0.7559466140838738,1776068260,2084714166),
  (0.31186114697490697,1699327703,1073842890),
  (1044617263,2076213040,287496496);

ANALYZE TABLE t;

-- Buggy: returns 2 (WRONG -- should be 3)
SELECT COUNT(*) AS cnt FROM t WHERE c2 = 0.7559466140838738 OR c7 = 287496496;

-- Correct: returns 3
SET SESSION optimizer_switch = 'index_merge_intersection=off';
SELECT COUNT(*) AS cnt FROM t WHERE c2 = 0.7559466140838738 OR c7 = 287496496;
SET SESSION optimizer_switch = 'default';

-- Show the buggy execution plan
EXPLAIN SELECT * FROM t WHERE c2 = 0.7559466140838738 OR c7 = 287496496;

With default optimizer (index_merge_intersection=on): 
+-----+
| cnt |
+-----+
|   2 | <- WRONG!
+-----+
1 row in set (0.00 sec)

With index_merge_intersection=off：
+-----+
| cnt |
+-----+
|   3 | <- CORRECT
+-----+
1 row in set (0.00 sec)

Expected result: Both queries should return cnt = 3 (two rows match c2 = 0.7559466140838738 and one row matches c7 = 287496496; no row matches both conditions).

Actual result: With index_merge_intersection=on (default), the first query returns cnt = 2. One of the two rows satisfying c2 = 0.7559466140838738 is incorrectly dropped by the index merge intersect operation.

Suggested fix:
Investigate the index merge intersection implementation in the storage engine layer, specifically the "Intersect rows sorted by row ID" merge/deduplication logic. When the optimizer performs an index merge for an OR predicate (i.e., it should be performing a union of row IDs from multiple index scans), the algorithm appears to incorrectly intersect or deduplicate the row ID sets. The merge sort of row IDs from the two index scans may be collapsing two distinct rows that happen to share a c2 value into a single result entry.

Key areas to investigate:
1. The row ID merge logic in index_merge_read() or equivalent function -- ensure that when merging sorted row ID streams for an OR (union) operation, all matching row IDs are preserved, including duplicates that come from different index scans.
2. Verify that the "sort by row ID and deduplicate" step correctly handles the case where two distinct rows (with different primary keys/row IDs) both satisfy one branch of the OR condition but are adjacent in the sorted output.

The minimal reproduction sql for this bug

Attachment: min_mysql_index_merge_intersect_repro.sql (application/octet-stream, text), 16.35 KiB.

Update on possible root cause & Minimal reproduction SQL

PLEASE use the attachment SQL file (min_mysql_index_merge_intersect_repro.sql) to reproduce the bug. The SQL in "How to repeat" failed to reproduce the bug. I apologize for the mistake.

After further investigation, I want to update my discovery on the possible root cause:
The optimizer performs an "Intersect rows sorted by row ID" on TWO indexes both covering c2 (i83 and hash_idx_3119), which incorrectly reduces the 2 matching rows for c2 = 0.7559466140838738 down to 1 row. The intersect step is meant to find rows that appear in both index scans for the same condition, but it wrongly deduplicates two distinct rows that happen to share the same c2 value into one. This deduplicated result is then unioned with the c7 index scan result, yielding 2 rows total instead of the correct 3.

The EXPLAIN output shows:
  -> Filter: ((t.c2 = 0.7559466140838738) or (t.c7 = 287496496))
      -> Deduplicate rows sorted by row ID
          -> Intersect rows sorted by row ID
              -> Index range scan on t using i83 over (c2 = 0.7559466140838738)
              -> Index range scan on t using hash_idx_3119 over (c2 = 0.7559466140838738)
          -> Covering index range scan on t using i95 over (c7 = 287496496)
Note: Both i83 and hash_idx_3119 scans find 2 matching rows, but the Intersect step incorrectly reduces them to 1 row.

Thank you for the bug report.
Verified as described.