Description:
The query optimizer produces a severely incorrect cardinality estimation for an index lookup when evaluating a boolean predicate of the form:
(t0.c0 OR t0.c1) AND t0.c1 = 1
The issue occurs when multiple indexes exist on the table. The optimizer estimates nearly half of the table rows (~498961) for the predicate c1 = 1, even though the predicate is highly selective and should return only one row.
After dropping another index, the optimizer generates the correct estimate (rows=1) for the same query.
This indicates a cardinality estimation bug related to predicate simplification or index interaction during cost estimation.
How to repeat:
1. Create a table with 1 M rows where c1 is unique (values 1…1,000,000):
CREATE TABLE t0 (c0 INT, c1 DOUBLE);
DELIMITER //
CREATE PROCEDURE batch_insert_numbers()
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE batch_size INT DEFAULT 1000;
WHILE i <= 1000000 DO
START TRANSACTION;
WHILE i <= 1000000 AND batch_size > 0 DO
INSERT INTO t0 VALUES (i, i);
SET i = i + 1;
SET batch_size = batch_size - 1;
END WHILE;
COMMIT;
SET batch_size = 1000;
END WHILE;
END //
DELIMITER ;
CALL batch_insert_numbers();
2. Build two indexes, including a single-column on c1 and a composite on (c0, c1):
CREATE INDEX i0 ON t0 (c1);
CREATE INDEX i1 ON t0 (c0, c1);
3. Explain the query:
EXPLAIN SELECT * FROM t0 WHERE (t0.c0 OR t0.c1) AND t0.c1 = 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((0 <> t0.c0) or (0 <> t0.c1)) (cost=52452 rows=498961)
-> Index lookup on t0 using i0 (c1 = 1) (cost=52452 rows=498961)
4. Drop the composite index i1
DROP INDEX i1 ON t0;
5. Re-run the EXPLAIN:
EXPLAIN SELECT * FROM t0 WHERE ((t0.c0) OR (t0.c1)) AND t0.c1 = 1 \G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((0 <> t0.c0) or (0 <> t0.c1)) (cost=0.445 rows=1)
-> Index lookup on t0 using i0 (c1 = 1) (cost=0.445 rows=1)
6. Expected result
Regardless of the presence of index i1 (c0, c1), the row estimate should be exactly 1, because c1 = 1 matches one row in the table and the OR condition cannot change that.
Suggested fix:
Improve the logic that simplifies a disjunction (OR) when one of its operands is proven always true by a subsequent conjunction. After c1 = 1, the predicate c1 is constantly true, so the entire OR collapses to TRUE. The remaining condition is just c1 = 1.
Alternatively, ensure that the cardinality estimation for the index lookup on i0 does not get “polluted” by the histogram or statistics of the composite index’s leading column. It appears the presence of i1 causes the estimate to be based on the selectivity of (c0 OR c1) instead of the combined AND.
Description: The query optimizer produces a severely incorrect cardinality estimation for an index lookup when evaluating a boolean predicate of the form: (t0.c0 OR t0.c1) AND t0.c1 = 1 The issue occurs when multiple indexes exist on the table. The optimizer estimates nearly half of the table rows (~498961) for the predicate c1 = 1, even though the predicate is highly selective and should return only one row. After dropping another index, the optimizer generates the correct estimate (rows=1) for the same query. This indicates a cardinality estimation bug related to predicate simplification or index interaction during cost estimation. How to repeat: 1. Create a table with 1 M rows where c1 is unique (values 1…1,000,000): CREATE TABLE t0 (c0 INT, c1 DOUBLE); DELIMITER // CREATE PROCEDURE batch_insert_numbers() BEGIN DECLARE i INT DEFAULT 1; DECLARE batch_size INT DEFAULT 1000; WHILE i <= 1000000 DO START TRANSACTION; WHILE i <= 1000000 AND batch_size > 0 DO INSERT INTO t0 VALUES (i, i); SET i = i + 1; SET batch_size = batch_size - 1; END WHILE; COMMIT; SET batch_size = 1000; END WHILE; END // DELIMITER ; CALL batch_insert_numbers(); 2. Build two indexes, including a single-column on c1 and a composite on (c0, c1): CREATE INDEX i0 ON t0 (c1); CREATE INDEX i1 ON t0 (c0, c1); 3. Explain the query: EXPLAIN SELECT * FROM t0 WHERE (t0.c0 OR t0.c1) AND t0.c1 = 1 \G *************************** 1. row *************************** EXPLAIN: -> Filter: ((0 <> t0.c0) or (0 <> t0.c1)) (cost=52452 rows=498961) -> Index lookup on t0 using i0 (c1 = 1) (cost=52452 rows=498961) 4. Drop the composite index i1 DROP INDEX i1 ON t0; 5. Re-run the EXPLAIN: EXPLAIN SELECT * FROM t0 WHERE ((t0.c0) OR (t0.c1)) AND t0.c1 = 1 \G *************************** 1. row *************************** EXPLAIN: -> Filter: ((0 <> t0.c0) or (0 <> t0.c1)) (cost=0.445 rows=1) -> Index lookup on t0 using i0 (c1 = 1) (cost=0.445 rows=1) 6. Expected result Regardless of the presence of index i1 (c0, c1), the row estimate should be exactly 1, because c1 = 1 matches one row in the table and the OR condition cannot change that. Suggested fix: Improve the logic that simplifies a disjunction (OR) when one of its operands is proven always true by a subsequent conjunction. After c1 = 1, the predicate c1 is constantly true, so the entire OR collapses to TRUE. The remaining condition is just c1 = 1. Alternatively, ensure that the cardinality estimation for the index lookup on i0 does not get “polluted” by the histogram or statistics of the composite index’s leading column. It appears the presence of i1 causes the estimate to be based on the selectivity of (c0 OR c1) instead of the combined AND.