Bug #120522 Sub‑optimal execution plan for multi‑table row‑value IN condition leads to full index scan instead of index range scans
Submitted: 22 May 6:39
Reporter: jinhui lai Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:9.7.0 OS:Any
Assigned to: CPU Architecture:Any

[22 May 6:39] jinhui lai
Description:
Hi, MySQL developers, thanks for reading my report. I find a missed optimization in MySQL.

When a WHERE clause uses (t0.c0, t1.c0) IN ((1,1), (2,2)), the MySQL optimizer fails to push the individual column-value conditions down to the respective tables. Instead, it generates a cross join (inner hash join with no join condition) between a small filtered set from one table and the entire other table, then applies the row-value filter on the huge intermediate result.
This causes a full index scan of the second table, even when the constants would allow an efficient index range scan on that table.

The logically equivalent condition (t0.c0 = 1 AND t1.c0 = 1) OR (t0.c0 = 2 AND t1.c0 = 2) produces the expected plan: both tables are accessed via range scans, the join produces only 4 rows, and the filter is trivially applied.

Such predicates are a widespread, natural way to filter joins by paired values (product‑category IDs, user‑role pairs, etc.) and should be optimized natively by the engine.

How to repeat:
1. Create table and insert data
CREATE TABLE t0(c0 INT PRIMARY KEY);
CREATE TABLE t1(c0 INT PRIMARY KEY);
DELIMITER //
CREATE PROCEDURE batch_insert_numbers()
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE batch_size INT DEFAULT 1000;
    
    WHILE i <= 1000000 DO
        START TRANSACTION;
        WHILE i <= 1000000 AND batch_size > 0 DO
            INSERT INTO t0 VALUES (i);
            INSERT INTO t1 VALUES (i);
            SET i = i + 1;
            SET batch_size = batch_size - 1;
        END WHILE;
        COMMIT;
        SET batch_size = 1000;
    END WHILE;
END //
DELIMITER ;

-- Insert into t0 and t1 with 1,000,000 rows via procedure
CALL batch_insert_numbers();

2. Execute query
-- negative case
EXPLAIN ANALYZE SELECT * FROM t0, t1 WHERE (t0.c0, t1.c0) IN ((1, 1), (2, 2)) \G
*************************** 1. row ***************************
EXPLAIN: -> Filter: ((t0.c0,t1.c0) in ((1,1),(2,2)))  (cost=200178 rows=399600) (actual time=2.28..307 rows=2 loops=1)
    -> Inner hash join (no condition)  (cost=200178 rows=399600) (actual time=2.28..201 rows=2e+6 loops=1)
        -> Covering index scan on t1 using PRIMARY  (cost=10178 rows=999000) (actual time=2.23..111 rows=1e+6 loops=1)
        -> Hash
            -> Covering index range scan on t0 using PRIMARY over (c0 = 1) OR (c0 = 2)  (cost=0.91 rows=2) (actual time=0.0333..0.0361 rows=2 loops=1)
1 row in set (0.308 sec)

-- positiva
EXPLAIN ANALYZE SELECT * FROM t0, t1 WHERE (t0.c0 = 1 AND t1.c0 = 1) OR (t0.c0 = 2 AND t1.c0 = 2) \G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (((t1.c0 = 1) and (t0.c0 = 1)) or ((t1.c0 = 2) and (t0.c0 = 2)))  (cost=2.73 rows=4) (actual time=0.0551..0.058 rows=2 loops=1)
    -> Inner hash join (no condition)  (cost=2.73 rows=4) (actual time=0.0541..0.0566 rows=4 loops=1)
        -> Covering index range scan on t1 using PRIMARY over (c0 = 1) OR (c0 = 2)  (cost=0.81 rows=2) (actual time=0.00776..0.00937 rows=2 loops=1)
        -> Hash
            -> Filter: ((t0.c0 = 1) or (t0.c0 = 2))  (cost=0.91 rows=2) (actual time=0.0349..0.0384 rows=2 loops=1)
                -> Covering index range scan on t0 using PRIMARY over (c0 = 1) OR (c0 = 2)  (cost=0.91 rows=2) (actual time=0.0333..0.0363 rows=2 loops=1)
1 row in set (0.001 sec)

Suggested fix:
Decompose multi-table row-value IN conditions into an OR-of-AND form early enough in optimization to allow index range scans on all involved tables, avoiding the current cross-join and filter approach.