Bug #120490 Join Not Eliminated When Constant Table is Empty Due to PrimaryKeyCol IN (NULL) Condition
Submitted: 18 May 3:24 Modified: 18 May 8:25
Reporter: jinhui lai Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:9.7.0 OS:Any
Assigned to: CPU Architecture:Any

[18 May 3:24] jinhui lai
Description:
When a query involves a join and one of the tables becomes a constant table (i.e., it can be evaluated to a single row or to zero rows using a unique key lookup), the optimizer should be able to short-circuit execution if the constant table produces no rows. In such cases, the entire join result is empty, and no other table should be accessed.

The second and third queries below demonstrate this correct behaviour: t0.c0 = 0 makes t0 a constant table with no matching row (t0.c0 is the primary key!!!), and the optimizer outputs Zero rows (no matching row in const table) without scanning t1.

However, when an additional impossible condition like t0.c0 IN (NULL) (which always evaluates to NULL/UNKNOWN and therefore cannot be TRUE) is added, the optimizer still executes a full nested-loop join and scans all rows of t1, even though the constant table t0 again has zero rows. 

How to repeat:
1. Create tables and populate data:
SET SESSION range_alloc_block_size = 1024*1024*16; -- 16 MB; default value is 4096, i.e., 4 KB
CREATE TABLE t0(c0 FLOAT PRIMARY KEY);
CREATE TABLE t1(c0 FLOAT);

DELIMITER //
CREATE PROCEDURE batch_insert_numbers()
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE batch_size INT DEFAULT 1000;
    
    WHILE i <= 1000000 DO
        START TRANSACTION;
        WHILE i <= 1000000 AND batch_size > 0 DO
            INSERT INTO t0 VALUES (i);
            INSERT INTO t1 VALUES (i);
            SET i = i + 1;
            SET batch_size = batch_size - 1;
        END WHILE;
        COMMIT;
        SET batch_size = 1000;
    END WHILE;
END //
DELIMITER ;

-- Insert into t0 and t1 with 1,000,000 rows via procedure
CALL batch_insert_numbers();

2. Run the problematic query:
explain analyze SELECT * FROM t0, t1 WHERE t0.c0 > t1.c0 AND t0.c0 IN (NULL) AND t0.c0 = 0 \G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join  (cost=200277 rows=998585) (actual time=491..491 rows=0 loops=1)
    -> Table scan on t1  (cost=100419 rows=998585) (actual time=0.069..265 rows=1e+6 loops=1)
    -> Filter: (('0' > t1.c0) and ('0' = NULL) and ('0' = 0))  (cost=0.35 rows=1) (actual time=148e-6..148e-6 rows=0 loops=1e+6)
        -> Constant row from t0  (cost=0.35 rows=1) (actual time=48.4e-6..48.4e-6 rows=0 loops=1e+6)
1 row in set, 1 warning (0.508 sec)

3. Run the correctly optimized query for comparison:
explain analyze SELECT * FROM t0, t1 WHERE t0.c0 > t1.c0 AND t0.c0 IS NULL AND t0.c0 = 0 \G
*************************** 1. row ***************************
EXPLAIN: -> Zero rows (Impossible WHERE)  (cost=0..0 rows=0) (actual time=60e-6..60e-6 rows=0 loops=1)
1 row in set (0.001 sec)

explain analyze SELECT * FROM t0, t1 WHERE t1.c0 > t0.c0 AND t0.c0 = 0 \G
*************************** 1. row ***************************
EXPLAIN: -> Zero rows (no matching row in const table)  (cost=0..0 rows=0) (actual time=80e-6..80e-6 rows=0 loops=1)
1 row in set (0.001 sec)

All queries above have a condition that makes t0 a constant table with no qualifying row (t0.c0 = 0, and t0's primary key ensures at most one match). Therefore all should be optimized to Zero rows (no matching row in const table) without touching t1. The presence of the impossible IN (NULL) condition should not prevent this early pruning.

Suggested fix:
When a condition like PrimaryKeyCol IN (NULL) is present, fold it to FALSE (or NULL as a boolean that is never true) and use it to mark the whole SELECT as returning no rows.
[18 May 8:25] Chaithra Marsur Gopala Reddy
Hi jinhui lai,

Thank you for the test case. Verified as described.