Description:
Outer join elimination is not applied to queries with the IN preducate and the IF flow control function the the cases where it can be apllied.
The query
SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.a
WHERE t1.a IN (t2.a,t2.b)
has the following EXPLAIN outut:
mysql> EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.a WHERE t1.a IN (t2.a,t2.b);
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 9 | |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY | PRIMARY | 4 | test.t1.a | 1 | Using where |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
An equivalent querY
SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.a
WHERE t1.a=t2.a OR t1.a=t2.b;
has a different execution plan:
mysql> EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.a WHERE t1.a=t2.a OR t1.a=t2.b;
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
| 1 | SIMPLE | t2 | ALL | PRIMARY | NULL | NULL | NULL | 4 | Using where |
| 1 | SIMPLE | t1 | eq_ref | PRIMARY | PRIMARY | 4 | test.t2.a | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
Yet accessing table t2 before table t1 is more efficient for both queries.
The fact is that a possible outer join elimination is not applied for the first query due to a deficiency of the code that calculates the not_null_tables attribute for IN predicates.
The same is true fir the IF function:
mysql> EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.a WHERE t1.a > IF(t1.a=t2.b-2,t2.b,t2.b-1);
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
| 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 9 | |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY | PRIMARY | 4 | test.t1.a | 1 | Using where |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-------------+
How to repeat:
The following commands can demonstrate the problem:
CREATE TABLE t1 (a int PRIMARY KEY, b int);
INSERT INTO t1 VALUES
(1, 2), (2, 1), (3, 2), (4, 3), (5,6), (6,5), (7,8), (8,7), (9,10);
CREATE TABLE t2 (a int PRIMARY KEY, b int);
INSERT INTO t2 VALUES
(3,0), (4,1), (6,4), (7,5);
EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.a
WHERE t1.a IN (t2.a,t2.b);
EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.a
WHERE t1.a=t2.a OR t1.a=t2.b;
EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.a
WHERE t1.a > IF(t1.a=t2.b-2,t2.b,t2.b-1);
Suggested fix:
Correct the calculation of the not_null_tables attribute for the IN and NOT IN predicates and the IF functions.