Description:
http://dev.mysql.com/doc/refman/5.1/en/using-explain.html
> SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id
> WHERE t2.id IS NULL;
> Assume that t2.id is defined as NOT NULL.
This assumption is too strong.
Indeed, even if t2.id is defined as NULL, no actual row with a NULL value or t2.id can be returned as a result of the join, since a NULL value can never satisfy the equality.
If a NULL value of t2.id is returned from this query, it is guaranteed to be that of a fake row generated by the LEFT JOIN, not that of an actual row with a NULL value in t2.
This condition should be rewritten as "t2.id is defined as NOT NULL or participated in the JOIN"
How to repeat:
CREATE TABLE t_left (
id INT NOT NULL PRIMARY KEY,
value INT,
KEY ix_left_value (value)
);
CREATE TABLE t_right (
id INT NOT NULL PRIMARY KEY,
nvalue INT,
nnvalue INT NOT NULL,
KEY ix_right_nvalue (nvalue),
KEY ix_right_nnvalue (nnvalue)
);
INSERT
INTO t_left
VALUES
(1, 1),
(2, 2),
(3, 3),
(4, NULL);
INSERT
INTO t_right
VALUES
(1, 1, 1),
(3, 3, 3),
(4, NULL, 4);
EXPLAIN
SELECT l.*
FROM t_left l
LEFT JOIN
t_right r
ON l.value = r.nnvalue
WHERE r.nnvalue IS NULL;
1, 'SIMPLE', 'l', 'index', '', 'ix_left_value', '5', '', 4, 'Using index'
1, 'SIMPLE', 'r', 'ref', 'ix_right_nnvalue', 'ix_right_nnvalue', '4', 'test.l.value', 1, 'Using where; Using index; Not exists'
EXPLAIN
SELECT l.*
FROM t_left l
LEFT JOIN
t_right r
ON l.value = r.nvalue
WHERE r.nvalue IS NULL;
1, 'SIMPLE', 'l', 'index', '', 'ix_left_value', '5', '', 4, 'Using index'
1, 'SIMPLE', 'r', 'ref', 'ix_right_nvalue', 'ix_right_nvalue', '5', 'test.l.value', 1, 'Using where; Using index'
I expected the second query use "Not exists" method just as well as the first query does.
Suggested fix:
Rewrite the optimizer algorithm so that it uses NOT EXISTS method if IS NULL condition is checked for a nullable column in the right table and the aforementioned column is a part of equality condition in the ON clause of the LEFT JOIN.