MySQL Bugs: #47454: Not exists optimization method is not used for NULL columns in the right table

Bug #47454	Not exists optimization method is not used for NULL columns in the right table
Submitted:	19 Sep 2009 19:33	Modified:	20 Sep 2009 7:24
Reporter:	Alex Bolenok	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.1.35	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	left join, not exists, Optimizer

Description:
http://dev.mysql.com/doc/refman/5.1/en/using-explain.html

> SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id
>   WHERE t2.id IS NULL;

> Assume that t2.id is defined as NOT NULL.

This assumption is too strong.

Indeed, even if t2.id is defined as NULL, no actual row with a NULL value or t2.id can be returned as a result of the join, since a NULL value can never satisfy the equality.

If a NULL value of t2.id is returned from this query, it is guaranteed to be that of a fake row generated by the LEFT JOIN, not that of an actual row with a NULL value in t2.

This condition should be rewritten as "t2.id is defined as NOT NULL or participated in the JOIN"

How to repeat:
CREATE TABLE t_left (
        id INT NOT NULL PRIMARY KEY,
        value INT,
        KEY ix_left_value (value)
        );

CREATE TABLE t_right (
        id INT NOT NULL PRIMARY KEY,
        nvalue INT,
        nnvalue INT NOT NULL,
        KEY ix_right_nvalue (nvalue),
        KEY ix_right_nnvalue (nnvalue)
        );

INSERT
INTO    t_left
VALUES
(1, 1),
(2, 2),
(3, 3),
(4, NULL);

INSERT
INTO    t_right
VALUES
(1, 1, 1),
(3, 3, 3),
(4, NULL, 4);

EXPLAIN
SELECT  l.*
FROM    t_left l
LEFT JOIN
        t_right r
ON      l.value = r.nnvalue
WHERE   r.nnvalue IS NULL;

1, 'SIMPLE', 'l', 'index', '', 'ix_left_value', '5', '', 4, 'Using index'
1, 'SIMPLE', 'r', 'ref', 'ix_right_nnvalue', 'ix_right_nnvalue', '4', 'test.l.value', 1, 'Using where; Using index; Not exists'

EXPLAIN
SELECT  l.*
FROM    t_left l
LEFT JOIN
        t_right r
ON      l.value = r.nvalue
WHERE   r.nvalue IS NULL;

1, 'SIMPLE', 'l', 'index', '', 'ix_left_value', '5', '', 4, 'Using index'
1, 'SIMPLE', 'r', 'ref', 'ix_right_nvalue', 'ix_right_nvalue', '5', 'test.l.value', 1, 'Using where; Using index'

I expected the second query use "Not exists" method just as well as the first query does.

Suggested fix:
Rewrite the optimizer algorithm so that it uses NOT EXISTS method if IS NULL condition is checked for a nullable column in the right table and the aforementioned column is a part of equality condition in the ON clause of the LEFT JOIN.

Thank you for the feature request.