Bug #56881 NULL left operand to NOT IN in WHERE clause behaves differently than real NULL
Submitted: 20 Sep 2010 21:17 Modified: 23 May 2011 17:49
Reporter: Roland Bouman Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.1 OS:Any (ubuntu linux, ms windows xp)
Assigned to: CPU Architecture:Any

[20 Sep 2010 21:17] Roland Bouman
Description:
An expression of the form 

<column-reference> NOT IN (<query-returning empty set)

evaluates to FALSE in case it appears in the WHERE clause, and the <column-reference> is from an outer joined table for which the matched row does not exist.

However, if the same expression appears in the SELECT list, it evaluates to TRUE, and it also evaluates to TRUE if the expression appears in the WHERE clause, but <column-reference> is from an inner or outer joined table for which the matched row does exist and the column just happens to have an explicit NULL value.

(NOTE: it is not really clear to me what the behaviour should be in expressions of the form

NULL NOT IN (<empty-set>)

One the one hand, one could argue that the value of NULL is unknown, hence we cannot know whether or not it appears in any set, in which case the expression would be NULL (which would be coerced to FALSE).
On the other hand, we could argue that since the set on the right hand side is empty, we know for sure it contains no elements at all, and we need not know even know the value on the left hand side to conclude that it cannot be contained in the set, in which case the expression would be TRUE.

FWIW, Postgres and Oracle seem to favor the last interpretation 
)

How to repeat:
CREATE TABLE parent (id int);
INSERT INTO parent VALUES (1), (2);

CREATE TABLE child (parent_id int, other int);
INSERT INTO child VALUES (1,NULL);

-- demonstrate that "NULL NOT IN empty set" is considered TRUE
SELECT    p.id, c.parent_id,
          c.parent_id NOT IN (
              SELECT parent_id 
              FROM   child
              WHERE  parent_id = 3
          ) notin
FROM      parent p
LEFT JOIN child  c
ON        p.id = c.parent_id

+------+-----------+-------+
| id   | parent_id | notin |
+------+-----------+-------+
|    1 |         1 |     1 |
|    2 |      NULL |     1 |
+------+-----------+-------+

(note the second row, where NULL is considered to be NOT IN the empty set)

-- demonstrate that "NULL NOT IN empty set" is considered FALSE in the WHERE
SELECT    p.id, c.parent_id
FROM      parent p
LEFT JOIN child  c
ON        p.id = c.parent_id
WHERE     c.parent_id NOT IN (
              SELECT parent_id 
              FROM   child
              WHERE  parent_id = 3
          )

+------+-----------+
| id   | parent_id |
+------+-----------+
|    1 |         1 |
+------+-----------+

(note there is no row for p.id = 2. Removing the WHERE clause does show that row. So, apparently, "NULL NOT IN empty set" is considered FALSE in that case.

-- Finally, demonstrate there is a difference between a "real" NULL 
-- and the one generated by the outer join

SELECT    p.id, c.parent_id, c.other
FROM      parent p
LEFT JOIN child  c
ON        p.id = c.parent_id
WHERE     c.other NOT IN (
              SELECT parent_id 
              FROM   child
              WHERE  parent_id = 3
          )

+------+-----------+-------+
| id   | parent_id | other |
+------+-----------+-------+
|    1 |         1 |  NULL |
+------+-----------+-------+

(note that although other is NULL, it does not prevent the NOT IN expression from evaluating to TRUE. However, for parent.id = 2, the generated NULL record is treated differently and does not appear in the result

Suggested fix:
Make the NOT IN behavior consistent between WHERE and SELECT. Probably the behavior as seen in Oracle and Postgres should be copied.
[20 Sep 2010 21:27] Roland Bouman
(added version info, os-es, and severity. Severity is serious as the query results are affected)
[21 Sep 2010 3:20] Valeriy Kravchuk
Verified with current mysql-5.1 from bzr:

macbook-pro:5.1 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.1.51-debug Source distribution

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE parent (id int);
Query OK, 0 rows affected (0.39 sec)

mysql> INSERT INTO parent VALUES (1), (2);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> 
mysql> CREATE TABLE child (parent_id int, other int);
Query OK, 0 rows affected (0.20 sec)

mysql> INSERT INTO child VALUES (1,NULL);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT    p.id, c.parent_id,
    ->           c.parent_id NOT IN (
    ->               SELECT parent_id 
    ->               FROM   child
    ->               WHERE  parent_id = 3
    ->           ) notin
    -> FROM      parent p
    -> LEFT JOIN child  c
    -> ON        p.id = c.parent_id
    -> ;
+------+-----------+-------+
| id   | parent_id | notin |
+------+-----------+-------+
|    1 |         1 |     1 |
|    2 |      NULL |     1 |
+------+-----------+-------+
2 rows in set (0.02 sec)

mysql> SELECT    p.id, c.parent_id
    -> FROM      parent p
    -> LEFT JOIN child  c
    -> ON        p.id = c.parent_id
    -> WHERE     c.parent_id NOT IN (
    ->               SELECT parent_id 
    ->               FROM   child
    ->               WHERE  parent_id = 3
    ->           )
    -> ;
+------+-----------+
| id   | parent_id |
+------+-----------+
|    1 |         1 |
+------+-----------+
1 row in set (0.01 sec)

mysql> SELECT    p.id, c.parent_id, c.other
    -> FROM      parent p
    -> LEFT JOIN child  c
    -> ON        p.id = c.parent_id
    -> WHERE     c.other NOT IN (
    ->               SELECT parent_id 
    ->               FROM   child
    ->               WHERE  parent_id = 3
    ->           )
    -> ;
+------+-----------+-------+
| id   | parent_id | other |
+------+-----------+-------+
|    1 |         1 |  NULL |
+------+-----------+-------+
1 row in set (0.01 sec)
[23 May 2011 17:50] Paul DuBois
Noted in 5.6.3 changelog.

For an outer join with a NOT IN subquery in the WHERE clause, a null
left operand to the NOT IN returned was treated differently than a 
literal NULL operand. 

CHANGESET - http://lists.mysql.com/commits/136714