Bug #48177 SELECTs with NOT IN subqueries containing NULL values return too many records
Submitted: 20 Oct 2009 9:02 Modified: 18 Dec 2009 13:18
Reporter: Jørgen Løland Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.0.88, 5.1, 6.0 OS:Any
Assigned to: Jørgen Løland CPU Architecture:Any

[20 Oct 2009 9:02] Jørgen Løland
Description:
Queries that compare more than two columns to a "NOT IN subselect" return <Unknown> rows if one of the outer or inner column values are NULL. According to the SQL standard, <Unknown> values should not be in the result set.

SQL 2003 has these relevant sections, defining that <Unknown> rows should not be in the result set:

WHERE clause:
-------------
General Rules
1) Let T be the result of the preceding <from clause>.
2) The <search condition> is applied to each row of T. The result
   of the <where clause> is a table of those rows of T for which
   the result of the <search condition> is True.

SEARCH condition:
-----------------
Specify a condition that is True, False, or Unknown, depending on
the value of a <boolean value expression>.

BOOLEAN VALUE expression:
-------------------------
* NOT (Unknown) is Unknown.

How to repeat:
CREATE TABLE t1 (
pk int NOT NULL,
i1_key int DEFAULT NULL,
i1_nokey int DEFAULT NULL,
i2_nokey int DEFAULT NULL,
v varchar(1) DEFAULT NULL,
PRIMARY KEY (pk),
KEY (i1_key)
) ;

INSERT INTO t1 VALUES (3, NULL,  NULL, 1, 'a'),
(4, 3,     3,    2, 'b'),
(6, 5,     5,    3, 'c'),
(7, NULL,  NULL, 4, 'b'),
(8, 1,     1,    5, 'd'),
(9, 2,     2,    6, 'e');

CREATE TABLE t2 (
pk int NOT NULL,
i1_key int DEFAULT NULL,
i2_key int DEFAULT NULL,
i2_nokey int DEFAULT NULL,
PRIMARY KEY (pk),
KEY (i1_key),
KEY (i2_key)
) ;

INSERT INTO t2 VALUES (1, NULL, 3, 3),
(2, NULL, 0, 0),
(5, NULL, 4, 4),
(7, 0,    5, 5),
(8, 2,    6, 6),
(9, 8,    3, 3);

# Data in t1
SELECT i2_nokey, i1_nokey, i1_key, v FROM t1 ORDER BY pk;
i2_nokey	i1_nokey	i1_key	v
1	NULL	NULL	a
2	3	3	b
3	5	5	c
4	NULL	NULL	b
5	1	1	d
6	2	2	e

# Data in subquery (should be filtered out)
SELECT i2_nokey, i1_key FROM t2 ORDER BY i2_nokey;
i2_nokey	i1_key
0	NULL
3	NULL
3	8
4	NULL
5	0
6	2

# Actual result
SELECT i2_nokey, i1_nokey, i1_key, v
FROM t1
WHERE (i1_nokey, i2_nokey) 
NOT IN (SELECT i1_key, i2_nokey FROM t2)
ORDER BY pk;
i2_nokey	i1_nokey	i1_key	v
1	NULL	NULL	a
2	3	3	b
4	NULL	NULL	b
5	1	1	d

# In the above query, the record [4,NULL,NULL,b] should evaluate to 
# <Unknown> (and thus not be returned) due to record [4,NULL] in the 
# subquery.

# Moving the t1.i2_nokey=t2.i2_nokey comparison inside the subselect 
# removes the bug, so single column subselect comparisons seem to work:

SELECT i2_nokey, i1_nokey, i1_key, v
FROM t1
WHERE (i1_nokey) 
NOT IN (SELECT i1_key FROM t2 WHERE t1.i2_nokey=t2.i2_nokey)
ORDER BY pk;
i2_nokey	i1_nokey	i1_key	v
1	NULL	NULL	a
2	3	3	b
5	1	1	d

Suggested fix:
Do not return <Unknown> records when processing multi-column NOT IN subqueries.
[20 Oct 2009 9:06] Jørgen Løland
This bug is probably the root problem of BUG#47694
[20 Oct 2009 9:14] Valeriy Kravchuk
Verified just as described, also - with 5.0.88:

77-52-242-160:5.0 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.0.88-debug Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> drop table t1;
Query OK, 0 rows affected (0.08 sec)

mysql> drop table t2;
Query OK, 0 rows affected (0.03 sec)

mysql> CREATE TABLE t1 (
    -> pk int NOT NULL,
    -> i1_key int DEFAULT NULL,
    -> i1_nokey int DEFAULT NULL,
    -> i2_nokey int DEFAULT NULL,
    -> v varchar(1) DEFAULT NULL,
    -> PRIMARY KEY (pk),
    -> KEY (i1_key)
    -> ) ;
Query OK, 0 rows affected (0.00 sec)

mysql> 
mysql> INSERT INTO t1 VALUES (3, NULL,  NULL, 1, 'a'),
    -> (4, 3,     3,    2, 'b'),
    -> (6, 5,     5,    3, 'c'),
    -> (7, NULL,  NULL, 4, 'b'),
    -> (8, 1,     1,    5, 'd'),
    -> (9, 2,     2,    6, 'e');
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> 
mysql> CREATE TABLE t2 (
    -> pk int NOT NULL,
    -> i1_key int DEFAULT NULL,
    -> i2_key int DEFAULT NULL,
    -> i2_nokey int DEFAULT NULL,
    -> PRIMARY KEY (pk),
    -> KEY (i1_key),
    -> KEY (i2_key)
    -> ) ;
Query OK, 0 rows affected (0.00 sec)

mysql> 
mysql> INSERT INTO t2 VALUES (1, NULL, 3, 3),
    -> (2, NULL, 0, 0),
    -> (5, NULL, 4, 4),
    -> (7, 0,    5, 5),
    -> (8, 2,    6, 6),
    -> (9, 8,    3, 3);
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> 
mysql> SELECT i2_nokey, i1_nokey, i1_key, v
    -> FROM t1
    -> WHERE (i1_nokey, i2_nokey) 
    -> NOT IN (SELECT i1_key, i2_nokey FROM t2)
    -> ORDER BY pk;
+----------+----------+--------+------+
| i2_nokey | i1_nokey | i1_key | v    |
+----------+----------+--------+------+
|        1 |     NULL |   NULL | a    | 
|        2 |        3 |      3 | b    | 
|        4 |     NULL |   NULL | b    | 
|        5 |        1 |      1 | d    | 
+----------+----------+--------+------+
4 rows in set (0.02 sec)

mysql> SELECT i2_nokey, i1_nokey, i1_key, v
    -> FROM t1
    -> WHERE (i1_nokey) 
    -> NOT IN (SELECT i1_key FROM t2 WHERE t1.i2_nokey=t2.i2_nokey)
    -> ORDER BY pk;
+----------+----------+--------+------+
| i2_nokey | i1_nokey | i1_key | v    |
+----------+----------+--------+------+
|        1 |     NULL |   NULL | a    | 
|        2 |        3 |      3 | b    | 
|        5 |        1 |      1 | d    | 
+----------+----------+--------+------+
3 rows in set (0.00 sec)
[21 Oct 2009 13:38] Jørgen Løland
Simpler-to-read explain output from mysql-6.0-codebase with semijoin and materialization off:

EXPLAIN EXTENDED
SELECT i2_nokey, i1_nokey, i1_key, v
FROM t1
WHERE (i1_nokey, i2_nokey) 
NOT IN (SELECT i1_key, i2_nokey FROM t2);
id sel_type table type     pos_keys key    ref  rows fltr Extra
1  PRIMARY  t1    ALL      NULL     NULL   NULL 6    100  Using where
2  DEP SQ   t2    idx_subq i1_key   i1_key func 2    100  Using where; Full scan on NULL key
Warnings:
Note	1003	
SELECT `i2_nokey`, `i1_nokey`, `i1_key`, `v`
FROM `t1` 
WHERE (
  NOT(
    <in_optimizer>((`t1`.`i1_nokey`,`t1`.`i2_nokey`),
      <exists>(
        <index_lookup>(
           <cache>(`t1`.`i1_nokey`) IN t2 on i1_key checking NULL 
           where (
             trigcond(
               ((<cache>(`t1`.`i1_nokey`) = `t2`.`i1_key`) or isnull(`t2`.`i1_key`)))
             and 
             trigcond(
               ((<cache>(`t1`.`i2_nokey`) = `t2`.`i2_nokey`) or isnull(`t2`.`i2_nokey`)))
             )
           having (
             trigcond(
               <is_not_null_test>(`t2`.`i1_key`)) 
             and 
             trigcond(
               <is_not_null_test>(`t2`.`i2_nokey`)))))))
      )
[21 Oct 2009 14:01] Jørgen Løland
In another record is inserted into t1 like this:

# Data in t1
SELECT i2_nokey, i1_nokey, i1_key, v FROM t1 ORDER BY pk;
i2_nokey	i1_nokey	i1_key	v
1	NULL	NULL	a
2	3	3	b
3	5	5	c
4	NULL	NULL	b
4       0       0       b           <-----
5	1	1	d
6	2	2	e

The new record is correctly filtered out (not returned). It seems like the problem is only present for records that have a NULL value on the left hand side of IN.
[22 Oct 2009 9:35] Jørgen Løland
Another strange observation:

The bug is only repeatable if there is another record in t1 with a NULL value that qualifies for the result set:

(1) Removing record [1,NULL,NULL,a] from t1 makes the incorrectly returned 
    [4,NULL,NULL,b] record go away. 
(2) Inserting record [1,1] into t2 removes *both* the [1,NULL...] and 
    [4,NULL...] records from the result of the query.

In other words:

Data in t1
SELECT i2_nokey, i1_nokey, i1_key, v FROM t1 ORDER BY i2_nokey, pk;
i2_nokey	i1_nokey	i1_key	v
1	NULL	NULL	a    <----- (1) Delete this record
2	3	3	b
3	5	5	c
4	NULL	NULL	b
5	1	1	d

or

Data in subquery (should be filtered out)
SELECT i2_nokey, i1_key FROM t2 ORDER BY i2_nokey;
i2_nokey	i1_key
0	NULL
1	1            <------ (2) Insert this record
3	NULL
3	8
4	NULL
5	0
6	2
[22 Oct 2009 12:02] Jørgen Løland
Reduced the test to:

Data in t1
SELECT i2_nokey, i1_nokey, i1_key, v FROM t1;
i2_nokey	i1_nokey	i1_key	v
1	NULL	NULL	a
2	3	3	b
4	NULL	NULL	b
4	0	0	b

Data in subquery (should be filtered out)
SELECT i2_nokey, i1_key FROM t2 ORDER BY i2_nokey;
i2_nokey	i1_key
4	NULL

SELECT i2_nokey, i1_nokey, i1_key, v
FROM t1
WHERE (i1_nokey, i2_nokey) 
NOT IN (SELECT i1_key, i2_nokey FROM t2);
i2_nokey	i1_nokey	i1_key	v
1	NULL	NULL	a
2	3	3	b
4	NULL	NULL	b      <------ WRONG

However, if we change the order records are *inserted* into t1, neither [1,NULL,...] or [4,NULL,...] appear in the result set:

Data in t1
SELECT i2_nokey, i1_nokey, i1_key, v FROM t1;
i2_nokey	i1_nokey	i1_key	v
4	NULL	NULL	b
1	NULL	NULL	a
2	3	3	b
4	0	0	b

Data in subquery (should be filtered out)
SELECT i2_nokey, i1_key FROM t2 ORDER BY i2_nokey;
i2_nokey	i1_key
4	NULL

Actual result
SELECT i2_nokey, i1_nokey, i1_key, v
FROM t1
WHERE (i1_nokey, i2_nokey) 
NOT IN (SELECT i1_key, i2_nokey FROM t2);
i2_nokey	i1_nokey	i1_key	v
2	3	3	b
                              <----- WRONG - [1,NULL,...] should be here!
[26 Oct 2009 11:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/88133

3180 Jorgen Loland	2009-10-26
      Bug#48177 - SELECTs with NOT IN subqueries containing NULL 
                  values return too many records
      
      WHERE clauses with "outer_value_list NOT IN subselect" were
      handled incorrectly if the outer value list contained multiple 
      items where at least one of these could be NULL. The first 
      outer record with NULL value was handled correctly, but if a 
      second record with NULL value existed, the optimizer would 
      choose to reuse the result it got on the last execution of the 
      subselect. This is incorrect if the outer value list has 
      multiple items.
      
      The fix is to make Item_in_optimizer::val_int (in 
      item_cmpfunc.cc) reuse the result of the latest execution
      for NULL values only if all values in the outer_value_list 
      are NULL.
     @ mysql-test/r/subselect3.result
        Added test for BUG#48177
     @ mysql-test/t/subselect3.test
        Added test for BUG#48177
     @ sql/item_cmpfunc.cc
        Make Item_in_optimizer::val_int (in item_cmpfunc.cc) reuse the result of the latest execution for NULL values only if all values in the outer_value_list are NULL.
[30 Oct 2009 12:10] Øystein Grøvlen
Approved with minor comments.
[3 Nov 2009 12:48] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/89127

3162 Jorgen Loland	2009-11-03
      Bug#48177 - SELECTs with NOT IN subqueries containing NULL 
                  values return too many records
      
      WHERE clauses with "outer_value_list NOT IN subselect" were
      handled incorrectly if the outer value list contained multiple 
      items where at least one of these could be NULL. The first 
      outer record with NULL value was handled correctly, but if a 
      second record with NULL value existed, the optimizer would 
      choose to reuse the result it got on the last execution of the 
      subselect. This is incorrect if the outer value list has 
      multiple items.
           
      The fix is to make Item_in_optimizer::val_int (in 
      item_cmpfunc.cc) reuse the result of the latest execution
      for NULL values only if all values in the outer_value_list 
      are NULL.
     @ mysql-test/r/subselect3.result
        Added test for BUG#48177
     @ mysql-test/t/subselect3.test
        Added test for BUG#48177
     @ sql/item_cmpfunc.cc
        Make Item_in_optimizer::val_int (in item_cmpfunc.cc) reuse the result of the latest execution for NULL values only if all values in the outer_value_list are NULL.
[4 Nov 2009 9:24] Bugs System
Pushed into 5.1.41 (revid:joro@sun.com-20091104092152-qz96bzlf2o1japwc) (version source revid:kristofer.pettersson@sun.com-20091103162305-08l4gkeuif2ozsoj) (merge vers: 5.1.41) (pib:13)
[11 Nov 2009 6:49] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091110093407-rw5g8dys2baqkt67) (version source revid:alik@sun.com-20091109080109-7dxapd5y5pxlu08w) (merge vers: 6.0.14-alpha) (pib:13)
[11 Nov 2009 6:56] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091109115615-nuohp02h8mdrz8m2) (version source revid:alik@sun.com-20091105121316-hgdduu5vqdpbawf8) (merge vers: 5.5.0-beta) (pib:13)
[18 Nov 2009 16:42] Paul DuBois
Noted in 5.1.41, 5.5.0, 6.0.14 changelogs.

WHERE clauses with "outer_value_list NOT IN subquery" were handled
incorrectly if the outer value list contained multiple items at least
one of which could be NULL.
[18 Dec 2009 10:30] Bugs System
Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20091218102229-64tk47xonu3dv6r6) (version source revid:jonas@mysql.com-20091218095730-26gwjidfsdw45dto) (merge vers: 5.1.41-ndb-7.1.0) (pib:15)
[18 Dec 2009 10:45] Bugs System
Pushed into 5.1.41-ndb-6.2.19 (revid:jonas@mysql.com-20091218100224-vtzr0fahhsuhjsmt) (version source revid:jonas@mysql.com-20091217101452-qwzyaig50w74xmye) (merge vers: 5.1.41-ndb-6.2.19) (pib:15)
[18 Dec 2009 11:01] Bugs System
Pushed into 5.1.41-ndb-6.3.31 (revid:jonas@mysql.com-20091218100616-75d9tek96o6ob6k0) (version source revid:jonas@mysql.com-20091217154335-290no45qdins5bwo) (merge vers: 5.1.41-ndb-6.3.31) (pib:15)
[18 Dec 2009 11:15] Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20091218101303-ga32mrnr15jsa606) (version source revid:jonas@mysql.com-20091218064304-ezreonykd9f4kelk) (merge vers: 5.1.41-ndb-7.0.11) (pib:15)
[18 Dec 2009 13:18] MC Brown
Already documented in 5.1.41