Bug #8804 wrong result for [null_expr] IN (SELECT ...)
Submitted: 25 Feb 2005 10:47 Modified: 11 Dec 2006 8:34
Reporter: Sven Hagemann Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:Server version: 4.1.9-Max/4.1.15 BK source OS:Linux (Linux)
Assigned to: Sergey Petrunya CPU Architecture:Any
Tags: NOT NULL IN (SELECT ...), NULL IN (SELECT ...), subqueries

[25 Feb 2005 10:47] Sven Hagemann
Description:
Hello,
my select:

select titleID,title,publID from titles where publID not in (select publID from publishers);

returns to me only at the max. two data records, although it a whole number
more affected.
If I delete those data records found (one or two) and lead that one
Select again from, then he finds the next ones (one or two)
Data records.
All at the same time finds it, however, not.

My SQL dump I send to them.

Greeting
Sven

How to repeat:
?
[25 Feb 2005 10:51] Sven Hagemann
Example-DB

Attachment: mylibrary.sql (application/octet-stream, text), 11.26 KiB.

[11 Jul 2005 4:19] Aleksey Kishkin
Tested on slackware against 4.1.12 and 5.0.7
authors query:

select titleID,title,publID from titles where publID not in (select publID from publishers);
+---------+----------+--------+
| titleID | title    | publID |
+---------+----------+--------+
|      45 | deleteme |     99 |
+---------+----------+--------+
1 row in set (0.00 sec)

Rewritten query that must return the same recordset:

mysql> select titleID,title,titles.publID from titles left join publishers using(publID) where publishers.publID is NULL;
+---------+---------------------------------+--------+
| titleID | title                           | publID |
+---------+---------------------------------+--------+
|      20 | Jag saknar dig, jag saknar dig  |   NULL |
|      18 | Nennen wir ihn Anna             |   NULL |
|      19 | Alltid den där Annette          |   NULL |
|      24 | VBA-Programmierung mit Excel 7  |   NULL |
|      25 | Linux für Internet und Intranet |   NULL |
|      32 | Ute av verden                   |   NULL |
|      41 | PHP 4                           |   NULL |
|      44 | MySQL                           |   NULL |
|      45 | deleteme                        |     99 |
+---------+---------------------------------+--------+
9 rows in set (0.00 sec)
[11 Jul 2005 4:38] Aleksey Kishkin
simplified date for testcase

Attachment: mylib.sql (text/x-sql), 1.80 KiB.

[29 Jul 2006 11:12] Heikki Tuuri
This bug may be a duplicate of http://bugs.mysql.com/bug.php?id=21077
[14 Aug 2006 15:36] Sergey Petrunya
The above statement that provides "Rewritten query that must return the same recordset" is not correct. The rewritten query does/should not return the same recordset.
[14 Aug 2006 15:41] Sergey Petrunya
In a talk with Evgeny we've concluded that 
 - this bug is about subqueries and NULLs handling
 - BUG#21077 is about index scans not being restarted correctly,
i.e. those two are unrelated.
[25 Sep 2006 16:08] Sergey Petrunya
From discussion with Monty:

This error is documented in the user manual. Nevertheless, it should be fixed in version 5.1.

The fix for this bug will be produced within the scope of works on subquery optimization.
[10 Oct 2006 12:20] Sergey Petrunya
Analysis:

The bug is caused by the index-lookup subquery execution 'engine'.
The engine handles subqueries in form

  lexpr IN (SELECT tbl.key FROM tbl WHERE some_cond )

by doing index lookups via tbl.key. This is correct for all cases except for
when lexpr has a value of NULL. In that case, the value of suquery predicate 
should be

 (SELECT produces empty rows set)? FALSE : NULL
[10 Oct 2006 12:21] Sergey Petrunya
Fix:
Igor, Evgen and me had several discussions about the bug and come up with
this fix scheme:

 * The applicablity of index-lookup engine will remain unchanged.

 * If subqery_predicate is a "top level item", then current algorithm will
   be used (as it is correct)

 * Otherwise, when invoked with "lexpr IS NULL", index-lookup engine will 
   perform a full table scan to determine if the SELECT will produce empty
   row set or not.

 * If the SELECT is not correlated, then the value of "NULL IN (SELECT ...)"
   will be cached.
[10 Oct 2006 12:21] Sergey Petrunya
The above described fix will be applied to 4.1
[13 Oct 2006 16:29] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13670

ChangeSet@1.2525, 2006-10-13 20:25:39+04:00, evgen@moonbone.local +6 -0
  Bug#8804: Wrong result of NULL IN (subselect with index)
  
  If left expression of the IN (subselect with index) predicate is NULL then
  IN() was always returning NULL independently of the result of the subquery.
  
  Now subselect_indexsubquery_engine and subselect_uniquesubquery_engine engines
  checks whether the subquery has an empty result set when the ref key conatins a
  NULL part. Depending on the check result the Item_in_optimizer will return
  NULL or FALSE values to the caller.
[19 Oct 2006 13:08] Sergey Petrunya
Current 4.1 code can also produce incorrect results for SELECTs that are executed w/o using indexes.
Here is a testcase (provided by Evgen): 

create table t1(f1 int, f11 int);
insert into t1 values (null, null), (null, 0), (null, 3), (0, 0), (1, 1); 
create table t5(f5 int, f55 int);
insert into t5 values (0, 0), (2, 2);
select *, f1 not in (select f5 from t5) sub from t1; 

Current 4.1 produces:
mysql>  select *, f1 not in (select f5 from t5 where f55=f11) sub from t1;
+------+------+------+
| f1   | f11  | sub  |
+------+------+------+
| NULL | NULL | NULL | 
| NULL |    0 | NULL | 
| NULL |    3 | NULL | 
|    0 |    0 |    0 | 
|    1 |    1 |    1 | 
+------+------+------+

while the correct result is:
select *, f1 not in (select f5 from t5 where f55=f11) sub from t1;
f1    f11     sub
NULL  NULL    1
NULL  0       NULL
NULL  3       1
0     0       0
1     1       1

i.e. the value of 
  NULL NOT IN ({empty select}) 
is caclulated to be NULL while the correct result is TRUE.
[19 Oct 2006 13:25] Sergey Petrunya
One more related problem: 
Consider a testcase:

create table t10 (a int);
insert into t10 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table t11 (a int, key(a));
insert into t11 select 1 + 2*(A.a + 10 * B.a) from t10 A, t10 B;
insert into t11 values (NULL);
# Ok, t11 has NULL, 1, 3, 5, ...

create table three (a int);
insert into three values (1),(2),(3);

explain select a,a in (select a from t11) from three;
+----+--------------------+-------+----------------+---------------+------+---------+------+------+-------------+
| id | select_type        | table | type           | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+--------------------+-------+----------------+---------------+------+---------+------+------+-------------+
|  1 | PRIMARY            | three | ALL            | NULL          | NULL |    NULL | NULL |    3 |             | 
|  2 | DEPENDENT SUBQUERY | t11   | index_subquery | a             | a    |       5 | func |    2 | Using index | 
+----+--------------------+-------+----------------+---------------+------+---------+------+------+-------------+
2 rows in set (0.00 sec)

The query result will be incorrect:
select a,a in (select a from t11) from three;
+------+--------------------------+
| a    | a in (select a from t11) |
+------+--------------------------+
|    1 |                        1 | 
|    2 |                     NULL | 
|    3 |                     NULL | 
+------+--------------------------+
3 rows in set (0.00 sec)

select a,a in (select a from t11) from three order by a desc;
+------+--------------------------+
| a    | a in (select a from t11) |
+------+--------------------------+
|    3 |                        1 | 
|    2 |                     NULL | 
|    1 |                     NULL | 
+------+--------------------------+
3 rows in set (0.00 sec)

If a subuqery engine doesn't use index, the result will be correct:
create table t11n as select * from t11;
select a,a in (select a from t11n) from three;
+------+---------------------------+
| a    | a in (select a from t11n) |
+------+---------------------------+
|    1 |                         1 | 
|    2 |                      NULL | 
|    3 |                         1 | 
+------+---------------------------+
3 rows in set (0.00 sec)
[19 Oct 2006 13:53] Sergey Petrunya
Analysis: (version 2) 
The bug is caused by two probems:

Problem1:
For subqueries in form
   oe IN (SELECT ... FROM tbl WHERE some_cond )

Both index lookup-based and direct execution engines produce incorrect 
results (FALSE<->NULL) when oe IS NULL, and SELECT produces an empty row
set.

Problem2:
index-lookup based execution engine may produce incorrect result when 
oe IS NOT NULL, and SELECT produces a set without matching value and a NULL.
[19 Oct 2006 19:57] Sergey Petrunya
Problem#2 is filed as a separate BUG#23478.
[19 Oct 2006 23:47] Sergey Petrunya
Problem2 will be addressed separately. It is filed as BUG#23478.
[20 Oct 2006 1:06] Sergey Petrunya
Problem #1 occurs in both index_subquery execution engine and direct evaluation engine. Fixes will be required for both engines, they are likely not to have any intersection.

[unique_]index_subquery engine will be fixed as a part of this bug
Direct evaluation engine will be fixed as part of BUG#23485.

(I'm distributing work into several bugs as it will be done by different people).
[25 Oct 2006 11:37] Sergey Petrunya
One more testcase:

create table three (a int, b int);
insert into three values (1,1),(2,2),(3,3),(NULL,NULL);

create table tt11 (
  a int, b int, c int,
  key(a,b)
);
insert into tt11 select 2*(A.a + 10*B.a),0,0 from ten A, ten B;
update tt11 set b=a,c=a;
insert into tt11 values (NULL, NULL, NULL);

mysql>  select a, ((a,a) in (select tt11.a, tt11.b from tt11 where tt11.c = three.b )) SQ from three;
+------+------+
| a    | SQ   |
+------+------+
|    1 |    1 | 
|    2 |    0 | 
|    3 |    1 | 
| NULL | NULL | 
+------+------+
4 rows in set (3.73 sec)

mysql> explain  select a, ((a,a) in (select tt11.a, tt11.b from tt11 where tt11.c = three.b )) SQ from three;
+----+--------------------+-------+-------------+---------------+------+---------+------+------+-------------+
| id | select_type        | table | type        | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+--------------------+-------+-------------+---------------+------+---------+------+------+-------------+
|  1 | PRIMARY            | three | ALL         | NULL          | NULL |    NULL | NULL |    4 |             | 
|  2 | DEPENDENT SUBQUERY | tt11  | ref_or_null | a             | a    |       5 | func |    2 | Using where | 
+----+--------------------+-------+-------------+---------------+------+---------+------+------+-------------+
2 rows in set (8.47 sec)
[25 Oct 2006 11:38] Sergey Petrunya
Analysis (version 3):
=====================

Base settings
-------------
There are 3 different ways to execute a subquery:

 1. "Detachable naive execution". A subquery is executed as if it has been a
    standalone query (references to outer parts being unknown constants).
    Predicates that were pushed down into the subquery are only used for
    filtering at the last stage. This means that subquery's QEP doesn't
    depend on them, we can easily turn them on an off.

 2. [unique]_index_subquery execution.
    Subquery is a "simple one-table select", its execution is performed by
    an index lookup, with possible filtering.

 3. "Undetachable naive execution". One of predicates pushed into the
    subquery is used for table access, and/or participates in QEP in a
    non-trivial way. We can't easily turn the pushed-down predicates on and
    off. 
    Non-trivial examples of this case involve use of 'ref_or_null' access.

Original underlying problem of this bug
---------------------------------------
The problem is that value of  "NULL IN (SELECT ...)" is not correct, for all
three cases.

Available solutions
-------------------
 Case #1: psergey's commit for BUG#23485. Based on Evgen and Igor's idea
          with Item_func_trig_cond.

 Case #2: Evgen's design from SPB meeting.

 Case #3: Nothing yet, will post proposal here shortly
[25 Oct 2006 11:40] Sergey Petrunya
An interesting observation:
On tables specified in comment dated [25 Oct 13:37] Sergey Petrunia, run

mysql> explain  select a, ((a,a) in (select tt11.a, tt11.b from tt11 where tt11.c = three.b )) SQ from three;
+----+--------------------+-------+-------------+---------------+------+---------+------+------+-------------+
| id | select_type        | table | type        | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+--------------------+-------+-------------+---------------+------+---------+------+------+-------------+
|  1 | PRIMARY            | three | ALL         | NULL          | NULL |    NULL | NULL |    4 |             | 
|  2 | DEPENDENT SUBQUERY | tt11  | ref_or_null | a             | a    |       5 | func |    2 | Using where | 
+----+--------------------+-------+-------------+---------------+------+---------+------+------+-------------+
2 rows in set (8.47 sec)

mysql> explain select a from three where (a,a) in (select tt11.a, tt11.b from tt11 where tt11.c = three.b) ; 
+----+--------------------+-------+----------------+---------------+------+---------+-----------+------+--------------------------+
| id | select_type        | table | type           | possible_keys | key  | key_len | ref       | rows | Extra                    |
+----+--------------------+-------+----------------+---------------+------+---------+-----------+------+--------------------------+
|  1 | PRIMARY            | three | ALL            | NULL          | NULL |    NULL | NULL      |    4 | Using where              | 
|  2 | DEPENDENT SUBQUERY | tt11  | index_subquery | a             | a    |      10 | func,func |    1 | Using index; Using where | 
+----+--------------------+-------+----------------+---------------+------+---------+-----------+------+--------------------------+

The only difference is that the in the first query the subquery is not 
top-level, and in the second query it is. Why does the current code choose
different methods? Neither of them can handle "NULL IN (SELECT ...)" 
correctly ATM.
[26 Oct 2006 20:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/14451

ChangeSet@1.2534, 2006-10-27 00:49:23+04:00, sergefp@mysql.com +9 -0
  BUG#8804: wrong result for "NULL IN (SELECT ...)" 
  Add two execution strategies for "NULL IN (SELECT ...)" : 
   = replace index lookups with ref table scans
   = replace pushed-down-predicate-based ref accesses with full table scans.
[31 Oct 2006 17:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/14623

ChangeSet@1.2290, 2006-10-31 20:51:09+03:00, sergefp@mysql.com +13 -0
  BUG#8804: wrong results for NULL IN (SELECT ...)
  Evaluate "NULL IN (SELECT ...)" in a special way: Disable pushed-down 
  conditions and their "consequences": 
   = Do full table scans instead of unique_[index_subquery] lookups.
   = Change appropriate "ref_or_null" accesses to full table scans in
     subquery's joins.
  Also cache value of NULL IN (SELECT ...) if the SELECT is not correlated 
  wrt any upper select.
[31 Oct 2006 18:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/14630

ChangeSet@1.2291, 2006-10-31 21:30:40+03:00, sergefp@mysql.com +1 -0
  BUG#8804: Better comment + TODO section with suggestion how to speedup
  the fix.
[31 Oct 2006 21:04] Igor Babaev
Patch is to be applied only to 5.0->5.1.
[31 Oct 2006 21:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/14636

ChangeSet@1.2292, 2006-11-01 00:27:51+03:00, sergefp@mysql.com +1 -0
  BUG#8804: Incorrect results for NULL IN (SELECT ...): review fixes:
  - Better comments
  - Remove redundant and dead code.
[27 Nov 2006 17:12] Georgi Kodinov
Pushed in 5.0.32/5.1.14-beta
[11 Dec 2006 8:34] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented bugfix for 5.0.32 and 5.1.14.

Updated synopsis and tags.
[20 Aug 2007 15:12] Jon Stephens
Removed item listing this issue as open in 5.0->5.2 versions of open-bugs page; noted fix for 5.0 and later in 4.1 version.