Bug #22855 Optimizer doesn't rewrite NOT IN subselects to a correlated subquery
Submitted: 30 Sep 2006 9:36 Modified: 18 Jun 2007 16:02
Reporter: [ name withheld ] Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:4.1.13, 4.1.22BK, 5.0.40-BK, 5.1.18-beta-BK OS:Linux (SUSE Linux 10.0, WinXP)
Assigned to: Georgi Kodinov CPU Architecture:Any

[30 Sep 2006 9:36] [ name withheld ]
Description:
There seems to be a problem with sub queries, signed ints and unsigned ints, and primary keys. It might be related to bug #10649 which had a complex reproduction scenario and wasn't solved.

How to repeat:
Here's a test program that shows the problem:

--
create table t1 ( course_id int(10) signed not null,   primary key (course_id) );
create table t2 ( course_id int(10) unsigned not null, primary key (course_id) );

insert into t1 values (1),(-1),(-2),(2),(3),(4),(5),(6),(7),(8),(65),(66),(67),(68);
insert into t2 values (65),(66),(67);

select distinct course_id from t1 where course_id not in (65,66,67);

drop table t1;
drop table t2;
--

Running on MySQL 4.1.13 on SUSE Linux, this doesn't print anything. It
should print those values in t1 but not t2. If I replace the subselect
with the result of the subselect (65,66,67) then it works as expected.
It also works if I remove the primary key from t2. If I just remove
the primary key from t1 it prints a somewhat mysterious '1'.

I posted this to the mysql mailing list and someone else confirmed the problem exists in 5.0.21 as well.

Suggested fix:
No fix but several workarounds. Removing the primary key is one. Someone else suggested making the signed/unsigned ints the same type solves the problem. Neither of these suit my application. Someone gave me a different query that doesn't use a subselect, and that works, but I don't understand the query so it's not ideal!
[30 Sep 2006 12:12] Tonci Grgin
Hi and thanks for your bug report. I was able to verify it on latest 4.1.22BK. Please use one of the provided (and documented) workarounds. Primary keys have nothing to with this. By dropping them you are directing optimizer on different execution path (table scan).

Your MySQL connection id is 1 to server version: 4.1.22-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table t1 ( course_id int(10) not null,   primary key (course_id));
Query OK, 0 rows affected (0.03 sec)

mysql> create table t2 ( course_id int(10) unsigned not null, primary key (cours
e_id));
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1 values (1),(-1),(-2),(2),(3),(4),(5),(6),(7),(8),(65),(66)
,(67),(68);
Query OK, 14 rows affected (0.00 sec)
Records: 14  Duplicates: 0  Warnings: 0

mysql> insert into t2 values (65),(66),(67);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select distinct course_id from t1 where course_id not in (65,66,67);
+-----------+
| course_id |
+-----------+
|        -2 |
|        -1 |
|         1 |
|         2 |
|         3 |
|         4 |
|         5 |
|         6 |
|         7 |
|         8 |
|        68 |
+-----------+
11 rows in set (0.00 sec)

mysql> select distinct course_id from t1 where course_id not in (SELECT  course_id FROM t2);
Empty set (0.00 sec)

Workaround 1 (rewriting as left join, preferred way):
mysql> select distinct t1.course_id from t1 left join t2 on t1.course_id=t2.course_id where t2.course_id is null;
+-----------+
| course_id |
+-----------+
|        -2 |
|        -1 |
|         1 |
|         2 |
|         3 |
|         4 |
|         5 |
|         6 |
|         7 |
|         8 |
|        68 |
+-----------+
11 rows in set (0.00 sec)

Workaround 2 (working instead of optimizer, writing correlated subquery):
mysql> select distinct t1.course_id from t1
    ->   where not exists (
    ->     select t2.course_id from t2
    ->     where t2.course_id=t1.course_id
    ->   );
+-----------+
| course_id |
+-----------+
|        -2 |
|        -1 |
|         1 |
|         2 |
|         3 |
|         4 |
|         5 |
|         6 |
|         7 |
|         8 |
|        68 |
+-----------+
11 rows in set (0.00 sec)

mysql>

Related manual pages:
http://dev.mysql.com/doc/refman/4.1/en/rewriting-subqueries.html
http://dev.mysql.com/doc/refman/4.1/en/subquery-restrictions.html
...
The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. Consider the following statement that uses an uncorrelated subquery:
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the statement to a correlated subquery:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
...
[11 Apr 2007 15:58] Georgi Kodinov
Recreated with 5.0.40-BK and 5.1.18-beta-BK.
I don't see the original problem anymore (i.e. the empty set whe using NOT IN (SELECT ...)).
But I see missing rows (the negative rows). Here's the reduced test case:
+ CREATE TABLE t1 (a INT NOT NULL);
+ INSERT INTO t1 VALUES (1),(-1), (65),(66);
+ CREATE TABLE t2 (a INT UNSIGNED NOT NULL PRIMARY KEY);
+ INSERT INTO t2 VALUES (65),(66);
+ SELECT a FROM t1 WHERE a NOT IN (65,66);
+ a
+ 1
+ -1
+ SELECT a FROM t1 WHERE a NOT IN (SELECT a FROM t2);
+ a
+ 1
+ SELECT a FROM t1 WHERE NOT EXISTS (SELECT a FROM t2 WHERE t2.a = t1.a);
+ a
+ 1
+ -1
+ EXPLAIN
+ SELECT a FROM t1 WHERE a NOT IN (SELECT a FROM t2);
+ id    select_type     table   type    possible_keys   key     key_len ref     rowExtra
+ 1     PRIMARY t1      ALL     NULL    NULL    NULL    NULL    4       Using where
+ 2     DEPENDENT SUBQUERY      t2      unique_subquery PRIMARY PRIMARY 4       funUsing index
+ EXPLAIN
+ SELECT a FROM t1 WHERE NOT EXISTS (SELECT a FROM t2 WHERE t2.a = t1.a);
+ id    select_type     table   type    possible_keys   key     key_len ref     rowExtra
+ 1     PRIMARY t1      ALL     NULL    NULL    NULL    NULL    4       Using where
+ 2     DEPENDENT SUBQUERY      t2      eq_ref  PRIMARY PRIMARY 4       t1.a    1  Using where; Using index
[12 Apr 2007 15:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/24393

ChangeSet@1.2445, 2007-04-12 18:17:38+03:00, gkodinov@macbook.gmz +3 -0
  Bug#22855: Corrected the handling of conversion errors and
  NULL key values in an IN predicate that is resolved by 
  unique_subquery (through subselect_uniquesubquery_engine).
  See code comments for further explanations.
[16 May 2007 10:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/26799

ChangeSet@1.2445, 2007-05-16 13:05:27+03:00, gkodinov@macbook.gmz +3 -0
  Bug#22855:
  Conversion errors when constructing the condition for an
  IN predicates were treated as if the affected column contains
  NULL. In such cases the IN predicate returns NULL.
  This is not a problem in the IN predicate is a top-level 
  predicate (where NULL is treated as FALSE), but
  causes an enclosing NOT predicate to also return NULL.
  And since this means FALSE if the NOT predicate is top level
  we get wrong results.
  Corrected the handling of conversion errors and
  NULL key values in an IN predicate that is resolved by 
  unique_subquery (through subselect_uniquesubquery_engine).
  See code comments for further explanations.
[17 May 2007 16:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/26916

ChangeSet@1.2445, 2007-05-17 19:38:34+03:00, gkodinov@macbook.gmz +3 -0
  Bug#22855:
  Conversion errors when constructing the condition for an
  IN predicates were treated as if the affected column contains
  NULL. If such a IN predicate is inside NOT we get wrong 
  results.
  Corrected the handling of conversion errors in an IN predicate 
  that is resolved by unique_subquery (through 
  subselect_uniquesubquery_engine).
[4 Jun 2007 21:20] Bugs System
Pushed into 5.1.20-beta
[4 Jun 2007 21:22] Bugs System
Pushed into 5.0.44
[18 Jun 2007 16:02] Paul DuBois
Noted in 5.0.44, 5.1.20 changelogs.