Bug #22855 Optimizer doesn't rewrite NOT IN subselects to a correlated subquery
Submitted: 30 Sep 2006 11:36 Modified: 18 Jun 2007 18:02
Reporter: [ name withheld ]
Status: Closed
Category:Server: Optimizer Severity:S3 (Non-critical)
Version:4.1.13, 4.1.22BK, 5.0.40-BK, 5.1.18-beta-BK OS:Linux (SUSE Linux 10.0, WinXP)
Assigned to: Georgi Kodinov Target Version:

[30 Sep 2006 11:36] [ name withheld ]
Description:
There seems to be a problem with sub queries, signed ints and unsigned ints, and primary
keys. It might be related to bug #10649 which had a complex reproduction scenario and
wasn't solved.

How to repeat:
Here's a test program that shows the problem:

--
create table t1 ( course_id int(10) signed not null,   primary key (course_id) );
create table t2 ( course_id int(10) unsigned not null, primary key (course_id) );

insert into t1 values (1),(-1),(-2),(2),(3),(4),(5),(6),(7),(8),(65),(66),(67),(68);
insert into t2 values (65),(66),(67);

select distinct course_id from t1 where course_id not in (65,66,67);

drop table t1;
drop table t2;
--

Running on MySQL 4.1.13 on SUSE Linux, this doesn't print anything. It
should print those values in t1 but not t2. If I replace the subselect
with the result of the subselect (65,66,67) then it works as expected.
It also works if I remove the primary key from t2. If I just remove
the primary key from t1 it prints a somewhat mysterious '1'.

I posted this to the mysql mailing list and someone else confirmed the problem exists in
5.0.21 as well.

Suggested fix:
No fix but several workarounds. Removing the primary key is one. Someone else suggested
making the signed/unsigned ints the same type solves the problem. Neither of these suit
my application. Someone gave me a different query that doesn't use a subselect, and that
works, but I don't understand the query so it's not ideal!
[30 Sep 2006 14:12] Tonci Grgin
Hi and thanks for your bug report. I was able to verify it on latest 4.1.22BK. Please use
one of the provided (and documented) workarounds. Primary keys have nothing to with this.
By dropping them you are directing optimizer on different execution path (table scan).

Your MySQL connection id is 1 to server version: 4.1.22-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> create table t1 ( course_id int(10) not null,   primary key (course_id));
Query OK, 0 rows affected (0.03 sec)

mysql> create table t2 ( course_id int(10) unsigned not null, primary key (cours
e_id));
Query OK, 0 rows affected (0.00 sec)

mysql> insert into t1 values (1),(-1),(-2),(2),(3),(4),(5),(6),(7),(8),(65),(66)
,(67),(68);
Query OK, 14 rows affected (0.00 sec)
Records: 14  Duplicates: 0  Warnings: 0

mysql> insert into t2 values (65),(66),(67);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select distinct course_id from t1 where course_id not in (65,66,67);
+-----------+
| course_id |
+-----------+
|        -2 |
|        -1 |
|         1 |
|         2 |
|         3 |
|         4 |
|         5 |
|         6 |
|         7 |
|         8 |
|        68 |
+-----------+
11 rows in set (0.00 sec)

mysql> select distinct course_id from t1 where course_id not in (SELECT  course_id FROM
t2);
Empty set (0.00 sec)

Workaround 1 (rewriting as left join, preferred way):
mysql> select distinct t1.course_id from t1 left join t2 on t1.course_id=t2.course_id
where t2.course_id is null;
+-----------+
| course_id |
+-----------+
|        -2 |
|        -1 |
|         1 |
|         2 |
|         3 |
|         4 |
|         5 |
|         6 |
|         7 |
|         8 |
|        68 |
+-----------+
11 rows in set (0.00 sec)

Workaround 2 (working instead of optimizer, writing correlated subquery):
mysql> select distinct t1.course_id from t1
    ->   where not exists (
    ->     select t2.course_id from t2
    ->     where t2.course_id=t1.course_id
    ->   );
+-----------+
| course_id |
+-----------+
|        -2 |
|        -1 |
|         1 |
|         2 |
|         3 |
|         4 |
|         5 |
|         6 |
|         7 |
|         8 |
|        68 |
+-----------+
11 rows in set (0.00 sec)

mysql>

Related manual pages:
http://dev.mysql.com/doc/refman/4.1/en/rewriting-subqueries.html
http://dev.mysql.com/doc/refman/4.1/en/subquery-restrictions.html
...
The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it
as a correlated subquery. Consider the following statement that uses an uncorrelated
subquery:
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the statement to a correlated subquery:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
...
[11 Apr 2007 17:58] Georgi Kodinov
Recreated with 5.0.40-BK and 5.1.18-beta-BK.
I don't see the original problem anymore (i.e. the empty set whe using NOT IN (SELECT
...)).
But I see missing rows (the negative rows). Here's the reduced test case:
+ CREATE TABLE t1 (a INT NOT NULL);
+ INSERT INTO t1 VALUES (1),(-1), (65),(66);
+ CREATE TABLE t2 (a INT UNSIGNED NOT NULL PRIMARY KEY);
+ INSERT INTO t2 VALUES (65),(66);
+ SELECT a FROM t1 WHERE a NOT IN (65,66);
+ a
+ 1
+ -1
+ SELECT a FROM t1 WHERE a NOT IN (SELECT a FROM t2);
+ a
+ 1
+ SELECT a FROM t1 WHERE NOT EXISTS (SELECT a FROM t2 WHERE t2.a = t1.a);
+ a
+ 1
+ -1
+ EXPLAIN
+ SELECT a FROM t1 WHERE a NOT IN (SELECT a FROM t2);
+ id    select_type     table   type    possible_keys   key     key_len ref     rowExtra
+ 1     PRIMARY t1      ALL     NULL    NULL    NULL    NULL    4       Using where
+ 2     DEPENDENT SUBQUERY      t2      unique_subquery PRIMARY PRIMARY 4       funUsing
index
+ EXPLAIN
+ SELECT a FROM t1 WHERE NOT EXISTS (SELECT a FROM t2 WHERE t2.a = t1.a);
+ id    select_type     table   type    possible_keys   key     key_len ref     rowExtra
+ 1     PRIMARY t1      ALL     NULL    NULL    NULL    NULL    4       Using where
+ 2     DEPENDENT SUBQUERY      t2      eq_ref  PRIMARY PRIMARY 4       t1.a    1  Using
where; Using index
[12 Apr 2007 17:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/24393

ChangeSet@1.2445, 2007-04-12 18:17:38+03:00, gkodinov@macbook.gmz +3 -0
  Bug#22855: Corrected the handling of conversion errors and
  NULL key values in an IN predicate that is resolved by 
  unique_subquery (through subselect_uniquesubquery_engine).
  See code comments for further explanations.
[16 May 2007 12:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/26799

ChangeSet@1.2445, 2007-05-16 13:05:27+03:00, gkodinov@macbook.gmz +3 -0
  Bug#22855:
  Conversion errors when constructing the condition for an
  IN predicates were treated as if the affected column contains
  NULL. In such cases the IN predicate returns NULL.
  This is not a problem in the IN predicate is a top-level 
  predicate (where NULL is treated as FALSE), but
  causes an enclosing NOT predicate to also return NULL.
  And since this means FALSE if the NOT predicate is top level
  we get wrong results.
  Corrected the handling of conversion errors and
  NULL key values in an IN predicate that is resolved by 
  unique_subquery (through subselect_uniquesubquery_engine).
  See code comments for further explanations.
[17 May 2007 18:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/26916

ChangeSet@1.2445, 2007-05-17 19:38:34+03:00, gkodinov@macbook.gmz +3 -0
  Bug#22855:
  Conversion errors when constructing the condition for an
  IN predicates were treated as if the affected column contains
  NULL. If such a IN predicate is inside NOT we get wrong 
  results.
  Corrected the handling of conversion errors in an IN predicate 
  that is resolved by unique_subquery (through 
  subselect_uniquesubquery_engine).
[4 Jun 2007 23:20] Bugs System
Pushed into 5.1.20-beta
[4 Jun 2007 23:22] Bugs System
Pushed into 5.0.44
[18 Jun 2007 18:02] Paul DuBois
Noted in 5.0.44, 5.1.20 changelogs.