Bug #21184 Bug with a Query with "NOT IN"
Submitted: 20 Jul 2006 15:25 Modified: 19 Sep 2006 17:01
Reporter: Falk Roßjat Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.0 OS:Any (all)
Assigned to: Assigned Account CPU Architecture:Any
Tags: NOT IN, subquery, subselect

[20 Jul 2006 15:25] Falk Roßjat
Description:
Hallo!

My problem:

I have 2 tables (t1,t2). Table t1 have the rows 'ID' and 'name' and in moment 5000 cols, table t2 the row t_id with 2000 cols.
Now i have the Query:
  $query = 'SELECT name FROM t1 WHERE ID NOT IN (SELECT DISTINCT t_id FROM t2)';

The query runs but need 6 seconds (to long!!!)

I have changed my query with PHP:
  //Query 1
  $q1 = 'SELECT DISTINCT t_id FROM t2';
  // I change the Result Array with:
  $s = '('.implode(',',$q1).')';
  //Query 2
  $q2 = 'SELECT name FROM t1 WHERE ID NOT IN '.$s;

This Query run in MySQL 4.1.12, the result is correct and needs only 0.1 seconds (o.k.)
This Query runs also in MySQL 5.0.22 and 5.1.11 but there is no result.

How to repeat:
see description
[20 Jul 2006 16:41] Falk Roßjat
This are the tables for MySQL Version 5.1

Attachment: mysql_5_1.zip (application/x-zip-compressed, text), 59.91 KiB.

[20 Jul 2006 16:42] Falk Roßjat
These are the tables for MySQL Version 4.1.12

Attachment: mysql_4_1_12.zip (application/x-zip-compressed, text), 42.42 KiB.

[21 Jul 2006 8:02] Falk Roßjat
Now I Know it is a problem with the Primary Key Index on table t1. 
If I have no Index on table t1 I get the correct result also in MySQL 5.1. 
If I have a Primary Key Index on the id row I get no result in MySQL 5.1.
[21 Jul 2006 8:42] Falk Roßjat
Sorry wrong severity.
I think it is a problem with the optimizer and the access to index.
[21 Jul 2006 11:27] Falk Roßjat
If I take an Index on Table t2(t_id) then Query1 is very fast.
[21 Jul 2006 21:09] Sveta Smirnova
simplified test

Attachment: bug21184.test (application/octet-stream, text), 30.27 KiB.

[21 Jul 2006 21:09] Sveta Smirnova
Thank you for the report.

Verified as described using MySQL 5.0.25 and 5.1.12 BK development source tree on Linux and Windows.

First query takes about 99% CPU, but returns correct result: 2426 rows. Second query returns only first 26 rows.

In attached file is simplified test case.
[19 Sep 2006 16:31] Sergey Petrunya
As far as I understand the bug is about wrong query results?

Can't repeat with 5.0.26 tree, tip cset 
ChangeSet@1.2272, 2006-09-18 19:01:07+04:00, gkodinov@dl145s.mysql.com +4 -0 :

Both queries in Sveta's example return the same set of 303 rows in 0.01 sec.
[19 Sep 2006 16:38] Sergey Petrunya
Trying the original testcase:

SELECT name FROM t1 WHERE ID NOT IN (SELECT DISTINCT t_id FROM t2); 
returns 2426 rows, in 9.5 seconds

If one manually assembles the list of constants in the NOT IN list, 
the query will return the same 2426 rows in 0.03 sec.
[19 Sep 2006 17:00] Sergey Petrunya
The problem with wrong query results was fixed by BUG#21282.

The problem with combination two queries being faster than one query with subquery is, technically, not a bug (limitations of subquery optimization are documented). We accept it as your input and will look at addressing it in a future version.
[19 Sep 2006 17:01] Sergey Petrunya
Setting to duplicate of BUG#21282. 
Falk, thanks for your time to write to us.