Bug #22241 Problem with NOT IN () resulting in unexpected results
Submitted: 11 Sep 2006 18:01 Modified: 16 Oct 2006 7:50
Reporter: Ankit Shah Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.0.21-standard-log OS:pc-linux-gnu
Assigned to: CPU Architecture:Any

[11 Sep 2006 18:01] Ankit Shah
Description:
I have a table with following format ...

CREATE TABLE `table1` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `date` date default NULL,
  `upc` bigint(20) default NULL,
  `cd` tinyint(3) unsigned default NULL,
  `track_id` tinyint(3) unsigned default NULL,
  `qty` int(11) default NULL,
  `total` float default NULL,
   PRIMARY KEY  (`statement_detail_id`),                            
   KEY `upc` (`upc`),                                               
   KEY `upc_cd_track_id` (`upc`,`cd`,`track_id`),                   
   KEY `date` (`date`)                                              
   ) ENGINE=MyISAM DEFAULT CHARSET=utf8 

Above table has approximately 15 million rows.
Note: I have only put columns related to the query.   

Now, I have a list of UPCs that I need to use to perform two queries.
1.  Get sum of total for rows with only those upcs.
2.  Get sum of total for rows without those upcs.

So for #1, I have ...
SELECT sum(total) as total
FROM table1
WHERE date BETWEEN '2005-10-01' AND '2005-12-31'						AND upc IN (123456789128,669910001264,669910001660,669910002162 ...)

The list of upcs is quite big (approximately 16,000).  Above query gives me a total of about 300,000.

For #2, I have
SELECT sum(total) as total
FROM table1
WHERE date BETWEEN '2005-10-01' AND '2005-12-31'						AND upc NOT IN (123456789128,669910001264,669910001660,669910002162 ...)

This query (the list of upcs are exactly same in both) gives me 0.  

This is wrong because if i perform following ...

SELECT sum(total) as total
FROM table1
WHERE date BETWEEN '2005-10-01' AND '2005-12-31'	

I get approximately 1,300,000 and there are no null values in the UPC column. ( I checked by performing following query).

SELECT upc
FROM dig_sales_detail 
WHERE upc IS NULL;

The resultset was empty.

Just to test, I took out couple of upcs from the list performed following query ...
SELECT distinct UPC
FROM table1
WHERE date BETWEEN '2005-10-01' AND '2005-12-31'						AND upc NOT IN (669910001264,669910001660,669910002162, ...)

Suprisingly I recieved 23 different upcs by just taking out 1 upc from the list.  How can that be?

Finally, I tried following query ...

SELECT sum(total) as total
FROM table1
WHERE date BETWEEN '2005-10-01' AND '2005-12-31'						AND CAST(upc as SIGNED INTEGER) NOT IN (123456789128,669910001264,669910001660,669910002162 ...)

This query gave me the correct number approximately 1,000,000.  

How to repeat:
Create a table with provided syntax. 
Populate it with millions of lines.
Get a list of 16,000 upcs.
Run the above mentioned queries.
[12 Sep 2006 9:00] Valeriy Kravchuk
Thank you for a problem report. Please, try to repeat with a newer version of MySQL server, 5.0.24a, and inform about the results. In case of the same behaviour, please, send your my.cnf content also.
[14 Sep 2006 10:09] Ulrich Priesner
We experience the same problem with MySQL 5.0.24-standard-log; it seems as if the IN(value1, value2,...) can take at max 999 values. If more than 999 are used in IN(...), the results are wrong. This bug is serious as standard queries now return wrong results.
[14 Sep 2006 11:12] Ulrich Priesner
Correction to my previous post: the bug appears only for NOT IN(...).
[13 Oct 2006 13:35] Valeriy Kravchuk
All reporters:

Please, try to repeat with a newer version, 5.0.26, and inform about the results.
[13 Oct 2006 16:32] Ankit Shah
Hello,

I just tried the queries in the 5.0.26 version on Windows XP.  I am very glad to report that this bug seems to have been fixed in this version. 

I performed the both queries with IN () and NOT IN ().  I took the resulting sum from both queries and it matched up exactly when I repeated the query without IN () or NOT IN () clauses.   

As relieved as I am about the fix, I am also curious as to what was causing this issue in the earlier versions.  Alos, was the fix a result of this bug report or due to a fix for another bug that also happened to resolve this issue?

Quite frankly, I would appreciate your part of the story as when I did not hear anything from you guys for close to a month, I seriously thought my complaint was lost among plethora of other things that you guys must be dealing with day and day out.
 
Thanks,

Ankit.
[16 Oct 2006 7:50] Valeriy Kravchuk
Looks like this was a duplicate of bug #21282. I don't know why I had not identified it as such from the very beginning.