Bug #60885 (not so) wrong NOT IN result (when NULL values are involved)
Submitted: 15 Apr 2011 20:49 Modified: 16 Apr 2011 2:33
Reporter: Vares Pierre-Olivier Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: DML Severity:S2 (Serious)
Version:5.0.32-Debian_7etch11 OS:Linux
Assigned to: CPU Architecture:Any
Tags: equivalence, not exists, NOT IN, null

[15 Apr 2011 20:49] Vares Pierre-Olivier
Description:
I was filtering some data in my database and, well, I just found some strange result of a query using NOT IN.

For you to understand, I got a temp table id_adresses_vertes where I store the id of objects I want to keep.

--- let me show you what's in the table
SELECT COUNT(*) FROM id_adresses_vertes; # 1004
SELECT COUNT(DISTINCT id_responsable) FROM id_adresses_vertes; #663
SELECT * FROM id_adresses_vertes WHERE id_responsable = 971; # Empty set

-- and now, let's compare the NOT IN version of the query 
-- and the NOT EXISTS one

SELECT id FROM responsable_adresse 
WHERE id NOT IN (SELECT DISTINCT id_responsable FROM id_adresses_vertes);
Empty set 
which is WRONG (there's no row in id_adresses_vertes with id_responsable = 971, but there's one in responsable_adresse with this id)

SELECT id FROM responsable_adresse 
WHERE NOT EXISTS(SELECT * FROM id_adresses_vertes WHERE id_responsable = responsable_adresse.id);
# Result set with 2195 records (correct)

--- Some tests
-- all id_responsable values are < 3000, so just add a WHERE in the subquery
SELECT id FROM responsable_adresse
WHERE id NOT IN (
SELECT DISTINCT id_responsable FROM id_adresses_vertes
WHERE id_responsable BETWEEN 1 AND 3000
)
# Result set with 2195 records (correct)

-- let's explain the NOT IN version
EXPLAIN SELECT id FROM responsable_adresse WHERE id NOT IN (
SELECT DISTINCT id_responsable FROM id_adresses_vertes);

1 PRIMARY responsable_adresse 	index 	NULL PRIMARY 	4 	NULL 	2869 	Using where; Using index
2 DEPENDENT SUBQUERY 	id_adresses_vertes 	ALL NULL 	NULL 	NULL 	NULL 	751 	Using where; Using temporary

-- and the NOT EXISTS version
EXPLAIN
SELECT id FROM responsable_adresse 
WHERE NOT EXISTS(SELECT * FROM id_adresses_vertes WHERE id_responsable = responsable_adresse.id);
1 PRIMARY responsable_adresse 	index 	NULL PRIMARY 	4 	NULL 	2869 	Using where; Using index
2 DEPENDENT SUBQUERY 	id_adresses_vertes 	ALL NULL 	NULL 	NULL 	NULL 	1041 	Using where

How to repeat:
I'll try to make a reproductible case with less data, so I can post it here.
[15 Apr 2011 22:56] Vares Pierre-Olivier
Test case to reproduce

Attachment: test_case.sql (text/x-sql), 58.27 KiB.

[15 Apr 2011 23:05] Vares Pierre-Olivier
Oh my god !
I just realize, looking at the dump : would the NULL values be the cause of that ?
It would effectively be coherent, "NOT IN" being taken as a term-by-term comparison (a comparison with NULL being NULL)
And coherent with the add of BEETWEEN, because this condition eliminates every NULL value.

Really tricky... I should have been more aware of the NULL power. So a NOT IN and a NOT EXISTS are not equivalent, I'll know it from now on...

Do I have to set the bug as "Not a bug" or close it ?
[16 Apr 2011 2:33] MySQL Verification Team
Thank you for the feedback.