Bug #37111 a complex join returns too few results
Submitted: 30 May 2008 23:32 Modified: 2 Jun 2008 13:43
Reporter: Roger Pack Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: General Severity:S2 (Serious)
Version:5.0.51b OS:Any
Assigned to: CPU Architecture:Any
Tags: join, query

[30 May 2008 23:32] Roger Pack
Description:
perhaps related to http://forums.mysql.com/read.php?106,210533,210533#msg-210533

When I do one query

mysql> SELECT schools.name,schools.enabled FROM `programs` LEFT OUTER JOIN `schools` ON `schools`.id = `programs`.school_id WHERE programs.enabled = 1 and schools.id IN (7,84,4,9,31,18,8,30,38,40,41,45,49,54,57,60,61,77,78,79) AND schools.is_online = 1;
...
279 rows in set (0.00 sec)

It gives the expected results. Now if I add another if clause to it [note that all 279 had schools.enabled = 1, so the if clause shouldn't change anything]

mysql> SELECT schools.name,schools.enabled FROM `programs` LEFT OUTER JOIN `schools` ON `schools`.id = `programs`.school_id WHERE programs.enabled = 1 and schools.id IN (7,84,4,9,31,18,8,30,38,40,41,45,49,54,57,60,61,77,78,79) AND schools.is_online = 1 AND schools.enabled = 1;
...
127 rows in set (0.00 sec)

It should have been 279 but it was 127. 

Not good.

Note that this was on Mac OS 10.5 with mysql 5.0.51b.  However, I was able to recreate it on a linux box running mysql 5.0.51a, so it's not mac specific.

Now here's the real kicker.
I dumped the offending database, and reloaded it in to 3 different, newly created, blank databases, test3, test4, and test5.
And ran the above tests on it.
The results [you guessed it].
test3 worked like it should [279 answers each time] test 4 erred [173 each time] and test5 worked like it should.  This is truly odd.
I did a DB dump of each test database afterword, and a diff of the files revealed that the DB contents were the same.  So this might be a tough, lurking bug in there somewhere.  Quite reproducible, in certain contexts, however.

How to repeat:
e-mail me and I'll shoot you a DB dump.  It's somewhat sensitive data so single emails would be best.
Then run the above queries.

Also note that 
it seemed that decreasing the query 'any way' slightly would make it work.

For example, if you decrease the lengthy ID list in the query, it would work as it should [i.e. if the ID list has 15 items, it returns 173 items, if 9, 260 or so].

If you take out any of the 'where' clauses then it returns all 279.

So it is indeed odd.

Suggested fix:
Unknown.
For me simplifying the query and running it 3 times with 1/3rd of the ID's per time [and adding together the resulting sets] fixed the problem.  A non-optimal hack, I suppose.  [note that I didn't do SQL UNION's, and haven't tried that way, but just added them programmatically].
[31 May 2008 16:33] Sveta Smirnova
Thank you for the report.

Please upload dump of the database.
[31 May 2008 18:07] Roger Pack
it's conceivable that the problem is caused by 'weird' characters in the text fields, like \240 or what not [non standard ASCII characters], though I'm not sure.  Also remember that with 'some' versions of uploading the database it worked, others failed.  Might be worth trying it a few times and on different versions of mysql, if it doesn't reproduce.
Thanks for your help in making mysql better for all of us.
[2 Jun 2008 12:35] Susanne Ebrecht
You have a really chaos at the encoding of your dump. I don't know what it is. My system told me it should be ASCII but there are signs in it which aren't UTF-8 or ISO-8859-15.

Please, correct your dump (manually) and try it again. 
Please, take care that you use the right character sets for your database and client/result.
[2 Jun 2008 12:59] Sveta Smirnova
Thank you for the feedback.

No additional information needed. I was able to repeat the problem with version 5.0.51a, but not able to repeat with current development sources. Please wait next release. Temporarily workaround: drop key index_schools_on_enabled
[2 Jun 2008 13:43] Roger Pack
bizarrely, dropping the suggested index and re-adding it causes the database to work.  Glad to hear that this problem [won't] exist in later versions.