Bug #815 | SELECT with DISTINCT and JOIN ignores the ORDER BY | ||
---|---|---|---|
Submitted: | 8 Jul 2003 15:37 | Modified: | 27 Aug 2003 2:20 |
Reporter: | Stepan Riha | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 4.0 | OS: | x86 Linux |
Assigned to: | Paul DuBois | CPU Architecture: | Any |
[8 Jul 2003 15:37]
Stepan Riha
[8 Jul 2003 15:38]
Stepan Riha
SQL dump of database showing problem
Attachment: nonplus_test.sql (text/plain), 7.74 KiB.
[10 Jul 2003 14:36]
MySQL Verification Team
I was not able to repeat it. I tried with tables that I have. Can you upload your tables to: ftp://support.mysql.com:/pub/mysql/secret and let us know a filename.
[10 Jul 2003 14:40]
Stepan Riha
I've already attached the SQL dump to the bug report (nonplus_test.sql). You should be able to get it via http://bugs.mysql.com/file.php?id=3 Let me know if for some reason you can't get to it.
[11 Jul 2003 3:52]
MySQL Verification Team
Yes, it is a bug.
[19 Jul 2003 11:34]
Sergei Golubchik
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.mysql.com/documentation/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php This is not a bug. For each of distinct values of the column category_id, there are many values of entry_modified_on. Which value will be picked up (and used in ORDER BY) is undefined. To make it more obvious let's rewrite your query with GROUP BY: SELECT category_id FROM mt_entry JOIN mt_placement ON placement_entry_id = entry_id JOIN mt_category ON placement_category_id = category_id GROUP BY category_id ORDER BY entry_modified_on DESC; This query will return the same result. Now, you can add entry_modified_on column to SELECT list and see: mysql> SELECT category_id,entry_modified_on FROM mt_entry --> JOIN mt_placement ON placement_entry_id = entry_id --> JOIN mt_category ON placement_category_id = category_id --> GROUP BY category_id ORDER BY entry_modified_on DESC; +-------------+-------------------+ | category_id | entry_modified_on | +-------------+-------------------+ | 19 | 20030623165609 | | 4 | 20030426103241 | | 8 | 20030416204920 | | 9 | 20030317100950 | | 1 | 20030120134513 | +-------------+-------------------+ As you see, the result is correct - column entry_modified_on is, indeed, has decreasing ordering.
[22 Jul 2003 15:09]
Stepan Riha
As far as I can tell from the documentation, your rewritten query using GROUP BY is not semantically equivalent to the original query using DISTINCT. According to the MySQL documentation "DISTINCT [... specifies] that duplicate rows in the result set should be removed". It doesn't say anything about reordering of the result set. If you look at the result set produced without using DISTINCT, you will see that applying DISTINCT does not do what the documentation says it should do. Maybe the documentation needs to be updated?
[23 Jul 2003 5:11]
Sergei Golubchik
GROUP BY also does not mean reordering. And some DBMS'es does not return rows from GROUP BY in any particular order (if, e.g. GROUP BY is implemented with hash tables). And, yes, as "duplicate rows in the result set should be removed" it is not defined which row from the table will go into result set, so technically one cannot use in WHERE or ORDER BY fields that are not part of result set. According to SQL standard (I think any version of it will do) one is not allowed to use columns in the query with DISTINCT that are not part of result set.
[23 Jul 2003 6:55]
Stepan Riha
You're right about the requirement of having the ORDER BY columns in a DISTINCT select in other DBMSes. I tried the same setup using Microsoft's SQLServer and it complained about this very problem. I was (naively?) expecting that since MySQL did NOT return an error in my SQL, and since it has expanded standard SQL in other places (like SELECTing columns that are not specified as GROUP BY), that this was the behavior it would take. Plus I *think* it used to work in previous versions of MySQL. Oh well, I guess c'est la vie. You may want to change the documentation to reflect this, though.
[29 Jul 2003 3:53]
Lenz Grimmer
Assigned to documentation. Paul, could you please update the manual accordingly? Thanks!
[27 Aug 2003 2:20]
Michael Widenius
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant product(s). I am a little confused about what you think is the problem as I think MySQL does exactly what you told it to do (I may be a bit biased of course but...:) I uploaded the tables and tested this in MySQL 4.0: First the query without ORDER BY: mysql> SELECT DISTINCT category_id -> FROM mt_entry -> JOIN mt_placement ON placement_entry_id = entry_id -> JOIN mt_category ON placement_category_id = category_id ; +-------------+ | category_id | +-------------+ | 1 | | 4 | | 8 | | 9 | | 19 | +-------------+ Which I think we can both agree is correct. If we add ORDER BY entry_modified_on DESC we get: +-------------+ | category_id | +-------------+ | 19 | | 4 | | 8 | | 9 | | 1 | +-------------+ I don't see any reason why adding an ORDER BY should change the result (in your example you had two nines in the result set which looks wrong) As Sergei pointed out, the value for MySQL used for 'entry_modifed_on' is in your query not well defined, and in this which case MySQL will pick up the values from one of the matched rows. This is a very good extension in the case where you know that all matched rows will have an equal value for the referred columns (in this case 'entry_modifed_on'), but this is not the case in your example. To explain how MySQL works: - Read a row from mt_category. - Join it with mt_placement and mt_entry - As soon as there is a matching row, store 'category_id' and 'entry_modifed_on' and read next row from mt_category - Remove found duplicates for 'category' (Note that when removing duplicates entry_modifed_on is not considered) - Sort results based on the hidden entry_modifed_on field Note that in the manual section: http://www.mysql.com/doc/en/DISTINCT_optimisation.html we do mention the DISTINCT is normally optimized as a query with GROUP BY I have now updated the above manual section to refer to the GROUP BY extensions that covers how 'hidden fields' are used. Hope the above helps to understand this issue better. Regards, Monty