Bug #41772 | Using a timestamp in ORDER BY on a GROUP BY SQL doesn't work properly | ||
---|---|---|---|
Submitted: | 28 Dec 2008 19:37 | Modified: | 30 Dec 2008 17:59 |
Reporter: | Who Cares | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Parser | Severity: | S3 (Non-critical) |
Version: | 5.0.41 | OS: | MacOS (Leopard 10.5.5) |
Assigned to: | CPU Architecture: | Any | |
Tags: | timestamp order by group by wrong order |
[28 Dec 2008 19:37]
Who Cares
[28 Dec 2008 20:04]
Who Cares
SELECT scorelist .name, MAX(scorelist.highscore) AS highscore FROM ( SELECT name, MAX(score) AS highscore, createdat FROM highscores GROUP BY name, createdat ) AS scorelist GROUP BY scorelist.name ORDER BY scorelist.highscore DESC, scorelist. createdat ASC
[28 Dec 2008 20:42]
MySQL Verification Team
Thank you for the bug report. I couldn't repeat with source server, please upgrade to latest released version (the version reported is quite older) or wait for the next released version. Thanks in advance. mysql 5.0 > SELECT name, MAX(score) -> FROM highscores -> GROUP BY name -> ORDER BY score DESC, createdat ASC -> ; +----------+------------+ | name | MAX(score) | +----------+------------+ | Player A | 120 | | Player B | 120 | +----------+------------+ 2 rows in set (0.00 sec) mysql 5.0 > SELECT scorelist .name, MAX(scorelist.highscore) AS highscore -> FROM ( -> SELECT name, -> MAX(score) AS highscore, -> createdat -> FROM highscores -> GROUP BY name, createdat -> ) AS scorelist -> GROUP BY scorelist.name -> ORDER BY scorelist.highscore DESC, scorelist. createdat ASC; +----------+-----------+ | name | highscore | +----------+-----------+ | Player B | 120 | | Player A | 120 | +----------+-----------+ 2 rows in set (0.00 sec) mysql 5.0 > show variables like "%version%"; +-------------------------+---------------------+ | Variable_name | Value | +-------------------------+---------------------+ | protocol_version | 10 | | version | 5.0.76-nt-log | | version_comment | Source distribution | | version_compile_machine | ia32 | | version_compile_os | Win32 | +-------------------------+---------------------+ 5 rows in set (0.05 sec)
[28 Dec 2008 20:48]
Peter Laursen
I agree with WC! Let us elaborate a little -- if you do SELECT name, MAX(score), createdat FROM highscores GROUP BY name; /* you'll get name MAX(score) createdat -------- ---------- ------------------- Player A 120 2008-12-28 19:13:36 Player B 120 2008-12-28 19:10:57 */ -- so next adding ORDER BY SELECT name, MAX(score), createdat FROM highscores GROUP BY name ORDER BY score DESC, createdat ASC; /* name MAX(score) createdat -------- ---------- ------------------- Player A 120 2008-12-28 19:13:36 Player B 120 2008-12-28 19:10:57 */ -- however SELECT name, MAX(score), createdat FROM highscores GROUP BY name ORDER BY createdat ASC; /* name MAX(score) createdat -------- ---------- ------------------- Player B 120 2008-12-28 19:10:57 Player A 120 2008-12-28 19:13:36 */ ... so the server *seems to think* that one occurrence of 120 for 'score' is larger than the other that is also 120! It is really incredible how MySQL 5.0 has introduced silly issues with simple queries. There seems to be no end of such discoveries! BTW: 4.1.22 behaves as expected, 5.1.67 and 5.1.30 don't!
[28 Dec 2008 20:51]
Peter Laursen
@miguel You can provide us with Mac and Windows binaries for 5.0.75 (or me with a free copy of Visual Studio) and we will test this on 5.0.75 too! The latest 5.0 and 5.1 *official MySQL binaries* are affected!
[28 Dec 2008 20:53]
MySQL Verification Team
hmm Sorry I didn't noticed my result is different than the expected one.
[28 Dec 2008 20:57]
Peter Laursen
6.0.8 is also affected!
[28 Dec 2008 20:57]
MySQL Verification Team
@Peter, See my prior comment.
[28 Dec 2008 20:58]
Peter Laursen
@miguel .. I saw now!
[28 Dec 2008 21:28]
MySQL Verification Team
@Peter, Which OS's version have you got the expected result with 4.1.22?: mysql> SELECT name, MAX(score) -> FROM highscores -> GROUP BY name -> ORDER BY score DESC, createdat ASC -> ; +----------+------------+ | name | MAX(score) | +----------+------------+ | Player A | 120 | | Player B | 120 | +----------+------------+ 2 rows in set (0.06 sec) mysql> show variables like "%version%"; +-------------------------+-------------------------------+ | Variable_name | Value | +-------------------------+-------------------------------+ | protocol_version | 10 | | version | 4.1.22-community-nt | | version_comment | MySQL Community Edition (GPL) | | version_compile_machine | ia32 | | version_compile_os | Win32 | +-------------------------+-------------------------------+ 5 rows in set (0.00 sec)
[28 Dec 2008 22:13]
Peter Laursen
ooops ... my mistake! CREATE TABLE `highscores` ( `name` char(30) NOT NULL, `score` int(8) NOT NULL, `createdat` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP ) ENGINE=MyISAM DEFAULT CHARSET=latin1; insert into `highscores` values('Player A','120','2008-12-28 19:13:36'), ('Player B','90','2008-12-28 19:10:57'), ('Player A','70','2008-12-28 19:11:09'), ('Player B','120','2008-12-28 19:10:21'); SELECT name, MAX(score), createdat FROM highscores GROUP BY name ORDER BY score DESC, createdat ASC; name MAX(score) createdat -------- ---------- ------------------- Player A 120 2008-12-28 19:13:36 Player B 120 2008-12-28 19:10:57 show variables like '%version%'; Variable_name Value ----------------------- ----------------------------- protocol_version 10 version 4.1.22-community-nt version_comment MySQL Community Edition (GPL) version_compile_machine ia32 version_compile_os Win32 (I was executing ".. ORDER BY createdat ASC" on 4.1.22 only in the first place). Also I notice that this is not related to TIMESTAMP type. Creating the column as varchar(50) makes no difference. And even CREATE TABLE `highscores3` ( `name` char(30) NOT NULL, `score` int(8) NOT NULL, `createdat` INT ) ENGINE=MyISAM DEFAULT CHARSET=latin1; insert into `highscores3` values('Player A',120,191336), ('Player B','90',191057), ('Player A','70',191109), ('Player B','120',191021); ..etc doesn't either. This is also interesting SELECT name, score, createdat FROM highscores GROUP BY name ORDER BY score DESC, createdat ASC; name score createdat -------- ------ ------------------- Player A 120 2008-12-28 19:13:36 Player B 90 2008-12-28 19:10:57 .. what seems to indicate that ORDER BY in the original statement is done on non-aggregated data. So maybe the query should be written like SELECT name, max(score) FROM highscores GROUP BY name ORDER BY max(score) DESC, createdat ASC; /* name max(score) -------- ---------- Player B 120 Player A 120 */ .. I now tend to think that this is *not a bug*
[29 Dec 2008 17:07]
Sveta Smirnova
Peter, thank you for the feedback. We tend to agree with you. Regarding to our user manual: http://dev.mysql.com/doc/refman/5.0/en/select.html: MySQL resolves unqualified column or alias references in ORDER BY clauses by searching in the select_expr values, then in the columns of the tables in the FROM clause. For GROUP BY or HAVING clauses, it searches the FROM clause before searching in the select_expr values. (For GROUP BY and HAVING, this differs from the pre-MySQL 5.0 behavior that used the same rules as for ORDER BY.) Additionally using only part of requested columns in GROUP BY expression is MySQL extension which should be used with care. Please see http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-fields.html: MySQL extends the use of GROUP BY so that you can use non-aggregated columns or calculations in the SELECT list that do not appear in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. ... Do not use this feature if the columns you omit from the GROUP BY part are not constant in the group. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.
[30 Dec 2008 14:09]
Who Cares
Hello First of all thank you for taking it serious. I would like to agree with you and maybe I dont understand the lastes key point. I also removed my alias in the SQL that wasn't the solution. I tried your latest (working) SQL: SELECT name, max(score) FROM highscores GROUP BY name ORDER BY max(score) DESC, createdat ASC; BEFORE I started it I prepared my test examples concerning the relevant records (120 score's record): name score createdat Player A 120 2004-12-28 19:13:38 Player B 120 2005-12-28 19:13:39 After starting your SQL (mentioned above) I got the following output: name score Player A 120 Player B 120 Fine, fine, correct order. Now I have to possibilities to change the createdat order, either on Player A or on player B. Changing it on Player A (higher or lower than Player B's createdat) will always give me the correct order after executing your SQL (so did mine originally). Before changing A's createdat I change the year of Player B's createdat to 2003 -> 2003-12-28 19:13:38 The player B's createdat should now be older than Player A's createdat. Executing your SQL shows me: name score Player A 120 Player B 120 Thats not right! I just changed now player A's year to 2005 and the order doesnt change either. Its really weird. Not to say Im good or Oracle is better but Im working since years with Oracle and this situation works fine on Oracle side. If you can reproduce my example from above then create an additional field called f.e. counter (INT 2). Insert a 1 and 2 into the columns on the A and B records with 120 score. Change the SQL from above: ... ORDER BY max(score) DESC, counter ASC; You will see it works flawless, however you set the counter values or change the order by to counter DESC. Im still sure MySQL, as much as I appreciate it, has a problem doing an order by on timestamps withing a group by, while other datatypes work fine with it. Greetings Farai
[30 Dec 2008 14:28]
Who Cares
Sorry, I have to correct myself. The example with "counter" doesn't work either properly, the same way wrong as the intial timestamp sample. I don't understand then why not in SELECT displayed columns but in the SQL used columns are not taken care of. Im also not forced to display (SELECT) a column that is part of the WHERE clause to make it work fine. Greetings Farai
[30 Dec 2008 14:39]
Peter Laursen
just an advice from another user: you will need to change the status from 'not a bug' to 'open' if you shall be sure that your comments get attention! (you do from 'edit submission' tab) Are you telling that this query SELECT name, max(score) FROM highscores GROUP BY name ORDER BY max(score) DESC, createdat ASC; does not work either? If you only "ORDER BY score" the server will simply pick and return the first non-aggreated value for score it finds in each GROUP. And it may not be the max. value!
[30 Dec 2008 16:05]
Sveta Smirnova
Who Cares, thank you for the feedback. But query like SELECT name, max(score) FROM highscores GROUP BY name ORDER BY max(score) DESC, createdat ASC; uses field createdat in ORDER BY and does not use same field in GROUP BY. This is specific MySQL extension of SQL standard. Nor Oracle, neither PostgreSQL allows such queries at all. You will get syntax error. In case of MySQL you should use this option with care and be aware you'll get a field with random createdat with such SELECT.
[30 Dec 2008 17:38]
Who Cares
Thanks again for your care. Well standards, I've seen so many in browsers and DB's. Everyhone has its own but thats not the point here. I tried it out in SQL (see below) and you are absolutely right. Guess I never used it in that context in Oracle. I have to use createdat in the GROUP BY. Connected to: Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production With the Partitioning, OLAP and Oracle Data Mining options JServer Release 9.2.0.7.0 - Production => Creating table highscores SQL> desc highscores Name Null? Type ----------------------------------------- -------- ----------------- NAME VARCHAR2(30) SCORE NUMBER(5) CREATEDAT DATE => Displaying all records SQL> SELECT name, score, TO_CHAR(createdat, 'yyyy-mm-dd HH24:mi:ss') 2 FROM highscores; NAME SCORE TO_CHAR(CREATEDAT,' ------------------------------ ---------- ------------------- Player A 120 2008-12-30 18:08:52 Player B 80 2008-12-30 18:02:52 Player A 70 2008-12-30 18:04:52 Player B 120 2008-12-30 18:03:52 => Executing the SQL SQL> SELECT NAME, MAX(score) 2 FROM highscores 3 GROUP BY NAME, createdat 4 ORDER BY MAX(score) DESC, createdat ASC; NAME MAX(SCORE) ------------------------------ ---------- Player B 120 Player A 120 Player B 80 Player A 70 => Changing the minutes of createdat of player B (in the background), so its afer than A and show again SQL> SELECT name, score, TO_CHAR(createdat, 'yyyy-mm-dd HH24:mi:ss') 2 FROM highscores; NAME SCORE TO_CHAR(CREATEDAT,' ------------------------------ ---------- ------------------- Player A 120 2008-12-30 18:08:52 Player B 80 2008-12-30 18:02:52 Player A 70 2008-12-30 18:04:52 Player B 120 2008-12-30 18:10:52 => Executing the SQL again SQL> SELECT NAME, MAX(score) 2 FROM highscores 3 GROUP BY NAME, createdat 4 ORDER BY MAX(score) DESC, createdat ASC; NAME MAX(SCORE) ------------------------------ ---------- Player A 120 Player B 120 Player B 80 Player A 70 Conclusion: When I use createdat in the GROUP BY inside MySQL it also works perfectly. Well in both cases, MySQL and Oracle, more records than wanted appears, because of the createdat in the GROUP BY, what is obvious and understandabe. Its another thing to ged rid of them (Aggregating Functions in Oracle) but all over it works! The big question now is: Why does MySQL offer what I did accidentally if it can run into wrong results? Well I know now and will take care of. My Inline view works fine with it. You can close this task now forever. I was not aware of reopening it as someone suggested (tnx to let me know). Thank you all for your quick and professional help and I hope I didn't waste your time too much. I was and I am still an fan of MySQL (has so many great features I miss in Oracle). Greetings Farai
[30 Dec 2008 17:59]
Sveta Smirnova
Thank you for the feedback. > The big question now is: Why does MySQL offer what I did accidentally if it can run into wrong results? For performance. I believe I posted this link already: http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-fields.html "MySQL extends the use of GROUP BY so that you can use non-aggregated columns or calculations in the SELECT list that do not appear in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping."