Bug #41772 Using a timestamp in ORDER BY on a GROUP BY SQL doesn't work properly
Submitted: 28 Dec 2008 19:37 Modified: 30 Dec 2008 17:59
Reporter: Who Cares Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Parser Severity:S3 (Non-critical)
Version:5.0.41 OS:MacOS (Leopard 10.5.5)
Assigned to: CPU Architecture:Any
Tags: timestamp order by group by wrong order

[28 Dec 2008 19:37] Who Cares
Description:
Using a timestamp in ORDER BY on a GROUP BY SQL doesn't work properly. The order rules are screwed up. 
The problem might exist also on other OS's'/versions

How to repeat:
1. Create the following table 'highscores' with the insert records. Its a score table of players. Out of this you want to order them to show the highscore list. In case two players have the same highscore you want to display the earlier entered highscore first (based on a timestamp called createdat).

-------------------------------------
SET NAMES latin1;
SET FOREIGN_KEY_CHECKS = 0;

CREATE TABLE `highscores` (
  `name` char(30) NOT NULL,
  `score` int(8) NOT NULL,
  `createdat` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into `highscores` values('Player A','120','2008-12-28 19:13:36'),
 ('Player B','90','2008-12-28 19:10:57'),
 ('Player A','70','2008-12-28 19:11:09'),
 ('Player B','120','2008-12-28 19:10:21');

SET FOREIGN_KEY_CHECKS = 1;
-------------------------------------

2. Execute this statement to see the ORDER BY on createdat works fine:
SELECT name, score, createdat
FROM highscores
ORDER BY createdat DESC

3. Start this statement to see createdat doesn't work properly
SELECT name, MAX(score)
FROM highscores
GROUP BY name
ORDER BY score DESC, createdat ASC

Expected result
Player B      120
Player A      120

It might come in the right order the very first time. In that case change the timestamp of Player B's score 120 record to 2008-12-28 19:14:21 (later than Player A's 120 score record). Restart the SQL and you see that Player B is still the first line, tough its timestamp createdat should bring him to the second line.
Now change the Player B's createdat of the 120 score to 2009 -> 2009-12-28 19:14:21 (1 year after Player A's 120 highscore. 1 year is a very obvious difference. Execute the SQL again and you see that the Player B's highscore is still on the first line. 
Now another strange thing: Change Player A's 120 score timestamp 'createdat' to one year after Player B's 120 score timestamp. Set it to: 2010-12-28 19:13:36
Now another weird thing: Change Player A's 120 timestamp createdat to 2006 -> 2006-12-28 19:13:36. You see that MySQL will do the right order now!
You can now modify Player B's 120 score timestamp to 2005 and A is still in the first row, after executing.

Suggested fix:
Dunno, Inline View or Subselect?
[28 Dec 2008 20:04] Who Cares
SELECT scorelist .name, MAX(scorelist.highscore) AS highscore
FROM (
  SELECT name, 
  MAX(score) AS highscore,
  createdat
  FROM highscores
  GROUP BY name, createdat
) AS scorelist
GROUP BY  scorelist.name
ORDER BY scorelist.highscore DESC, scorelist. createdat ASC
[28 Dec 2008 20:42] MySQL Verification Team
Thank you for the bug report. I couldn't repeat with source server, please upgrade to latest released version (the version reported is quite older) or wait for the next released version. Thanks in advance.

mysql 5.0 > SELECT name, MAX(score)
    -> FROM highscores
    -> GROUP BY name
    -> ORDER BY score DESC, createdat ASC
    -> ;
+----------+------------+
| name     | MAX(score) |
+----------+------------+
| Player A |        120 |
| Player B |        120 |
+----------+------------+
2 rows in set (0.00 sec)

mysql 5.0 > SELECT scorelist .name, MAX(scorelist.highscore) AS highscore
    -> FROM (
    ->   SELECT name,
    ->   MAX(score) AS highscore,
    ->   createdat
    ->   FROM highscores
    ->   GROUP BY name, createdat
    -> ) AS scorelist
    -> GROUP BY  scorelist.name
    -> ORDER BY scorelist.highscore DESC, scorelist. createdat ASC;
+----------+-----------+
| name     | highscore |
+----------+-----------+
| Player B |       120 |
| Player A |       120 |
+----------+-----------+
2 rows in set (0.00 sec)

mysql 5.0 > show variables like "%version%";
+-------------------------+---------------------+
| Variable_name           | Value               |
+-------------------------+---------------------+
| protocol_version        | 10                  |
| version                 | 5.0.76-nt-log       |
| version_comment         | Source distribution |
| version_compile_machine | ia32                |
| version_compile_os      | Win32               |
+-------------------------+---------------------+
5 rows in set (0.05 sec)
[28 Dec 2008 20:48] Peter Laursen
I agree with WC! Let us elaborate a little

-- if you do

SELECT name, MAX(score), createdat
FROM highscores
GROUP BY name;

/* you'll get

name      MAX(score)          createdat  
--------  ----------  -------------------
Player A         120  2008-12-28 19:13:36
Player B         120  2008-12-28 19:10:57 */

-- so next adding ORDER BY

SELECT name, MAX(score), createdat
FROM highscores
GROUP BY name
ORDER BY score DESC, createdat ASC;

/*
name      MAX(score)          createdat  
--------  ----------  -------------------
Player A         120  2008-12-28 19:13:36
Player B         120  2008-12-28 19:10:57 */

-- however

SELECT name, MAX(score), createdat
FROM highscores
GROUP BY name
ORDER BY createdat ASC;

/*
name      MAX(score)          createdat  
--------  ----------  -------------------
Player B         120  2008-12-28 19:10:57
Player A         120  2008-12-28 19:13:36 */

... so the server *seems to think* that one occurrence of 120 for 'score' is larger than the other that is also 120! It is really incredible how MySQL 5.0 has introduced silly issues with simple queries. There seems to be no end of such discoveries!

BTW: 4.1.22 behaves as expected, 5.1.67 and 5.1.30 don't!
[28 Dec 2008 20:51] Peter Laursen
@miguel

You can provide us with Mac and Windows binaries for 5.0.75 (or me with a free copy of Visual Studio) and we will test this on 5.0.75 too!

The latest 5.0 and 5.1 *official MySQL binaries* are affected!
[28 Dec 2008 20:53] MySQL Verification Team
hmm Sorry I didn't noticed my result is different than the expected one.
[28 Dec 2008 20:57] Peter Laursen
6.0.8 is also affected!
[28 Dec 2008 20:57] MySQL Verification Team
@Peter,
See my prior comment.
[28 Dec 2008 20:58] Peter Laursen
@miguel .. I saw now!
[28 Dec 2008 21:28] MySQL Verification Team
@Peter,
Which OS's version have you got the expected result with 4.1.22?:

mysql> SELECT name, MAX(score)
    -> FROM highscores
    -> GROUP BY name
    -> ORDER BY score DESC, createdat ASC
    -> ;
+----------+------------+
| name     | MAX(score) |
+----------+------------+
| Player A |        120 |
| Player B |        120 |
+----------+------------+
2 rows in set (0.06 sec)

mysql> show variables like "%version%";
+-------------------------+-------------------------------+
| Variable_name           | Value                         |
+-------------------------+-------------------------------+
| protocol_version        | 10                            |
| version                 | 4.1.22-community-nt           |
| version_comment         | MySQL Community Edition (GPL) |
| version_compile_machine | ia32                          |
| version_compile_os      | Win32                         |
+-------------------------+-------------------------------+
5 rows in set (0.00 sec)
[28 Dec 2008 22:13] Peter Laursen
ooops ... my mistake!

CREATE TABLE `highscores` (
  `name` char(30) NOT NULL,
  `score` int(8) NOT NULL,
  `createdat` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into `highscores` values('Player A','120','2008-12-28 19:13:36'),
 ('Player B','90','2008-12-28 19:10:57'),
 ('Player A','70','2008-12-28 19:11:09'),
 ('Player B','120','2008-12-28 19:10:21');

SELECT name, MAX(score), createdat
FROM highscores
GROUP BY name
ORDER BY  score DESC, createdat ASC;

name      MAX(score)          createdat  
--------  ----------  -------------------
Player A         120  2008-12-28 19:13:36
Player B         120  2008-12-28 19:10:57

show variables like '%version%';

Variable_name            Value                        
-----------------------  -----------------------------
protocol_version         10                           
version                  4.1.22-community-nt          
version_comment          MySQL Community Edition (GPL)
version_compile_machine  ia32                         
version_compile_os       Win32

(I was executing ".. ORDER BY createdat ASC" on 4.1.22 only in the first place).

Also I notice that this is not related to TIMESTAMP type.  Creating the column as varchar(50) makes no difference. And even

CREATE TABLE `highscores3` (
  `name` char(30) NOT NULL,
  `score` int(8) NOT NULL,
  `createdat` INT
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into `highscores3` values('Player A',120,191336),
 ('Player B','90',191057),
 ('Player A','70',191109),
 ('Player B','120',191021);
..etc

doesn't either.

This is also interesting

SELECT name, score, createdat
FROM highscores
GROUP BY name
ORDER BY score DESC, createdat ASC;

name       score          createdat  
--------  ------  -------------------
Player A     120  2008-12-28 19:13:36
Player B      90  2008-12-28 19:10:57

.. what seems to indicate that ORDER BY in the original statement is done on non-aggregated data. So maybe the query should be written like

SELECT name, max(score)
FROM highscores
GROUP BY name
ORDER BY max(score) DESC, createdat ASC;

/*
name      max(score)
--------  ----------
Player B         120
Player A         120 */

.. I now tend to think that this is *not a bug*
[29 Dec 2008 17:07] Sveta Smirnova
Peter,

thank you for the feedback. We tend to agree with you.

Regarding to our user manual: http://dev.mysql.com/doc/refman/5.0/en/select.html:

MySQL resolves unqualified column or alias references in ORDER BY clauses by searching in the select_expr values, then in the columns of the tables in the FROM clause. For GROUP BY or HAVING clauses, it searches the FROM clause before searching in the select_expr values. (For GROUP BY and HAVING, this differs from the pre-MySQL 5.0 behavior that used the same rules as for ORDER BY.) 

Additionally using only part of requested columns in GROUP BY expression is MySQL extension which should be used with care. Please see http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-fields.html:

MySQL extends the use of GROUP BY so that you can use non-aggregated columns or calculations in the SELECT list that do not appear in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping.
...
Do not use this feature if the columns you omit from the GROUP BY part are not constant in the group. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.
[30 Dec 2008 14:09] Who Cares
Hello
First of all thank you for taking it serious. I would like to agree with you and maybe I dont understand the lastes key point. I also removed my alias in the SQL that wasn't the solution. I tried your latest (working) SQL:

SELECT name, max(score)
FROM highscores
GROUP BY name
ORDER BY max(score) DESC, createdat ASC;

BEFORE I started it I prepared my test examples concerning the relevant records (120 score's record):

name          score        createdat
Player A     120           2004-12-28 19:13:38
Player B      120          2005-12-28 19:13:39

After starting your SQL (mentioned above) I got the following output:

name        score
Player A    120
Player B     120

Fine, fine, correct order.
Now I have to possibilities to change the createdat order, either on Player A or on player B. Changing it on Player A (higher or lower than Player B's createdat) will always give me the correct order after executing your SQL (so did mine originally).
Before changing A's createdat I change the year of Player B's createdat to 2003 -> 2003-12-28 19:13:38
The player B's createdat should now be older than Player A's createdat. Executing your SQL shows me:
name        score
Player A    120
Player B     120

Thats not right! 
I just changed now player A's year to 2005 and the order doesnt change either. Its really weird. 

Not to say Im good or Oracle is better but Im working since years with Oracle and this situation works fine on Oracle side. 

If you can reproduce my example from above then create an additional field called f.e. counter (INT 2). Insert a 1 and 2 into the columns on the A and B records with 120 score. Change the SQL from above:  ... ORDER BY max(score) DESC, counter ASC;
You will see it works flawless, however you set the counter values or change the order by to counter DESC.
Im still sure MySQL, as much as I appreciate it, has a problem doing an order by on timestamps withing a group by, while other datatypes work fine with it.
Greetings
Farai
[30 Dec 2008 14:28] Who Cares
Sorry, I have to correct myself. The example with "counter" doesn't work either properly, the same way wrong as the intial timestamp sample.
I don't understand then why not in SELECT displayed columns but in the SQL used columns are not taken care of. Im also not forced to display (SELECT) a column that is part of the WHERE clause to make it work fine.
Greetings
Farai
[30 Dec 2008 14:39] Peter Laursen
just an advice from another user: you will need to change the status from 'not a bug' to 'open' if you shall be sure that your comments get attention! (you do from 'edit submission' tab)

Are you telling that this query

SELECT name, max(score)
FROM highscores
GROUP BY name
ORDER BY max(score) DESC, createdat ASC;

does not work either?

If you only "ORDER BY score" the server will simply pick and return the first non-aggreated value for score it finds in each GROUP.  And it may not be the max. value!
[30 Dec 2008 16:05] Sveta Smirnova
Who Cares,

thank you for the feedback.

But query like 

SELECT name, max(score)
FROM highscores
GROUP BY name
ORDER BY max(score) DESC, createdat ASC;

uses field createdat in ORDER BY and does not use same field in GROUP BY. This is specific MySQL extension of SQL standard.

Nor Oracle, neither PostgreSQL allows such queries at all. You will get syntax error. In case of MySQL you should use this option with care and be aware you'll get a field with random createdat with such SELECT.
[30 Dec 2008 17:38] Who Cares
Thanks again for your care. Well standards, I've seen so many in browsers and DB's. Everyhone has its own but thats not the point here. 
I tried it out in SQL (see below) and you are absolutely right. Guess I never used it in that context in Oracle. I have to use createdat in the GROUP BY. 

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.7.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.7.0 - Production
 
=> Creating table highscores

SQL> desc highscores
 Name                                      Null?    Type
 ----------------------------------------- -------- -----------------
 NAME                                                   VARCHAR2(30)
 SCORE                                                  NUMBER(5)
 CREATEDAT                                        DATE
 
=> Displaying all records

SQL> SELECT name, score, TO_CHAR(createdat, 'yyyy-mm-dd HH24:mi:ss')
  2  FROM highscores;
 
NAME                                SCORE TO_CHAR(CREATEDAT,'
------------------------------ ---------- -------------------
Player A                              120 2008-12-30 18:08:52
Player B                               80 2008-12-30 18:02:52
Player A                               70 2008-12-30 18:04:52
Player B                              120 2008-12-30 18:03:52

=> Executing the SQL

SQL> SELECT NAME, MAX(score)
  2  FROM highscores
  3  GROUP BY NAME, createdat
  4  ORDER BY MAX(score) DESC, createdat ASC;
 
NAME                           MAX(SCORE)
------------------------------ ----------
Player B                              120
Player A                              120
Player B                               80
Player A                               70

=> Changing the minutes of createdat of player B (in the background), so its afer than A and show again

SQL> SELECT name, score, TO_CHAR(createdat, 'yyyy-mm-dd HH24:mi:ss')
  2  FROM highscores;
 
NAME                                SCORE TO_CHAR(CREATEDAT,'
------------------------------ ---------- -------------------
Player A                              120 2008-12-30 18:08:52
Player B                               80 2008-12-30 18:02:52
Player A                               70 2008-12-30 18:04:52
Player B                              120 2008-12-30 18:10:52

=> Executing the SQL again

SQL> SELECT NAME, MAX(score)
  2  FROM highscores
  3  GROUP BY NAME, createdat
  4  ORDER BY MAX(score) DESC, createdat ASC;
 
NAME                           MAX(SCORE)
------------------------------ ----------
Player A                              120
Player B                              120
Player B                               80
Player A                               70

Conclusion: When I use createdat in the GROUP BY inside MySQL it also works perfectly. Well in both cases, MySQL and Oracle, more records than wanted appears, because of the createdat in the GROUP BY, what is obvious and understandabe.  Its another thing to ged rid of them (Aggregating Functions in Oracle) but all over it works!
The big question now is: Why does MySQL offer what I did accidentally if it can run into wrong results? Well I know now and will take care of. My Inline view works fine with it.
You can close this task now forever. I was not aware of reopening it as someone suggested (tnx to let me know).

Thank you all for your quick and professional help and I hope I didn't waste your time too much. I was and I am still an fan of MySQL (has so many great features I miss in Oracle).
Greetings
Farai
[30 Dec 2008 17:59] Sveta Smirnova
Thank you for the feedback.

> The big question now is: Why does MySQL offer what I did accidentally if it can run into wrong results?

For performance. I believe I posted this link already: http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-fields.html

"MySQL extends the use of GROUP BY so that you can use non-aggregated columns or
calculations in the SELECT list that do not appear in the GROUP BY clause. You can use
this feature to get better performance by avoiding unnecessary column sorting and
grouping."