Bug #60447 why are aggregate MAX(...) results different when using different "ORDER BY" ?
Submitted: 13 Mar 2011 3:29 Modified: 17 Feb 2013 9:59
Reporter: Jason Vas Dias Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S4 (Feature request)
Version:5.1.56 OS:Linux (mysql-server-5.1.56-1.el5.art)
Assigned to: CPU Architecture:Any

[13 Mar 2011 3:29] Jason Vas Dias
Description:
I don't understand why changing "ASC" to "DESC" in these queries 
produces different results:

mysql> select id, MAX(`sent_datetime`) as sent from ( SELECT *            FROM `command_transaction`            WHERE                   (   (`command_transaction`.`esn` = "1431001062")           AND (`command_transaction`.`status` = "Confirm")           AND (`command_transaction`.`sms_command` = 'start' )                   )           ORDER BY `sent_datetime` DESC) as maxacts;
+-------+---------------------+
| id    | sent                |
+-------+---------------------+
| 15010 | 2011-03-12 19:41:19 |
+-------+---------------------+
1 row in set (0.01 sec)

mysql> select id, MAX(`sent_datetime`) as sent from ( SELECT *            FROM `command_transaction`            WHERE                   (   (`command_transaction`.`esn` = "1431001062")           AND (`command_transaction`.`status` = "Confirm")           AND (`command_transaction`.`sms_command` = 'start' )                   )           ORDER BY `sent_datetime` ASC) as maxacts;
+-------+---------------------+
| id    | sent                |
+-------+---------------------+
| 14848 | 2011-03-12 19:41:19 |
+-------+---------------------+
1 row in set (0.01 sec)

How to repeat:
mysql> create table `a` ( `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, `dt` DATETIME NOT NULL, `msg` VARCHAR(32) NOT NULL , unique key `id` (`id`) );             
Query OK, 0 rows affected (0.05 sec)                                                                                                                                

mysql> insert into `a` set `dt`=CAST(now() AS DATETIME);
Query OK, 1 row affected, 1 warning (0.08 sec)

mysql> select * from a
    -> ;
+----+---------------------+-----+
| id | dt                  | msg |
+----+---------------------+-----+
|  1 | 2011-03-13 03:18:57 |     |
+----+---------------------+-----+
1 row in set (0.00 sec)

mysql> update a set msg='msg1'
    -> ;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> insert into `a` set `dt`=CAST(now() AS DATETIME), `msg`='msg2';
Query OK, 1 row affected (0.00 sec)

mysql> select * from a
    -> ;
+----+---------------------+------+
| id | dt                  | msg  |
+----+---------------------+------+
|  1 | 2011-03-13 03:18:57 | msg1 |
|  2 | 2011-03-13 03:19:48 | msg2 |
+----+---------------------+------+
2 rows in set (0.00 sec)

mysql> select id, msg, max(dt) from ( select * from a order by dt desc ) as oa ;
+----+------+---------------------+
| id | msg  | max(dt)             |
+----+------+---------------------+
|  2 | msg2 | 2011-03-13 03:19:48 |
+----+------+---------------------+
1 row in set (0.07 sec)

mysql> select id, msg, max(dt) from ( select * from a order by dt asc ) as oa ;
+----+------+---------------------+
| id | msg  | max(dt)             |
+----+------+---------------------+
|  1 | msg1 | 2011-03-13 03:19:48 |
+----+------+---------------------+
1 row in set (0.00 sec)

mysql>

Suggested fix:
When asked to select other fields along with an aggregate result from a subquery,
the server should return values for those fields that match the values in the
record having the aggregate result(s) as its aggregate field value(s) ;
ie. by selecting "MAX(`dt`)" above from the sub-query, the row with the
"MAX(`dt`)" value for `dt` should be selected, and in the above example
I would expect the outer query's `id` result to ALWAYS be '2' and its
`msg` result to always be `2`, regardless of the sort order ( ASC / DESC ) -
but I do not . It looks like I always get the penultimate result record
returned for "ASC" queries (off-by-one) , not the ultimate result as with "DESC".
[13 Mar 2011 4:08] Jason Vas Dias
workaround :

mysql> select id, msg, dt from a where ( dt = ( select  max(dt) as dt from ( select * from a where msg="msg2" order by dt asc ) as al ) )  ;
+----+------+---------------------+
| id | msg  | dt                  |
+----+------+---------------------+
|  3 | msg2 | 2011-03-13 04:04:46 |
+----+------+---------------------+
1 row in set (0.00 sec)

mysql> select id, msg, dt from a where ( dt = ( select  max(dt) as dt from ( select * from a where msg="msg2" order by dt desc ) as al ) )  ;
+----+------+---------------------+
| id | msg  | dt                  |
+----+------+---------------------+
|  3 | msg2 | 2011-03-13 04:04:46 |
+----+------+---------------------+
1 row in set (0.00 sec)

I think this issue needs better documentation and a better workaround.
[13 Mar 2011 5:03] Jason Vas Dias
actually workaround should be :

mysql> select id, msg, dt from a where ( ( dt = ( select  max(dt) as dt from ( select * from a where msg="msg2" order by dt desc ) as al  ) ) and (msg="msg2"))  ;
+----+------+---------------------+
| id | msg  | dt                  |
+----+------+---------------------+
|  3 | msg2 | 2011-03-13 04:04:46 |
+----+------+---------------------+

ie. if you wanted to find the time and id of the latest "msg2" record , and
 not the id of the latest record.
 This is a nasty workaround ! Surely there must be a simpler way ?
[13 Mar 2011 10:48] Valeriy Kravchuk
In your queries:

select id, msg, max(dt) from ( select * from a order by dt desc ) as oa ;

you use id and msg scalar columns, as well as max(dt) aggregate column, but you do not use GROUP BY.  As a result, entire result set is considered one group, but in frames of this group id and message values are different. Thus MySQL server is free to use them from any row in the group, maybe the first one it gets from the derived table oa, and we see that order of rows in derived table oa matters...

This is documented at http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html:

"When using this feature, all rows in each group should have the same values for the columns that are ommitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same."

This is not a bug formally.
[13 Mar 2011 20:23] Jason Vas Dias
Thanks to last commenter for the info - much appreciated ! 
I guess I agree that formally this is not a bug - or it is
one of those "bugs which are features" or vice versa - but
I really wish there was some log level setting that would
throw an exception / error if the server ever finds itself
having to return an "indeterminate result".
And if anyone else finds themselves having to determine
the latest record with a msg field of a given type, the
final workaround SQL is :
  select id, msg, max(dt) from a where ( msg = ?) group by(msg) ;
and without the "group by" the server finds itself returing an
indeterminate result - it would be really nice if it could let
us know that!
[13 Mar 2011 21:54] Jason Vas Dias
My FINAL fix for this issue :

DROP PROCEDURE IF EXISTS `LatestCommand` ;
DELIMITER |
CREATE PROCEDURE `LatestCommand`( IN Esn VARCHAR(10),
                                  IN Status ENUM("Pend","Sent","Done","Fail","Received","Confirm","Confirmed") ,
                                  IN Command VARCHAR(32)
                                )
BEGIN
   SELECT MAX(`id`) AS id, MAX(`sent_datetime`) AS sent_datetime ,
          `change_datetime`, `sms_command`, `status`, `esn` as esn,
      `lm_sequence_id`, `lmu_ip`, `lm_message_type`, `lm_message_data`,
      `reply`, `client_sms_message`
   FROM ( SELECT * FROM `command_transaction` WHERE
          (     IF((Esn != NULL),(`command_transaction`.`esn` = Esn),TRUE)
	  AND   IF((Status != NULL),(`command_transaction`.`status` = Status),TRUE)          
	  AND   IF((Command != NULL),(`command_transaction`.`sms_command` = Command),TRUE)          
          ) ORDER BY `sent_datetime` ASC
        ) AS maxcmd;
END
|
DELIMITER ;
[13 Mar 2011 22:19] Jason Vas Dias
My FINAL workaround / fix for this issue :

DROP PROCEDURE IF EXISTS `LatestCommand` ;
DELIMITER |
CREATE PROCEDURE `LatestCommand`( IN Esn VARCHAR(10),
                                  IN Status SET ("Pend","Sent","Done","Fail","Received","Confirm","Confirmed") ,
                                  IN Command SET ("start","locate","unlock","trunk","sunroof","freeze","wild")
                                )
BEGIN
   SELECT MAX(`id`) AS id, MAX(`sent_datetime`) AS sent_datetime ,
          `change_datetime`, `sms_command`, `status`, `esn` as esn,
      `lm_sequence_id`, `lmu_ip`, `lm_message_type`, `lm_message_data`,
      `reply`, `client_sms_message`
   FROM ( SELECT * FROM `command_transaction` WHERE
          (     IF((Esn != NULL),(`command_transaction`.`esn`=Esn),TRUE)
	  AND   IF((Status != NULL),(`command_transaction`.`status` IN ( Status )),TRUE)
	  AND   IF((Command != NULL),(`command_transaction`.`sms_command` IN ( Command )),TRUE)
          )  GROUP BY `command_transaction`.`esn`
        ) AS maxcmd ;
END
|
DELIMITER ;
[14 Mar 2011 8:11] Valeriy Kravchuk
What about ONLY_FULL_GROUP_BY SQL mode (http://dev.mysql.com/doc/refman/5.5/en/server-sql-mode.html#sqlmode_only_full_group_by)?
[14 Apr 2011 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[15 May 2011 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[28 May 2011 13:34] Jason Vas Dias
Now I'm getting more weirdness even with "GROUP BY" :

I have these two procedures defined :

DROP PROCEDURE IF EXISTS `LatestCommand` ;
DELIMITER |
CREATE PROCEDURE `LatestCommand`
(  IN   Esn VARCHAR(10),
   IN   Status  ENUM('Pend','Sent','Done','Fail','Received','Confirm','Confirmed') ,
   IN   Command VARCHAR(32)
)
BEGIN
   SELECT `id`, MAX(`sent_datetime`) AS sent_datetime,
      `change_datetime`, `sms_command`, `status`, `esn` as esn,
      `lm_sequence_id`, `lmu_ip`, `reply`,  `lm_message_type`, `lm_message_data`,
      `client_sms_message`
   FROM `command_transaction`
   WHERE  (   (`command_transaction`.`esn`         = Esn)
          AND IF(Status IS NULL,TRUE,`command_transaction`.`status`       = Status)
          AND IF(Command IS NULL,TRUE,`command_transaction`.`sms_command` = Command)
          )  GROUP BY `command_transaction`.`esn`
             ORDER BY `command_transaction`.`sent_datetime` DESC
   ;
END
|
DELIMITER ;
DROP PROCEDURE IF EXISTS `LatestCommands` ;
DELIMITER |
CREATE PROCEDURE `LatestCommands`(IN   Esn VARCHAR(10),
                                  IN   Status  ENUM('Pend','Sent','Done','Fail','Received','Confirm','Confirmed') ,
                                  IN   Command VARCHAR(32)
                                )
BEGIN
   SELECT `id`, `sent_datetime`,
      `change_datetime`, `sms_command`, `status`, `esn` as esn,
      `lm_sequence_id`, `lmu_ip`, `reply`,  `lm_message_type`, `lm_message_data`,
      `client_sms_message`
   FROM `command_transaction`
   WHERE  (   (`command_transaction`.`esn`         = Esn)
          AND IF(Status IS NULL,TRUE,`command_transaction`.`status`       = Status)
          AND IF(Command IS NULL,TRUE,`command_transaction`.`sms_command` = Command)
          )   ORDER BY `sent_datetime` DESC
   ;
END
|
DELIMITER ;

And LatestCommands, WITH an ORDER BY, returns different results to LatestCommand, WITHOUT and ORDER BY :

$ echo 'call LatestCommand("1431001062",NULL,NULL);' | mysql --database XXXXXXXX
id      sent_datetime   change_datetime sms_command     `status`        esn     lm_sequence_id  lmu_ip  reply   lm_message_type lm_message_data client_sms_message
6       2011-05-28 02:49:45     2010-11-18 12:00:06     locate  NULL    1431001062      114     10.16.26.1:20510        0       LocateReport    830514310010620101020800724CE569A54CE569A51F148908C06B23E20000C3FA000000000030090002D0FFBD2F080E000000        GPS CURRENT STATIONARY\n Lat:52.1439496N\nLong:-106.6720286W\nhttps://www.text2car.com/@kPa\n

$ echo 'call LatestCommands("1431001062",NULL,NULL);' | mysql --database text2car | head -n 2
id      sent_datetime   change_datetime sms_command     `status`        esn     lm_sequence_id  lmu_ip  reply   lm_message_type lm_message_data client_sms_message
61605   2011-05-28 02:49:45     2011-05-27 21:49:45     EVENT:REBOOT    NULL    1431001062      1       10.16.26.98:20510       0       EventReport     830514310010620101010200014DE062A74DE0629F1F14B92AC077E7A10000000000000000000000260000FF8F00000E080A520000    REBOOT\nGPS CURRENT STATIONARY\n Lat:52.1451818N\nLong:-106.5883743W\nhttps://www.text2car.com/@mXc\n
[28 May 2011 13:44] Jason Vas Dias
So even when LatestCommand is defined as in previous comment, or like this :

--------------
DROP PROCEDURE IF EXISTS `LatestCommand`
--------------

--------------
CREATE PROCEDURE `LatestCommand`
(  IN   Esn VARCHAR(10),
   IN   Status  ENUM('Pend','Sent','Done','Fail','Received','Confirm','Confirmed') ,
   IN   Command VARCHAR(32)
)
BEGIN
   SELECT `id`, MAX(`sent_datetime`) AS sent_datetime,
      `change_datetime`, `sms_command`, `status`, `esn` as esn,
      `lm_sequence_id`, `lmu_ip`, `reply`,  `lm_message_type`, `lm_message_data`,
      `client_sms_message`
   FROM `command_transaction`
   WHERE  (   (`command_transaction`.`esn`         = Esn)
          AND IF(Status IS NULL,TRUE,`command_transaction`.`status`       = Status)
          AND IF(Command IS NULL,TRUE,`command_transaction`.`sms_command` = Command)
          )  GROUP BY `command_transaction`.`esn`
   ;
END
--------------

ie. with or without any ORDER BY , 
the query incorrectly and silently returns the maximum 'sent_datetime' value
overall records and any values ( I think from the FIRST record
returned by the query ) for all other fields .

How can I get it to return just the values from the record having 
`sent_datetime` equal to MAX(`sent_datetime`) ?
[28 May 2011 13:45] Jason Vas Dias
And adding 'HAVING `sent_datetime` = MAX(`sent_datetime`)'
to LatestCommand has no effect.
[28 May 2011 13:57] Jason Vas Dias
OK, I think I should raise another MAJOR bug about this - this query now enters
an infinite loop :

$ mysql --database text2car -v < routine.LatestCommand.sql
--------------
DROP PROCEDURE IF EXISTS `LatestCommand`
--------------

--------------
CREATE PROCEDURE `LatestCommand`
(  IN   Esn VARCHAR(10),
   IN   Status  ENUM('Pend','Sent','Done','Fail','Received','Confirm','Confirmed') ,
   IN   Command VARCHAR(32)
)
BEGIN
SELECT * FROM `command_transaction`
   WHERE ( ( `id`, `sent_datetime` ) IN
   (SELECT MAX(`id`) AS `id`, MAX(`sent_datetime`) AS sent_datetime
    FROM `command_transaction`
    WHERE  (   (`command_transaction`.`esn`         = Esn)
           AND IF(Status IS NULL,TRUE,`command_transaction`.`status`       = Status)
           AND IF(Command IS NULL,TRUE,`command_transaction`.`sms_command` = Command)
           )  GROUP BY `command_transaction`.`esn`
   ) );
END
--------------

[ root@:~/src/sql !]
$ echo 'call LatestCommand("1431001062",NULL,NULL);' | mysql --database text2car
Ctrl-C -- sending "KILL QUERY 29405" to server ...
Ctrl-C -- query aborted.
ERROR 1317 (70100) at line 1: Query execution was interrupted
[28 May 2011 14:00] Valeriy Kravchuk
Please, send the results of EXPLAIN for the problematic query, with parameters replaced by their values from CALL statement.
[28 May 2011 14:33] Jason Vas Dias
thanks for responding, Valeriy -

Is this what you mean :

mysql> EXPLAIN SELECT * FROM `command_transaction`                                                     WHERE ( ( `id`, `sent_datetime` ) IN                                                 (SELECT MAX(`id`) AS `id`, MAX(`sent_datetime`) AS sent_datetime                      FROM `command_transaction`                                                           WHERE  (   (`command_transaction`.`esn`         = "1431001062") ) GROUP BY `command_transaction`.`esn`                                                 ORDER BY `command_transaction`.`sent_datetime` DESC ) );
--------------
EXPLAIN SELECT * FROM `command_transaction`                                                     WHERE ( ( `id`, `sent_datetime` ) IN                                                 (SELECT MAX(`id`) AS `id`, MAX(`sent_datetime`) AS sent_datetime                      FROM `command_transaction`                                                           WHERE  (   (`command_transaction`.`esn`         = "1431001062") ) GROUP BY `command_transaction`.`esn`                                                 ORDER BY `command_transaction`.`sent_datetime` DESC ) )
--------------

+----+--------------------+---------------------+------+---------------+------+---------+------+-------+-------------+
| id | select_type        | table               | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+--------------------+---------------------+------+---------------+------+---------+------+-------+-------------+
|  1 | PRIMARY            | command_transaction | ALL  | NULL          | NULL | NULL    | NULL | 26065 | Using where |
|  2 | DEPENDENT SUBQUERY | command_transaction | ALL  | NULL          | NULL | NULL    | NULL | 26065 | Using where |
+----+--------------------+---------------------+------+---------------+------+---------+------+-------+-------------+
2 rows in set (0.00 sec)

mysql>
[28 May 2011 14:36] Jason Vas Dias
this also enters infinite loop ( waited for 2mins, saw mysqld process growing,
pressed CTRL+C ):
mysql> SELECT * FROM `command_transaction`                                                     WHERE ( ( `id`, `sent_datetime` ) IN                                                 (SELECT MAX(`id`) AS `id`, MAX(`sent_datetime`) AS sent_datetime                      FROM `command_transaction`                                                           WHERE  (   (`command_transaction`.`esn`         = "1431001062") ) GROUP BY `command_transaction`.`esn`                                                 ORDER BY `command_transaction`.`sent_datetime` DESC ) );
--------------
SELECT * FROM `command_transaction`                                                     WHERE ( ( `id`, `sent_datetime` ) IN                                                 (SELECT MAX(`id`) AS `id`, MAX(`sent_datetime`) AS sent_datetime                      FROM `command_transaction`                                                           WHERE  (   (`command_transaction`.`esn`         = "1431001062") ) GROUP BY `command_transaction`.`esn`                                                 ORDER BY `command_transaction`.`sent_datetime` DESC ) )
--------------

Ctrl-C -- sending "KILL QUERY 29473" to server ...
Ctrl-C -- query aborted.
ERROR 1317 (70100): Query execution was interrupted
mysql>
[28 May 2011 14:39] Valeriy Kravchuk
Based on the execution plan, your query had to scan entire table, and for each row selected it has to scan the entire table again. This gives:

mysql> select 26065*26065;
+-------------+
| 26065*26065 |
+-------------+
|   679384225 |
+-------------+
1 row in set (0.00 sec)

that is, 679+ millions of individual row reads... Surely it takes some time for server or process. Leave it working for a day :)

But where is the MySQL server bug here?
[28 May 2011 15:14] Jason Vas Dias
WORKAROUND FOUND :

$ mysql -v --database text2car < routine.LatestCommand.sql
--------------
DROP PROCEDURE IF EXISTS `LatestCommand`
--------------

--------------
CREATE PROCEDURE `LatestCommand`
(  IN   Esn VARCHAR(10),
   IN   Status  ENUM('Pend','Sent','Done','Fail','Received','Confirm','Confirmed') ,
   IN   Command VARCHAR(32)
)
BEGIN
SELECT * FROM `command_transaction`
   WHERE (  `id` =
   (SELECT MAX(`id`) AS `id`
    FROM `command_transaction`
    WHERE  (   (`command_transaction`.`esn`         = Esn)
           AND IF(Status IS NULL,TRUE,`command_transaction`.`status`       = Status)
           AND IF(Command IS NULL,TRUE,`command_transaction`.`sms_command` = Command)
           )  GROUP BY `command_transaction`.`esn`
              ORDER BY `command_transaction`.`sent_datetime` DESC
   ) );
END
--------------
$ echo 'call LatestCommand("1431001062",NULL,NULL);' | mysql --database text2car
id      esn     lm_sequence_id  status  sent_datetime   change_datetime reply   sms_command     source_msisdn   dest_msisdn     lmu_ip  lm_message_type lm_message_data       client_sms_message
61605   1431001062      1       Received        2011-05-28 02:49:45     2011-05-27 21:49:45     0       EVENT:REBOOT            13063719079     10.16.26.98:20510    EventReport      830514310010620101010200014DE062A74DE0629F1F14B92AC077E7A10000000000000000000000260000FF8F00000E080A520000      REBOOT\nGPS CURRENT STATIONARY\n Lat:52.1451818N\nLong:-106.5883743W\nhttps://www.text2car.com/@mXc\n
[28 May 2011 15:20] Jason Vas Dias
Why does it appear to be so impossible to achieve what I'm trying to do here in a single query without sub-queries ?
[28 May 2011 15:22] Jason Vas Dias
Why is the server allowing a query that it knows contains potentially random data
to proceed ?
[17 Feb 2013 9:59] MySQL Verification Team
You need to set sql_mode='ONLY_FULL_GROUP_BY' and an error will be thrown.
At least in 5.6, the installer(s) create a my.cnf containing this setting by default.