Bug #60735 Changed behavior in optimizing SQL using view from v5.0.27 to 5.1.41
Submitted: 2 Apr 2011 22:59 Modified: 3 Apr 2011 13:49
Reporter: Jacob Keane Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.1.41 OS:Linux (Ubuntu)
Assigned to: CPU Architecture:Any

[2 Apr 2011 22:59] Jacob Keane
Description:
In Version 5.0.27 the following feature used to work fine. I now upgraded to the latest Ubuntu stable release 5.1.41-3ubuntu12.10 and now it doesnt work. It results in INCORRECT results.
I have an ordering application and I have simplified the schema and created an example to help you reproduce this issue, which is very simple once simplified.
In the actual application I am using this logic in a more complicated manner so please dont ask - why do you want to do this.

I have a table that keeps a running list of order status (tablist) for each orderid and timestamps it. Later I want to see what the latest status of an order is.
So I create a view which sorts this table (tablist) by timestamp desc - simple and then groups by order id, so that I can get the latest status (once) for each order.
This simple thing doesnt work anymore, because in the optimizer somehow the use of the view to sort desc is just ignored (I think). In the older version there was a step to sort the table by timestamp desc and then the grouping would happen. In the new version, there is just one step (thru EXPLAIN)

How to repeat:
Create this one table and one view
CREATE TABLE `tablist` (  `orderid` INT(10) NULL,  `statusid` INT(10) NULL,  `ts` TIMESTAMP NULL ) COLLATE='latin1_swedish_ci' ENGINE=MyISAM ROW_FORMAT=DEFAULT;
CREATE ALGORITHM = UNDEFINED VIEW `viewlistdesc` AS SELECT * from tablist order by ts desc ;

INSERT INTO `tablist` (`orderid`, `statusid`, `ts`) VALUES (1, 1, '2011-04-01 00:00:00');
INSERT INTO `tablist` (`orderid`, `statusid`, `ts`) VALUES (1, 2, '2011-04-02 00:00:00');
INSERT INTO `tablist` (`orderid`, `statusid`, `ts`) VALUES (1, 3, '2011-04-03 00:00:00');
INSERT INTO `tablist` (`orderid`, `statusid`, `ts`) VALUES (2, 1, '2011-04-01 00:00:00');
INSERT INTO `tablist` (`orderid`, `statusid`, `ts`) VALUES (2, 2, '2011-04-02 00:00:00');

Now if you run this query - what do you think should happen (and used to happen in the previous version)
select * from viewlistdesc group by orderid;

You would think you would get the latest status based on ts, but guess what - you get the oldest status. i.e. the view definition is simply ignored by the optimizer.

I was expecting to get the statusid of 3 dated 2011/4/3 for orderid 1 and statusid of 2 dated 2011/4/2 for orderid =2
INSTEAD I get statusid of 1 dated 2011/4/1 for BOTH orders 1 and 2.

Simple to test

Suggested fix:
Go back to the way the optimizer was working in 5.0x
[3 Apr 2011 8:43] Valeriy Kravchuk
This is NOT a bug formally. MySQL server is free to use any value from column in the group if they are different. Read http://dev.mysql.com/doc/refman/5.1/en/group-by-hidden-columns.html:

"When using this feature, all rows in each group should have the same values for the columns that are ommitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same."
[3 Apr 2011 13:49] Jacob Keane
Thanks for the explanation.
Is there a way to pass some planner or optimizer type directive to guide the determinism towards a certain outcome?
I have a work around for this already though.
The query is (in case someone else needs a solution)

select * from tablist where ts=(select max(ts) from tablist as intab where tablist.orderid = intab.orderid) ;