Bug #2739 Eliminate need for subqueries and temporary tables in JOINs
Submitted: 12 Feb 2004 3:46 Modified: 10 Dec 2005 17:37
Reporter: Todd Hendricks Email Updates:
Status: Won't fix Impact on me:
Category:MySQL Server Severity:S4 (Feature request)
Version: OS:
Assigned to: CPU Architecture:Any

[12 Feb 2004 3:46] Todd Hendricks
In cases of SELECTs with JOINs to obtain aggregate data from a JOINed table, enable the usage of ORDER BY following JOIN ON statements to assure that in addition to aggregate function (ie., COUNT(), SUM()), desired initial row from JOINed table can be returned without affecting aggregates--necessary when MIN() or MAX() or other functions will not work due to extra joins to bring in data from other tables that cannot be established using said functions.

How to repeat:
In an "updated status" context, given 3 tables:

table_a; containing:

update_id:  AUTO_INCREMENT/INT id of unique "update" event
update_category_id:  INT - "update" event type(-> table_b.update_category_id)
update_time:  DATETIME update occured
update_user_id:  INT - ID of user who performed the update(-> table_c.user_id)

table_b; containing:

update_category_id:  AUTO_INCREMENT/INT id of unique "update" cateogory type
update_category_name:  VARCHAR containing descriptor of category

table_c; containing:

user_id:  AUTO_INCREMENT/INT id of unique user
user_name:  VARCHAR containing user's name:

In the event that an aggregate of the number of updates per category from table A is needed from the joining of these three tables while being able to acquire the User's name and datetime of the latest update, a simple MAX() call on table_a.update_time will fetch the proper latest update time, however the user_id will still be returned as the first row encountered for each update_category_id.  ORDERing of table_a is necessary by update_time DESC prior to joining, requiring this query to succeed:

  SELECT update_id, update_category_id, update_time, update_user_id 
  FROM table_a 
  ORDER BY table_a.update_time DESC;

  SELECT COUNT(temp.update_category_id) as update_count, temp.update_time AS last_update, table_b.update_category_name, table_c.user_name 
  FROM table_b
  INNER JOIN temp ON temp.update_category_id=table_b.update_category_id
  INNER JOIN table_c ON table_c.user_id=temp.update_user_id
  GROUP BY temp.update_category_id
  ORDER BY table_b.update_category_name;

To return eg.,
Category 1:  68 Event(s), last updated by John Employee at 2004-02-12 17:38:02
Category 2:  24 Event(s), last updated by Jill Public at 2003-12-31 23:59:59

Suggested fix:
Proposed solution:  Enable ORDER BY directive following JOIN ON commands.  Ideally;

  SELECT COUNT(table_a.update_category_id) as update_count, table_a.update_time AS last_update, table_b.update_category_name, table_c.user_name 
  FROM table_b
  INNER JOIN table_a ON table_a.update_category_id=table_b.update_category_id ORDER BY table_a.update_time DESC
  INNER JOIN table_c ON table_c.user_id=table_a.update_user_id
  GROUP BY table_a.update_category_id
  ORDER BY table_b.update_category_name

To return same results, eliminating the need for a temporary table (pre-4.0) or subqueries (4.0+).
[12 Feb 2004 3:50] Todd Hendricks
Correction:  Output would be as follows (including category name as acquired from table_b)

Category 1 (Kernel Updates):  68 Event(s), last updated by John Employee at 2004-02-12 17:38:02
Category 2 (HAL Updates):  24 Event(s), last updated by Jill Public at 2003-12-31 23:59:59

Minor typo, but important to illustrate the necessity of the JOIN between table_a and table_b.
[10 Dec 2005 17:37] Valeriy Kravchuk
Thank you for the feature request. Sorry, but according to the SQL Standard ORDER BY is the only and last clause of any SELECT statement. So, I do not think the exact syntax you described will be ever implemented.

As for optimizing such a statements, MySQL 5 optimizer introduced a lot of new features to make them executing faster. Please, check.