MySQL Bugs: #10722: SELECT .. MAX(t1.b) .. GROUP BY t2.a .. ORDER BY t1.b should not use filesort

Bug #10722	SELECT .. MAX(t1.b) .. GROUP BY t2.a .. ORDER BY t1.b should not use filesort
Submitted:	18 May 2005 22:05	Modified:	19 Feb 2007 23:07
Reporter:	jocelyn fournier (Silver Quality Contributor)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	4.1.11	OS:	Linux (linux)
Assigned to:		CPU Architecture:	Any

Description:
Hi,

Let's create two tables :

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a char(2), b int(2), PRIMARY KEY (a,b));
INSERT INTO t1 VALUES ('a',1),('a',2),('a',3);
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 (a int(2), b int(2), PRIMARY KEY (b));
INSERT INTO t2 VALUES (1,1),(1,2),(2,3);

Now I want to retrieve all the t1.b field where t1.a matches 'a', and the t2.a associated.

SELECT t2.a,t1.b FROM t2,t1 WHERE t2.b=t1.b AND t1.a='a' ORDER BY t1.b DESC;
+------+---+
| a    | b |
+------+---+
|    2 | 3 |
|    1 | 2 |
|    1 | 1 |
+------+---+

Since t1 has an index on (a,b), the optimizer only needs the index file of t1, and no filesorting is needed since it retrieves the rows in the reverse order of the index.

EXPLAIN confirms this fact :

EXPLAIN SELECT t2.a,t1.b FROM t2,t1 WHERE t2.b=t1.b AND t1.a='a' ORDER BY t1.b DESC;
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+--------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref       | rows | Extra                    |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+--------------------------+
|  1 | SIMPLE      | t1    | ref    | PRIMARY       | PRIMARY |       2 | const     |    2 | Using where; Using index |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY       | PRIMARY |       4 | test.t1.b |    1 |                          |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+--------------------------+
2 rows in set (0.00 sec)

Let's say now I still want the second column in the reverse index order, but I don't want any duplicate in the first column of this query.

SELECT t2.a,MAX(t1.b) FROM t2,t1 WHERE t2.b=t1.b AND t1.a='a' GROUP BY t2.a ORDER BY t1.b DESC;
+------+-----------+
| a    | MAX(t1.b) |
+------+-----------+
|    2 |         3 |
|    1 |         2 |
+------+-----------+
2 rows in set (0.00 sec)

this returns what I want, and shouldn't use any file sorting since results in the second column are still returned in the reverse index order. Unfortunately in this case, the GROUP BY order seems to take precedence over the ORDER BY internally and thus a filesort is needed. (I assume MySQL fetches all the rows, then do the GROUP BY in a temp table, and at the end do the ORDER BY using filesort).

EXPLAIN SELECT t2.a,MAX(t1.b) FROM t2,t1 WHERE t2.b=t1.b AND t1.a='a' GROUP BY t2.a ORDER BY t1.b DESC;
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-----------------------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref       | rows | Extra                                                     |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | t1    | ref    | PRIMARY       | PRIMARY |       2 | const     |    2 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | t2    | eq_ref | PRIMARY       | PRIMARY |       4 | test.t1.b |    1 |                                                           |
+----+-------------+-------+--------+---------------+---------+---------+-----------+------+-----------------------------------------------------------+
2 rows in set (0.01 sec)

Is it possible to have a little bit smarter behaviour of MySQL in such case, and detect since we want MAX(t1.b) ... ORDER BY t1.b DESC, no filesorting is needed ? (and same thing for MIN(t1.b) ... ORDER BY t1.b ASC) (I assume it would speed up things a lot, especially when combined with a LIMIT clause).

Thanks !
  Jocelyn

How to repeat:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a char(2), b int(2), PRIMARY KEY (a,b));
INSERT INTO t1 VALUES ('a',1),('a',2),('a',3);
DROP TABLE IF EXISTS t2;
CREATE TABLE t2 (a int(2), b int(2), PRIMARY KEY (b));
INSERT INTO t2 VALUES (1,1),(1,2),(2,3);
SELECT t2.a,t1.b FROM t2,t1 WHERE t2.b=t1.b AND t1.a='a' ORDER BY t1.b DESC;

EXPLAIN SELECT t2.a,t1.b FROM t2,t1 WHERE t2.b=t1.b AND t1.a='a' ORDER BY t1.b DESC;

SELECT t2.a,MAX(t1.b) FROM t2,t1 WHERE t2.b=t1.b AND t1.a='a' GROUP BY t2.a ORDER BY t1.b DESC;

EXPLAIN SELECT t2.a,MAX(t1.b) FROM t2,t1 WHERE t2.b=t1.b AND t1.a='a' GROUP BY t2.a ORDER BY t1.b DESC;

see http://dev.mysql.com/doc/mysql/en/group-by-hidden-fields.html

Hi !

Why is this related to group by hidden fields ?

Thanks,
  Jocelyn

Sorry, but you do not have any index on t2.a, and you have GROUP BY t2.a in your query. I do not see any way to avoid temporary table (and filesort) in this case. Please, check your test case.

Hi Valeriy,

Indeed you're right, the optimisation is not possible here.

Thanks,
  Jocelyn