Description:
MySQL manual (http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html) says:
"If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL."
One may conclude as a result that there must be no "Using filesort" step inthe EXPLAIN output of any query with GROUP BY and ORDER BY NULL. This is NOT the case:
mysql> explain select id, count(distinct c1) from tg1 group by id order by null;
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows
| Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | tg1 | ALL | NULL | NULL | NULL | NULL | 4
| Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)
If there is a group function over argument with DISTINCT, sorting is still needed.
How to repeat:
Use thew following primitive test:
mysql> create table tg1 (id int, c1 int, c2 int);
Query OK, 0 rows affected (0.00 sec)
mysql> insert into tg1 values(1,2,3),(1,2,4),(1,2,5),(2,2,2);
Query OK, 4 rows affected (0.02 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> explain select id, count(c1) from tg1 group by id;
+----+-------------+-------+------+---------------+------+---------+------+-----
-+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows
| Extra |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+---------------------------------+
| 1 | SIMPLE | tg1 | ALL | NULL | NULL | NULL | NULL | 4
| Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+---------------------------------+
1 row in set (0.00 sec)
OK, we have filesort, let's try to avoid it based on the manual:
mysql> explain select id, count(c1) from tg1 group by id order by null;
+----+-------------+-------+------+---------------+------+---------+------+-----
-+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows
| Extra |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+-----------------+
| 1 | SIMPLE | tg1 | ALL | NULL | NULL | NULL | NULL | 4
| Using temporary |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+-----------------+
1 row in set (0.00 sec)
It works as documented, but try this:
mysql> explain select id, count(distinct c1) from tg1 group by id order by null;
+----+-------------+-------+------+---------------+------+---------+------+-----
-+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows
| Extra |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+----------------+
| 1 | SIMPLE | tg1 | ALL | NULL | NULL | NULL | NULL | 4
| Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+----------------+
1 row in set (0.00 sec)
More complex cases may end up with plans like these:
mysql> explain select t1.id, t2.id, count(distinct t1.c1) from tg1 t1 join tg1 t
2 group by t1.id, t2.id order by null\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 4
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t2
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 4
Extra: Using join buffer (Block Nested Loop)
2 rows in set (0.00 sec)
Suggested fix:
Make sure manual clearly explains that there are cases when extra filesort step is still needed, even with ORDER BY NULL added explicitly.
At the optimizer code level, please, consider adding a comment to the plan explaining that filesort is actually used to apply aggregation function only to distinct arguments.