MySQL Bugs: #79100: MySQL Manual does not explain when ORDER BY NULL does NOT prevent filesort

Bug #79100	MySQL Manual does not explain when ORDER BY NULL does NOT prevent filesort
Submitted:	3 Nov 2015 17:01	Modified:	17 Nov 2015 18:52
Reporter:	Valeriy Kravchuk	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Documentation	Severity:	S3 (Non-critical)
Version:	any, 5.5, 5.6, 5.7	OS:	Any
Assigned to:	Paul DuBois	CPU Architecture:	Any
Tags:	distinct, missing manual, order by null

Description:
MySQL manual (http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html) says:

"If a query includes GROUP BY but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying ORDER BY NULL."

One may conclude as a result that there must be no "Using filesort" step inthe EXPLAIN output of any query with GROUP BY and ORDER BY NULL. This is NOT the case:

mysql> explain select id, count(distinct c1) from tg1 group by id order by null;

+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows
 | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
|  1 | SIMPLE      | tg1   | ALL  | NULL          | NULL | NULL    | NULL |    4
 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+------+----------------+
1 row in set (0.00 sec)

If there is a group function over argument with DISTINCT, sorting is still needed.

How to repeat:
Use thew following primitive test:

mysql> create table tg1 (id int, c1 int, c2 int);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into tg1 values(1,2,3),(1,2,4),(1,2,5),(2,2,2);
Query OK, 4 rows affected (0.02 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> explain select id, count(c1) from tg1 group by id;
+----+-------------+-------+------+---------------+------+---------+------+-----
-+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows
 | Extra                           |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+---------------------------------+
|  1 | SIMPLE      | tg1   | ALL  | NULL          | NULL | NULL    | NULL |    4
 | Using temporary; Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+---------------------------------+
1 row in set (0.00 sec)

OK, we have filesort, let's try to avoid it based on the manual:

mysql> explain select id, count(c1) from tg1 group by id order by null;
+----+-------------+-------+------+---------------+------+---------+------+-----
-+-----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows
 | Extra           |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+-----------------+
|  1 | SIMPLE      | tg1   | ALL  | NULL          | NULL | NULL    | NULL |    4
 | Using temporary |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+-----------------+
1 row in set (0.00 sec)

It works as documented, but try this:

mysql> explain select id, count(distinct c1) from tg1 group by id order by null;

+----+-------------+-------+------+---------------+------+---------+------+-----
-+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows
 | Extra          |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+----------------+
|  1 | SIMPLE      | tg1   | ALL  | NULL          | NULL | NULL    | NULL |    4
 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+----------------+
1 row in set (0.00 sec)

More complex cases may end up with plans like these:

mysql> explain select t1.id, t2.id, count(distinct t1.c1) from tg1 t1 join tg1 t
2 group by t1.id, t2.id order by null\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4
        Extra: Using join buffer (Block Nested Loop)
2 rows in set (0.00 sec)

Suggested fix:
Make sure manual clearly explains that there are cases when extra filesort step is still needed, even with ORDER BY NULL added explicitly.

At the optimizer code level, please, consider adding a comment to the plan explaining that filesort is actually used to apply aggregation function only to distinct arguments.

Hello Valeriy,

Thank you for the report.

Thanks,
Umesh

Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly.

Note added per discussion with optimizer team:

The optimizer may still choose to use sorting to implement grouping
operations. ORDER BY NULL suppresses sorting of the result, not prior
sorting done by grouping operations to determine the result.

(Overhead of "sorting the result" differs from using sorting to come up with the result.)