Bug #71891 GROUP BY performance differs wildly for tautologically equal columns
Submitted: 28 Feb 2014 20:27 Modified: 5 Mar 2014 20:37
Reporter: Doug Cook Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6.15 OS:Any
Assigned to: CPU Architecture:Any

[28 Feb 2014 20:27] Doug Cook
Description:
In investigating poor performance for a number of JOIN queries using GROUP BY, I discovered that swapping the GROUP BY column for another column which is tautologically equivalent can cause dramatic differences in performance. By "tautologically equivalent" I mean that the two columns are equal by inspection since they are defined to be equal in a JOIN criterion.

In some cases merely changing the column for an equivalent one results in a speed up of two orders of magnitude (a query taking ~160 seconds took 1.76 seconds).

Given that the columns in question are equivalent by inspection, the optimizer should be choosing the equivalent version which results in a  superior execution plan.

I'm attaching a simple example database that shows the issue. In this example, the fields GEN.type and T.generation are, by definition, equal for all rows of the JOIN. If I GROUP BY T.generation, I get a relatively slow query; if I GROUP BY GEN.type, the query is much faster.

Obviously, with less trivial examples the difference becomes much more dramatic.

The problem is reproducible and occurs for multiple different tables in our database, on both Mac OS X and Linux.

How to repeat:
mysql> SELECT sql_no_cache T.generation, GEN.name, count(distinct T.tid) from demo_tag AS T JOIN demo_generation AS GEN ON T.generation = GEN.type GROUP BY T.generation;
+------------+--------+-----------------------+
| generation | name   | count(distinct T.tid) |
+------------+--------+-----------------------+
| 1          | Name 1 |                 29159 |
| 2          | Name 2 |                  7620 |
| 3          | Name 3 |                    95 |
| BT         | Name 4 |                  1248 |
| CM         | Name 5 |                    16 |
| OR         | Name 6 |                   166 |
+------------+--------+-----------------------+
6 rows in set (1.36 sec)

mysql> SELECT sql_no_cache T.generation, GEN.name, count(distinct T.tid) from demo_tag AS T JOIN demo_generation AS GEN ON T.generation = GEN.type GROUP BY GEN.type;
+------------+--------+-----------------------+
| generation | name   | count(distinct T.tid) |
+------------+--------+-----------------------+
| 1          | Name 1 |                 29159 |
| 2          | Name 2 |                  7620 |
| 3          | Name 3 |                    95 |
| BT         | Name 4 |                  1248 |
| CM         | Name 5 |                    16 |
| OR         | Name 6 |                   166 |
+------------+--------+-----------------------+
6 rows in set (0.13 sec)

mysql> EXPLAIN EXTENDED SELECT T.generation, GEN.name, count(distinct T.tid) from demo_tag AS T JOIN demo_generation AS GEN ON T.generation = GEN.type GROUP BY T.generation;
+----+-------------+-------+------+---------------+------+---------+-----------------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref             | rows | filtered | Extra                           |
+----+-------------+-------+------+---------------+------+---------+-----------------+------+----------+---------------------------------+
|  1 | SIMPLE      | GEN   | ALL  | type          | NULL | NULL    | NULL            |    6 |   100.00 | Using temporary; Using filesort |
|  1 | SIMPLE      | T     | ref  | gen           | gen  | 8       | bugrep.GEN.type | 6384 |   100.00 | NULL                            |
+----+-------------+-------+------+---------------+------+---------+-----------------+------+----------+---------------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql> EXPLAIN EXTENDED SELECT T.generation, GEN.name, count(distinct T.tid) from demo_tag AS T JOIN demo_generation AS GEN ON T.generation = GEN.type GROUP BY GEN.type;
+----+-------------+-------+------+---------------+------+---------+-----------------+------+----------+----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref             | rows | filtered | Extra          |
+----+-------------+-------+------+---------------+------+---------+-----------------+------+----------+----------------+
|  1 | SIMPLE      | GEN   | ALL  | type          | NULL | NULL    | NULL            |    6 |   100.00 | Using filesort |
|  1 | SIMPLE      | T     | ref  | gen           | gen  | 8       | bugrep.GEN.type | 6384 |   100.00 | NULL           |
+----+-------------+-------+------+---------------+------+---------+-----------------+------+----------+----------------+
2 rows in set, 1 warning (0.00 sec)
[28 Feb 2014 20:29] Doug Cook
database dump for example SQL

Attachment: bugrep.sql (application/octet-stream, text), 443.95 KiB.

[5 Mar 2014 20:37] MySQL Verification Team
Thank you for the bug report.

C:\dbs>c:\dbs\5.6\bin\mysql -uroot --port=3560 --prompt="mysql 5.6 > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 20
Server version: 5.6.17 Source distribution

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.6 > use x
Database changed
mysql 5.6 > set sql_mode = ONLY_FULL_GROUP_BY;
Query OK, 0 rows affected (0.00 sec)

mysql 5.6 > SELECT sql_no_cache T.generation, GEN.name, count(distinct T.tid) from demo_tag AS T JOIN demo_generation AS GEN ON T.generation = GEN.type GROUP BY T.generation;
ERROR 1055 (42000): 'x.GEN.name' isn't in GROUP BY
mysql 5.6 > SELECT sql_no_cache T.generation, GEN.name, count(distinct T.tid) from demo_tag AS T JOIN demo_generation AS GEN ON T.generation = GEN.type GROUP BY GEN.type;
ERROR 1055 (42000): 'x.T.generation' isn't in GROUP BY
mysql 5.6 >

http://dev.mysql.com/doc/refman/5.6/en/group-by-extensions.html

"MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns..... this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. ......"