Bug #52029 GROUP BY results in much slower query
Submitted: 13 Mar 2010 12:26 Modified: 22 Mar 2010 19:42
Reporter: Greg Hazel Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.1.44 OS:Any
Assigned to: CPU Architecture:Any
Tags: GROUP BY

[13 Mar 2010 12:26] Greg Hazel
Description:
Adding a GROUP BY of a primary key results in a much slower query, for no obvious reason.

This is InnoDB. Using another unique, indexed column to GROUP BY run quickly as expected. I tried ORDER BY NULL and it did not help at all.

How to repeat:
mysql> SELECT SQL_NO_CACHE `users`.* FROM `users` INNER JOIN friendships ON (users.id = friendships.friend_id AND friendships.user_id IN (4)) OR (users.id = friendships.user_id AND friendships.friend_id IN (4)) WHERE (accepted_at IS NOT NULL);
...
20 rows in set (0.06 sec)

mysql> SELECT SQL_NO_CACHE `users`.* FROM `users` INNER JOIN friendships ON (users.id = friendships.friend_id AND friendships.user_id IN (4)) OR (users.id = friendships.user_id AND friendships.friend_id IN (4)) WHERE (accepted_at IS NOT NULL) GROUP BY users.id;
...
20 rows in set (1.00 sec)
[13 Mar 2010 12:28] Greg Hazel
Happens everywhere.
[13 Mar 2010 12:32] Greg Hazel
mysql> EXPLAIN SELECT SQL_NO_CACHE `users`.* FROM `users` INNER JOIN friendships ON (users.id = friendships.friend_id AND friendships.user_id IN (4)) OR (users.id = friendships.user_id AND friendships.friend_id IN (4)) WHERE (accepted_at IS NOT NULL);
+----+-------------+-------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+---------+------+--------+------------------------------------------------+
| id | select_type | table       | type  | possible_keys                                                                                                                                                                                      | key                                                        | key_len | ref  | rows   | Extra                                          |
+----+-------------+-------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+---------+------+--------+------------------------------------------------+
|  1 | SIMPLE      | friendships | index | index_friendships_on_friend_id_and_user_id,index_friendships_on_user_id,index_friendships_on_friend_id,index_friendships_on_accepted_at,index_friendships_on_friend_id_and_user_id_and_accepted_at | index_friendships_on_friend_id_and_user_id_and_accepted_at | 17      | NULL | 118367 | Using where; Using index                       |
|  1 | SIMPLE      | users       | ALL   | PRIMARY                                                                                                                                                                                            | NULL                                                       | NULL    | NULL | 100542 | Range checked for each record (index map: 0x1) |
+----+-------------+-------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+---------+------+--------+------------------------------------------------+
2 rows in set (0.00 sec)

mysql> EXPLAIN SELECT SQL_NO_CACHE `users`.* FROM `users` INNER JOIN friendships ON (users.id = friendships.friend_id AND friendships.user_id IN (4)) OR (users.id = friendships.user_id AND friendships.friend_id IN (4)) WHERE (accepted_at IS NOT NULL) GROUP BY users.id;
+----+-------------+-------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+---------+------+--------+-----------------------------------------------------------+
| id | select_type | table       | type  | possible_keys                                                                                                                                                                                      | key                                                        | key_len | ref  | rows   | Extra                                                     |
+----+-------------+-------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+---------+------+--------+-----------------------------------------------------------+
|  1 | SIMPLE      | friendships | index | index_friendships_on_friend_id_and_user_id,index_friendships_on_user_id,index_friendships_on_friend_id,index_friendships_on_accepted_at,index_friendships_on_friend_id_and_user_id_and_accepted_at | index_friendships_on_friend_id_and_user_id_and_accepted_at | 17      | NULL | 118367 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | users       | ALL   | PRIMARY                                                                                                                                                                                            | NULL                                                       | NULL    | NULL | 100542 | Using where; Using join buffer                            |
+----+-------------+-------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+---------+------+--------+-----------------------------------------------------------+
2 rows in set (0.00 sec)
[13 Mar 2010 13:57] Valeriy Kravchuk
Thank you for the problem report. Same request as for bug #52030: please, upload dump of the tables used or, at least, SHOW CREATE TABLE and SHOW TABLE STATUS results for them.
[13 Mar 2010 21:34] Greg Hazel
mysql> SHOW TABLE STATUS like 'users';
+-------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+-----------------------+
| Name  | Engine | Version | Row_format | Rows   | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment               |
+-------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+-----------------------+
| users | InnoDB |      10 | Compact    | 105163 |            254 |    26804224 |               0 |     57917440 |         0 |         106768 | 2010-03-12 06:37:03 | NULL        | NULL       | utf8_general_ci |     NULL |                | InnoDB free: 64512 kB |
+-------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+-----------------------+
1 row in set (0.65 sec)

mysql> SHOW TABLE STATUS like 'friendships';
+-------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+-----------------------+
| Name        | Engine | Version | Row_format | Rows   | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment               |
+-------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+-----------------------+
| friendships | InnoDB |      10 | Compact    | 125359 |             62 |     7880704 |               0 |     30539776 |         0 |         170922 | 2010-03-12 06:08:43 | NULL        | NULL       | utf8_general_ci |     NULL |                | InnoDB free: 64512 kB |
+-------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+-----------------------+
1 row in set (0.22 sec)

Attaching the output of SHOW CREATE TABLE in a file since this ticket system says it's too big for a comment.
[13 Mar 2010 21:35] Greg Hazel
SHOW CREATE TABLE for users and friendships

Attachment: schema.txt (text/plain), 13.46 KiB.

[16 Mar 2010 4:54] Valeriy Kravchuk
And what are the results of:

SELECT count(*) 
FROM `users` INNER JOIN friendships ON (users.id =
friendships.friend_id AND friendships.user_id IN (4)) OR (users.id = friendships.user_id
AND friendships.friend_id IN (4)) WHERE (accepted_at IS NOT NULL);

SELECT count(distinct id) FROM users;

You had added GROUP BY step and if data set for grouping is big that may add some time, why not?
[16 Mar 2010 9:01] Greg Hazel
mysql> SELECT count(*)
    -> FROM `users` INNER JOIN friendships ON (users.id =
    -> friendships.friend_id AND friendships.user_id IN (4)) OR (users.id =
    -> friendships.user_id
    -> AND friendships.friend_id IN (4)) WHERE (accepted_at IS NOT NULL);
+----------+
| count(*) |
+----------+
|       20 |
+----------+
1 row in set (0.69 sec)

mysql> SELECT count(distinct id) FROM users;
+--------------------+
| count(distinct id) |
+--------------------+
|             107001 |
+--------------------+
1 row in set (0.12 sec)

With users.id grouping, there are also 20 results. Is it dependent on the size of the users table? I would expect it to scale with the number of returned rows.
[16 Mar 2010 11:09] Greg Hazel
And again, using GROUP BY with a larger column, users.guid, which is also unique per users row causes the grouping to occur but not take any extra time.

So why would the primary key be special, and so much slower?
[17 Mar 2010 8:54] Sveta Smirnova
Thank you for the feedback.

Verified as described.

Workaround:

mysql> alter table users add column uid  int(11) NOT NULL;
Query OK, 521024 rows affected (4 min 51.61 sec)
Records: 521024  Duplicates: 0  Warnings: 0

mysql> update users set uid=id;
Query OK, 521024 rows affected (10.21 sec)
Rows matched: 521024  Changed: 521024  Warnings: 0

mysql> alter table users add unique(uid);
Query OK, 521024 rows affected (5 min 2.36 sec)
Records: 521024  Duplicates: 0  Warnings: 0
[22 Mar 2010 17:22] MySQL Verification Team
Sveta,

Can you check whether a patch to bug #50843 fixes this problem too ??

Thanks in advance.
[22 Mar 2010 19:42] Sveta Smirnova
Sinisa,

I tested this bug after March, 8 when fix of bug #50843 was pushed into 5.1 tree. So the patch doesn't fix this bug.