Bug #23370 Bad nested group by performance
Submitted: 17 Oct 2006 13:32 Modified: 20 Oct 2006 9:11
Reporter: Jim Tyrrell Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:mysql-5.1.11-beta-win32 and 5.1.26 OS:Linux (Linux and Win)
Assigned to: CPU Architecture:Any

[17 Oct 2006 13:32] Jim Tyrrell
Description:
Nested Group by queries are very slow, and I can not afford to rewrite all of my existing applications in MS SQL and Oracle, since this group by performance is so slow.

How to repeat:
When I have a table like this:
mysql> describe test;
+-------------+-------------+------+-----+---------+----------------+
| Field       | Type        | Null | Key | Default | Extra          |
+-------------+-------------+------+-----+---------+----------------+
| TestID      | int(11)     | NO   | PRI | NULL    | auto_increment |
| TestGroupID | int(11)     | NO   |     |         |                |
| TestName    | varchar(45) | NO   |     |         |                |
+-------------+-------------+------+-----+---------+----------------+
3 rows in set (0.08 sec)

Where TestID is an autonumber fields and TestGroupID groups changes together nested group by queries take a long time on large sets.

If I have a set of data 5000 rows long with 100 groupids I get the following results:

mysql> select count(*) from Test;
+----------+
| count(*) |
+----------+
|     5100 |
+----------+

mysql> select TestGroupID from Test group by TestGroupID;
....100 rows of 1-100
100 rows in set (0.01 sec)

When I run this:
mysql> select max(TestID) from Test group by TestGroupID;
...100 rows of 5000 - 5099
|        5099 |
+-------------+
100 rows in set (0.01 sec)

When I run this:
mysql> select TestName from Test where TestID in (select max(TestID) from Test group by TestGroupID);
I get 100 rows but it takes 30 seconds!!!!!!

But when I run this:
mysql> select TestName from Test where TestID in (5000, 5001,5002);
+------------+
| TestName   |
+------------+
| Test: 4900 |
| Test: 4901 |
| Test: 4902 |
+------------+
3 rows in set (0.02 sec)
No matter how many numbers 5000, 50001,.... etc I put in there performance is great.  

Why does the nested group by query perform so badly? 

Most importnaly I have a large code base in MS SQL and Oracle that uses group by a lot.  I can not afford to rewrite all my queries in my application with derived queries to make them faster.  Please fix this ASAP.
[20 Oct 2006 9:11] Sveta Smirnova
Please do not submit the same bug more than once. An existing bug report already describes this very problem. Even if you feel that your issue is somewhat different, the resolution is likely
to be the same. Because of this, we hope you add your comments to the original bug instead.

Thank you for your interest in MySQL.

Duplicates bug #19895.