Bug #49360 In some circumstances, GROUP BY WITH ROLLUP is faster then GROUP BY
Submitted: 2 Dec 2009 17:53 Modified: 3 Apr 2011 0:21
Reporter: Rene' Cannao' Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.66 , 5.0.84 , 5.0.84 , 5.1.41, 5.1.43 OS:Any
Assigned to: CPU Architecture:Any

[2 Dec 2009 17:53] Rene' Cannao'
Description:
In some circumstances, SELECT statement with GROUP BY takes twice longer than the same statement with WITH ROLLUP.

Probably, an additional sorting is performance.

How to repeat:
Create two tables with random data:

CREATE TABLE tbl1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, str1 VARCHAR(11) NOT NULL, str2 VARCHAR(11) NOT NULL, KEY (str1, str2)) ENGINE=MyISAM;
INSERT INTO tbl1(id) VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL);
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
INSERT INTO tbl1(id) SELECT NULL FROM tbl1;
UPDATE tbl1 SET str1=id%250000, str2=id%120000;

CREATE TABLE tbl2 (str1 VARCHAR(11) NOT NULL, str3 CHAR(2) NOT NULL, KEY (str1,str3));
INSERT INTO tbl2 SELECT str1, id%90 FROM tbl1 WHERE id%2;

-----

Compare the execution time of those two queries:

1) SELECT SQL_NO_CACHE COUNT(*) FROM tbl1 , tbl2 WHERE tbl1.str1 = tbl2.str1 GROUP BY tbl2.str3, tbl1.str2 WITH ROLLUP HAVING str2 IS NOT NULL AND str3 IS NOT NULL;

2) SELECT SQL_NO_CACHE COUNT(*) FROM tbl1 , tbl2 WHERE tbl1.str1 = tbl2.str1 GROUP BY tbl2.str3, tbl1.str2 HAVING str2 IS NOT NULL AND str3 IS NOT NULL;

Suggested fix:
Probably an optimization is present for GROUP BY ... WITH ROLLUP.
Implement the same for just GROUP BY.
[2 Dec 2009 17:57] Rene' Cannao'
Test case and some output

Attachment: test_case_49360.txt (text/plain), 8.44 KiB.

[3 Dec 2009 5:24] Valeriy Kravchuk
I can confirm these results with recent 5.1.42 from bzr. Also,  the speed of GROUP without ROLLUP is the same as of GROUP BY ... ORDER BY NULL (although filesort is NOT present in EXPLAIN results for it, unlike the plans for original two queries).
[4 Dec 2009 19:09] Valeriy Kravchuk
With 5.1.43-debug on Mac OS X I've got:

mysql> SELECT SQL_NO_CACHE COUNT(*) FROM tbl1 , tbl2 WHERE tbl1.str1 = tbl2.str1 GROUP BY
    -> tbl2.str3, tbl1.str2 WITH ROLLUP HAVING str2 IS NOT NULL AND str3 IS NOT NULL;

...
540000 rows in set (1 min 55.51 sec)

mysql> explain SELECT SQL_NO_CACHE COUNT(*) FROM tbl1 , tbl2 WHERE tbl1.str1 = tbl2.str1 GROUP BY tbl2.str3, tbl1.str2 WITH ROLLUP HAVING str2 IS NOT NULL AND str3 IS NOT NULL;
+----+-------------+-------+-------+---------------+------+---------+----------------+---------+----------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref            | rows    | Extra                                        |
+----+-------------+-------+-------+---------------+------+---------+----------------+---------+----------------------------------------------+
|  1 | SIMPLE      | tbl1  | index | str1          | str1 | 26      | NULL           | 1048576 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | tbl2  | ref   | str1          | str1 | 13      | test.tbl1.str1 |       4 | Using index                                  |
+----+-------------+-------+-------+---------------+------+---------+----------------+---------+----------------------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT SQL_NO_CACHE COUNT(*) FROM tbl1 , tbl2 WHERE tbl1.str1 = tbl2.str1 GROUP BY    
    -> tbl2.str3, tbl1.str2 HAVING str2 IS NOT NULL AND str3 IS NOT NULL;

...
540000 rows in set (4 min 4.71 sec)

mysql> explain SELECT SQL_NO_CACHE COUNT(*) FROM tbl1 , tbl2 WHERE tbl1.str1 = tbl2.str1 GROUP BY tbl2.str3, tbl1.str2 HAVING str2 IS NOT NULL AND str3 IS NOT NULL;
+----+-------------+-------+-------+---------------+------+---------+----------------+---------+----------------------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref            | rows    | Extra                                        |
+----+-------------+-------+-------+---------------+------+---------+----------------+---------+----------------------------------------------+
|  1 | SIMPLE      | tbl1  | index | str1          | str1 | 26      | NULL           | 1048576 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | tbl2  | ref   | str1          | str1 | 13      | test.tbl1.str1 |       4 | Using index                                  |
+----+-------------+-------+-------+---------------+------+---------+----------------+---------+----------------------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT SQL_NO_CACHE COUNT(*) FROM tbl1 , tbl2 WHERE tbl1.str1 = tbl2.str1 GROUP BY tbl2.str3, tbl1.str2 HAVING str2 IS NOT NULL AND str3 IS NOT NULL ORDER BY NULL;

...
540000 rows in set (4 min 1.79 sec)

mysql> explain SELECT SQL_NO_CACHE COUNT(*) FROM tbl1 , tbl2 WHERE tbl1.str1 = tbl2.str1 GROUP BY tbl2.str3, tbl1.str2 HAVING str2 IS NOT NULL AND str3 IS NOT NULL ORDER BY NULL;
+----+-------------+-------+-------+---------------+------+---------+----------------+---------+------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref            | rows    | Extra                        |
+----+-------------+-------+-------+---------------+------+---------+----------------+---------+------------------------------+
|  1 | SIMPLE      | tbl1  | index | str1          | str1 | 26      | NULL           | 1048576 | Using index; Using temporary |
|  1 | SIMPLE      | tbl2  | ref   | str1          | str1 | 13      | test.tbl1.str1 |       4 | Using index                  |
+----+-------------+-------+-------+---------------+------+---------+----------------+---------+------------------------------+
2 rows in set (0.00 sec)
[4 Dec 2009 19:32] MySQL Verification Team
The mystery of faster ROLLUP is mostly resolved. The principal cause is that, when you have ROLLUP, temporary table is written sequentially, with no updates and without any searches of the grouped by column / expression values. The, all rows are read sequentially and grouped by on the fly.

Without ROLLUP, aggregate queries have to assert distinctiveness and have to work with all summing functions, including MIN / MAX. 

Sometimes first approach is faster and sometimes second is faster, but GROUP BY, without WITH ROLLUP, has to maintain distinct values and work with all functions. It would be possible to create a case where aggregate queries with ROLLUP would perform worse.
[18 Dec 2009 19:42] MySQL Verification Team
Some additional thoughts from me.

The algorithm applied in the ROLLUP variant would be MUCH slower if there is a larger number of the average number of rows with same values of grouped by columns.

Making optimization here seems to be very hard, as getting this average number would require the presence of one or several of suitable indices ... Even then, many other factors would have to be taken into account, like row width and maximum size for temporary MEMORY tables.
[4 Jan 2011 8:42] MySQL Verification Team
show profiles all output for both queries

Attachment: bug49360_5.5.9_show_profiles.txt (text/plain), 18.16 KiB.

[29 Jun 2011 16:12] MySQL Verification Team
the slower query spends all the time writing/updating records to disk. attached, cpu profiles.

Attachment: bug49360_5.5.13_cpu_hotspot_profile.txt (text/plain), 33.27 KiB.

[30 Oct 2011 5:59] MySQL Verification Team
google perf tools for group by.

Attachment: 5.5.17_group_by.pdf (application/pdf, text), 22.74 KiB.

[30 Oct 2011 5:59] MySQL Verification Team
google perf tools for group by with rollup.

Attachment: 5.5.17_group_by_with_rollup.pdf (application/pdf, text), 27.54 KiB.