Bug #25658 Group by Performance is slow
Submitted: 16 Jan 2007 19:34 Modified: 28 Dec 2008 14:50
Reporter: Jim Tyrrell Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:5.2.0 falcon-alpha and now 5.2.3 OS:Linux (Linux Fedora Core 6)
Assigned to: Sergey Petrunya CPU Architecture:Any
Tags: subquery benchmark

[16 Jan 2007 19:34] Jim Tyrrell
Description:
When querying a table with 10020 roles with 99 Groups of information and running a query like Select * from Test where TestID in (Select Max(TestID) from Test Group By TestGroupID);

This query takes a long time like over 2 mins on a pretty powerful machine, even though each of the sub parts of the query take no time at all. I can also list in all 100 GroupIDs in the query in the in clause and this query returns in no time.

How to repeat:
When created a table with TestID (AutoID), TestGroupID, and Test Name.
TestGroupID has 1-99 ie 1 is repeated 100 times, and the 2 100 times until you get 99 versions of this.  Then run the query and see how slow it performs.

Suggested fix:
Please make the optimizer faster for this kind of query. Queries like this are killing me, and rewriting them w/ outer joins is silly since this all works in Oracle today.
[17 Jan 2007 0:28] MySQL Verification Team
Thank you for the bug report. Could you please provide a dump of the tables.
Thanks in advance.
[17 Jan 2007 13:22] Jim Tyrrell
This is a MySQL dump of my current database with Falcon

Attachment: Test.sql (text/x-sql), 203.84 KiB.

[17 Jan 2007 13:30] Jim Tyrrell
Select Max(TestID) from Test Group By TestGroupID;

This query runs in .03 secs.

This query is very fast, but the same one with 2 queries is very very slow as you can see further below:
Very fast!!!
mysql> Select * from Test where TestID in (9999, 9899, 9799);+--------+-------------+----------+
| TestID | TestGroupID | TestName |
+--------+-------------+----------+
|   9799 |          97 | Test9799 | 
|   9899 |          98 | Test9899 | 
|   9999 |          99 | Test9999 | 
+--------+-------------+----------+
3 rows in set (0.02 sec)

Very slow
 Select * from Test where TestID in (Select Max(TestID) from Test Group By TestGroupID);
....
|   9699 |          96 | Test9699 | 
|   9799 |          97 | Test9799 | 
|   9899 |          98 | Test9899 | 
|   9999 |          99 | Test9999 | 
+--------+-------------+----------+
100 rows in set (1 min 45.17 sec)

Explain so you have it:
mysql> explain Select * from Test where TestID in (Select Max(TestID) from Test Group By TestGroupID);
+----+--------------------+-------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type        | table | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+--------------------+-------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | PRIMARY            | Test  | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using where                     | 
|  2 | DEPENDENT SUBQUERY | Test  | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using temporary; Using filesort | 
+----+--------------------+-------+------+---------------+------+---------+------+------+---------------------------------+
2 rows in set (0.00 sec)

Note I do not have any indexs on this example, but that does not matter.

Thank You
Jim Tyrrell
[17 Jan 2007 13:46] Jim Tyrrell
I have added in indexing to all permutations of what we are querying on with no effect on the big query.  The individual queries now take almost no time at all.  Improving by .01 or ,02 ms.

Just so you can see these
alter table Test add index testgroupidIndex (TestGroupID);
alter table Test add index testidIndex (TestID);
alter table Test add index testgroupIndex (TestGroupID, TestID);
alter table Test add index testgroupIndex (TestID, TestGroupID);

mysql> show index from Test;
+-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name         | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Test  |          1 | testIndex        |            1 | TestID      | NULL      |        NULL |     NULL | NULL   | YES  | BTREE      |         | 
| Test  |          1 | testIndex        |            2 | TestGroupID | NULL      |        NULL |     NULL | NULL   | YES  | BTREE      |         | 
| Test  |          1 | testgroupIndex   |            1 | TestGroupID | NULL      |        NULL |     NULL | NULL   | YES  | BTREE      |         | 
| Test  |          1 | testgroupIndex   |            2 | TestID      | NULL      |        NULL |     NULL | NULL   | YES  | BTREE      |         | 
| Test  |          1 | testidIndex      |            1 | TestID      | NULL      |        NULL |     NULL | NULL   | YES  | BTREE      |         | 
| Test  |          1 | testgroupidIndex |            1 | TestGroupID | NULL      |        NULL |     NULL | NULL   | YES  | BTREE      |         | 
+-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.00 sec)
[13 Feb 2007 21:22] Jim Tyrrell
This is still broken for the most recent download of the 5.2.3 release running on the same box as before.

Please provide and eta and let me know when this might be fixed.
[22 Mar 2007 11:19] MySQL Verification Team
Thank you for the bug report.
[27 Mar 2007 6:26] Igor Babaev
This problem will be fixed after WL #1110, WL #2980 are done.
[28 Dec 2008 14:24] Sergey Petrunya
Re-trying on the latest 6.0:

mysql> explain select * from Test where TestID in (Select Max(TestID) from Test Group By TestGroupID)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: Test
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10020
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: Test
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10020
        Extra: Using temporary; Using filesort
2 rows in set (0.05 sec)

Query execution time is 0.10 sec.

EXPLAIN shows that the query is now executed with materialization strategy.

Re-running without 6.0 optimizations:
mysql> set @@optimizer_switch='no_semijoin,no_materialization';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select * from Test where TestID in (Select Max(TestID) from Test Group By TestGroupID)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: Test
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10020
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: Test
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10020
        Extra: Using temporary; Using filesort
2 rows in set (0.00 sec)

Query execution time is 5 min 41.37 sec.
[28 Dec 2008 14:49] Sergey Petrunya
MySQL 6.0 subquery optimizations only allow one kind of strategy - scan the
outer table and make lookups into the inner table. This is not a problem for
this query, as outer-to-inner order is best here, but could pose a problem
when an inner-to-outer order is desired.

Proof that outer-to-inner order is best for this query:

create temporary table tbl1 (a int primary key) engine=heap;
insert into tbl1 select Max(TestID) X from Test Group By TestGroupID;
select * from Test,tbl1 where Test.TestID=tbl1.a;

query times:  0.01 + 0.05 + 0.04 = 0.10 sec  (same as with materialization)

explain select * from Test,tbl1 where Test.TestID=tbl1.a;
+----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                  | rows  | Extra |
+----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------+
|  1 | SIMPLE      | Test  | ALL    | NULL          | NULL    | NULL    | NULL                 | 10020 |       | 
|  1 | SIMPLE      | tbl1  | eq_ref | PRIMARY       | PRIMARY | 4       | bug25658.Test.TestID |     1 |       | 
+----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------+
[28 Dec 2008 14:50] Sergey Petrunya
Changing status to closed as this is fixed by WL#3985.