Bug #25658 | Group by Performance is slow | ||
---|---|---|---|
Submitted: | 16 Jan 2007 19:34 | Modified: | 28 Dec 2008 14:50 |
Reporter: | Jim Tyrrell | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S1 (Critical) |
Version: | 5.2.0 falcon-alpha and now 5.2.3 | OS: | Linux (Linux Fedora Core 6) |
Assigned to: | Sergey Petrunya | CPU Architecture: | Any |
Tags: | subquery benchmark |
[16 Jan 2007 19:34]
Jim Tyrrell
[17 Jan 2007 0:28]
MySQL Verification Team
Thank you for the bug report. Could you please provide a dump of the tables. Thanks in advance.
[17 Jan 2007 13:22]
Jim Tyrrell
This is a MySQL dump of my current database with Falcon
Attachment: Test.sql (text/x-sql), 203.84 KiB.
[17 Jan 2007 13:30]
Jim Tyrrell
Select Max(TestID) from Test Group By TestGroupID; This query runs in .03 secs. This query is very fast, but the same one with 2 queries is very very slow as you can see further below: Very fast!!! mysql> Select * from Test where TestID in (9999, 9899, 9799);+--------+-------------+----------+ | TestID | TestGroupID | TestName | +--------+-------------+----------+ | 9799 | 97 | Test9799 | | 9899 | 98 | Test9899 | | 9999 | 99 | Test9999 | +--------+-------------+----------+ 3 rows in set (0.02 sec) Very slow Select * from Test where TestID in (Select Max(TestID) from Test Group By TestGroupID); .... | 9699 | 96 | Test9699 | | 9799 | 97 | Test9799 | | 9899 | 98 | Test9899 | | 9999 | 99 | Test9999 | +--------+-------------+----------+ 100 rows in set (1 min 45.17 sec) Explain so you have it: mysql> explain Select * from Test where TestID in (Select Max(TestID) from Test Group By TestGroupID); +----+--------------------+-------+------+---------------+------+---------+------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------+------+---------------+------+---------+------+------+---------------------------------+ | 1 | PRIMARY | Test | ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 2 | DEPENDENT SUBQUERY | Test | ALL | NULL | NULL | NULL | NULL | 1000 | Using temporary; Using filesort | +----+--------------------+-------+------+---------------+------+---------+------+------+---------------------------------+ 2 rows in set (0.00 sec) Note I do not have any indexs on this example, but that does not matter. Thank You Jim Tyrrell
[17 Jan 2007 13:46]
Jim Tyrrell
I have added in indexing to all permutations of what we are querying on with no effect on the big query. The individual queries now take almost no time at all. Improving by .01 or ,02 ms. Just so you can see these alter table Test add index testgroupidIndex (TestGroupID); alter table Test add index testidIndex (TestID); alter table Test add index testgroupIndex (TestGroupID, TestID); alter table Test add index testgroupIndex (TestID, TestGroupID); mysql> show index from Test; +-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | Test | 1 | testIndex | 1 | TestID | NULL | NULL | NULL | NULL | YES | BTREE | | | Test | 1 | testIndex | 2 | TestGroupID | NULL | NULL | NULL | NULL | YES | BTREE | | | Test | 1 | testgroupIndex | 1 | TestGroupID | NULL | NULL | NULL | NULL | YES | BTREE | | | Test | 1 | testgroupIndex | 2 | TestID | NULL | NULL | NULL | NULL | YES | BTREE | | | Test | 1 | testidIndex | 1 | TestID | NULL | NULL | NULL | NULL | YES | BTREE | | | Test | 1 | testgroupidIndex | 1 | TestGroupID | NULL | NULL | NULL | NULL | YES | BTREE | | +-------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ 6 rows in set (0.00 sec)
[13 Feb 2007 21:22]
Jim Tyrrell
This is still broken for the most recent download of the 5.2.3 release running on the same box as before. Please provide and eta and let me know when this might be fixed.
[22 Mar 2007 11:19]
MySQL Verification Team
Thank you for the bug report.
[28 Dec 2008 14:24]
Sergey Petrunya
Re-trying on the latest 6.0: mysql> explain select * from Test where TestID in (Select Max(TestID) from Test Group By TestGroupID)\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: Test type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 10020 Extra: Using where *************************** 2. row *************************** id: 2 select_type: SUBQUERY table: Test type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 10020 Extra: Using temporary; Using filesort 2 rows in set (0.05 sec) Query execution time is 0.10 sec. EXPLAIN shows that the query is now executed with materialization strategy. Re-running without 6.0 optimizations: mysql> set @@optimizer_switch='no_semijoin,no_materialization'; Query OK, 0 rows affected (0.00 sec) mysql> explain select * from Test where TestID in (Select Max(TestID) from Test Group By TestGroupID)\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: Test type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 10020 Extra: Using where *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: Test type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 10020 Extra: Using temporary; Using filesort 2 rows in set (0.00 sec) Query execution time is 5 min 41.37 sec.
[28 Dec 2008 14:49]
Sergey Petrunya
MySQL 6.0 subquery optimizations only allow one kind of strategy - scan the outer table and make lookups into the inner table. This is not a problem for this query, as outer-to-inner order is best here, but could pose a problem when an inner-to-outer order is desired. Proof that outer-to-inner order is best for this query: create temporary table tbl1 (a int primary key) engine=heap; insert into tbl1 select Max(TestID) X from Test Group By TestGroupID; select * from Test,tbl1 where Test.TestID=tbl1.a; query times: 0.01 + 0.05 + 0.04 = 0.10 sec (same as with materialization) explain select * from Test,tbl1 where Test.TestID=tbl1.a; +----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------+ | 1 | SIMPLE | Test | ALL | NULL | NULL | NULL | NULL | 10020 | | | 1 | SIMPLE | tbl1 | eq_ref | PRIMARY | PRIMARY | 4 | bug25658.Test.TestID | 1 | | +----+-------------+-------+--------+---------------+---------+---------+----------------------+-------+-------+
[28 Dec 2008 14:50]
Sergey Petrunya
Changing status to closed as this is fixed by WL#3985.