Bug #67899 Adding LIMIT to query increases time from 2 seconds to 42 minutes
Submitted: 13 Dec 2012 18:26 Modified: 24 Feb 2013 19:43
Reporter: Josh H Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.5.27 OS:Linux (CentOS 6.3)
Assigned to: CPU Architecture:Any
Tags: limit, performance, slow

[13 Dec 2012 18:26] Josh H
Description:
I have a query that takes 2 seconds to execute, but if I add a LIMIT clause it takes 42 minutes.  I expect that the limit query should take the same amount of time (or less) to return results as the unrestricted query.

This only happens for a number of dates near the start of data gathering, so it may be a corner case of some sort.  Any date from 2012-05-31 or later returns in several seconds, but 2012-05-30 or earlier takes 42 minutes (that time is consistent).

After 2012-05-31 the optimizer reverses the join order and the performance is good.  If I force the index for earlier dates it will reverse the join order and performance is expected.

mysql> explain select pr.player_id, max(pr.insert_date) as insert_date from player_record pr inner join date_curr dc on pr.player_id = dc.player_id where pr.insert_date < '2012-05-30' group by pr.player_id limit 1;
+----+-------------+-------+-------+---------------------+-------------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type  | possible_keys       | key         | key_len | ref  | rows    | Extra                                        |
+----+-------------+-------+-------+---------------------+-------------+---------+------+---------+----------------------------------------------+
|  1 | SIMPLE      | pr    | range | PRIMARY,insert_date | insert_date | 3       | NULL | 1609109 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | dc    | ALL   | PRIMARY             | NULL        | NULL    | NULL |  169323 | Using where; Using join buffer               |
+----+-------------+-------+-------+---------------------+-------------+---------+------+---------+----------------------------------------------+

mysql> explain select pr.player_id, max(pr.insert_date) as insert_date from player_record pr inner join date_curr dc on pr.player_id = dc.player_id where pr.insert_date < '2012-05-31' group by pr.player_id limit 1;
+----+-------------+-------+------+---------------------+---------+---------+--------------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys       | key     | key_len | ref                            | rows   | Extra                           |
+----+-------------+-------+------+---------------------+---------+---------+--------------------------------+--------+---------------------------------+
|  1 | SIMPLE      | dc    | ALL  | PRIMARY             | NULL    | NULL    | NULL                           | 163859 | Using temporary; Using filesort |
|  1 | SIMPLE      | pr    | ref  | PRIMARY,insert_date | PRIMARY | 4       | test_worldoftanks.dc.player_id | 200152 | Using where; Using index        |
+----+-------------+-------+------+---------------------+---------+---------+--------------------------------+--------+---------------------------------+

player_record table:
CREATE TABLE `player_record` (
  `player_id` int(10) unsigned NOT NULL,
  `insert_date` date NOT NULL,
  `samples` mediumint(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`player_id`,`insert_date`),
  KEY `insert_date` (`insert_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 MAX_ROWS=4294967295 DATA DIRECTORY='/var/lib/mysql-bigdata/test/' INDEX DIRECTORY='/var/lib/mysql-bigdata/test/'

About 100k entries per day, with data starting 2012-05-14 so a 400k entry then.

How to repeat:
I have not tried to recreate this on another database/system, or randomly.  However I can recreate on my system and database.

drop table if exists date_curr;
create temporary table date_curr (player_id int unsigned not null, insert_date date, primary key player_id (player_id, insert_date)) engine=memory;
insert into date_curr select player_id, max(insert_date) as insert_date from player_record where insert_date between '2012-05-15' and '2012-05-15' + interval 6 day group by player_id;
drop table if exists date_prev;
create temporary table date_prev like date_curr;

mysql> select pr.player_id, max(pr.insert_date) as insert_date from player_record pr
inner join date_curr dc on pr.player_id = dc.player_id where pr.insert_date < '2012-05-15'
group by pr.player_id;
+------------+-------------+
| 1002395119 | 2012-05-14  |
...
| 1002395157 | 2012-05-14  |
| 1002395187 | 2012-05-14  |
| 1002395475 | 2012-05-14  |
+------------+-------------+
105776 rows in set (2.19 sec)

mysql> select pr.player_id, max(pr.insert_date) as insert_date from player_record pr
inner join date_curr dc on pr.player_id = dc.player_id where pr.insert_date < '2012-05-15' 
group by pr.player_id limit 1;
+------------+-------------+
| player_id  | insert_date |
+------------+-------------+
| 1000000080 | 2012-05-14  |
+------------+-------------+
1 row in set (42 min 23.26 sec)

Suggested fix:
I still don't know why the join order would cause a problem with LIMIT but not the entire result set.
[13 Dec 2012 18:30] Josh H
Adding the limit clause reverses the order of the join.

mysql> explain select pr.player_id, max(pr.insert_date) as insert_date from player_record pr inner join date_curr dc on pr.player_id = dc.player_id where pr.insert_date < '2012-05-30' group by pr.player_id;
+----+-------------+-------+------+---------------------+---------+---------+--------------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys       | key     | key_len | ref                            | rows   | Extra                           |
+----+-------------+-------+------+---------------------+---------+---------+--------------------------------+--------+---------------------------------+
|  1 | SIMPLE      | dc    | ALL  | PRIMARY             | NULL    | NULL    | NULL                           | 169323 | Using temporary; Using filesort |
|  1 | SIMPLE      | pr    | ref  | PRIMARY,insert_date | PRIMARY | 4       | test_worldoftanks.dc.player_id | 200152 | Using where; Using index        |
+----+-------------+-------+------+---------------------+---------+---------+--------------------------------+--------+---------------------------------+
2 rows in set (0.00 sec)

mysql> explain select pr.player_id, max(pr.insert_date) as insert_date from player_record pr inner join date_curr dc on pr.player_id = dc.player_id where pr.insert_date < '2012-05-30' group by pr.player_id limit 1;
+----+-------------+-------+-------+---------------------+-------------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type  | possible_keys       | key         | key_len | ref  | rows    | Extra                                        |
+----+-------------+-------+-------+---------------------+-------------+---------+------+---------+----------------------------------------------+
|  1 | SIMPLE      | pr    | range | PRIMARY,insert_date | insert_date | 3       | NULL | 1609109 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | dc    | ALL   | PRIMARY             | NULL        | NULL    | NULL |  169323 | Using where; Using join buffer               |
+----+-------------+-------+-------+---------------------+-------------+---------+------+---------+----------------------------------------------+
[13 Dec 2012 18:35] Josh H
Changed to Optimizer problem and reduced the severity.
[2 Jan 2013 19:10] Shane Bester
http://forums.mysql.com/read.php?24,575953,575953#msg-575953
[2 Feb 2013 10:04] Shane Bester
attempt to create random data for testcase.

Attachment: bug67899_data_generate.sql (application/octet-stream, text), 1.70 KiB.

[2 Feb 2013 10:06] Shane Bester
Hi Josh,  please see the file I uploaded and check if the data it generates (on a test database!) is representative of yours.

Alternatively, is it possible to upload a mysqldump of your table? It should compress well.   Did you run ANALYZE TABLE and check if that helps ?
[14 Feb 2013 19:45] Josh H
I removed all but one data column to get the size down, file uploaded by FTP and confirmed that the problem shows up in that test (exact commands are in the README and match the bug description).

The sample isn't a good match because it has an entry for every ID on every date.  In my case there are many more IDs than are shown on a given date.
[21 Feb 2013 9:31] Shane Bester
Hi!  I imported the data you've uploaded.  Ran the queries....

mysql> explain  select pr.player_id, max(pr.insert_date) as insert_date from player_record pr inner join date_curr dc on pr.player_id = dc.player_id w
+----+-------------+-------+------+---------------------+---------+---------+-------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys       | key     | key_len | ref               | rows   | Extra                           |
+----+-------------+-------+------+---------------------+---------+---------+-------------------+--------+---------------------------------+
|  1 | SIMPLE      | dc    | ALL  | PRIMARY             | NULL    | NULL    | NULL              | 216825 | Using temporary; Using filesort |
|  1 | SIMPLE      | pr    | ref  | PRIMARY,insert_date | PRIMARY | 4       | test.dc.player_id | 274666 | Using where; Using index        |
+----+-------------+-------+------+---------------------+---------+---------+-------------------+--------+---------------------------------+
2 rows in set (0.00 sec)

mysql> explain select pr.player_id, max(pr.insert_date) as insert_date from player_record pr inner join date_curr dc on pr.player_id = dc.player_id wh
+----+-------------+-------+-------+---------------------+-------------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type  | possible_keys       | key         | key_len | ref  | rows    | Extra                                        |
+----+-------------+-------+-------+---------------------+-------------+---------+------+---------+----------------------------------------------+
|  1 | SIMPLE      | pr    | range | PRIMARY,insert_date | insert_date | 3       | NULL | 1662712 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | dc    | ALL   | PRIMARY             | NULL        | NULL    | NULL |  216825 | Using where; Using join buffer               |
+----+-------------+-------+-------+---------------------+-------------+---------+------+---------+----------------------------------------------+
2 rows in set (0.00 sec)

mysql> analyze table player_record;
+--------------------+---------+----------+----------+
| Table              | Op      | Msg_type | Msg_text |
+--------------------+---------+----------+----------+
| test.player_record | analyze | status   | OK       |
+--------------------+---------+----------+----------+
1 row in set (4.45 sec)

mysql> explain  select pr.player_id, max(pr.insert_date) as insert_date from player_record pr inner join date_curr dc on pr.player_id = dc.player_id w
+----+-------------+-------+------+---------------------+---------+---------+-------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys       | key     | key_len | ref               | rows   | Extra                           |
+----+-------------+-------+------+---------------------+---------+---------+-------------------+--------+---------------------------------+
|  1 | SIMPLE      | dc    | ALL  | PRIMARY             | NULL    | NULL    | NULL              | 216825 | Using temporary; Using filesort |
|  1 | SIMPLE      | pr    | ref  | PRIMARY,insert_date | PRIMARY | 4       | test.dc.player_id |     19 | Using where; Using index        |
+----+-------------+-------+------+---------------------+---------+---------+-------------------+--------+---------------------------------+
2 rows in set (0.00 sec)

mysql> explain select pr.player_id, max(pr.insert_date) as insert_date from player_record pr inner join date_curr dc on pr.player_id = dc.player_id wh
+----+-------------+-------+------+---------------------+---------+---------+-------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys       | key     | key_len | ref               | rows   | Extra                           |
+----+-------------+-------+------+---------------------+---------+---------+-------------------+--------+---------------------------------+
|  1 | SIMPLE      | dc    | ALL  | PRIMARY             | NULL    | NULL    | NULL              | 216825 | Using temporary; Using filesort |
|  1 | SIMPLE      | pr    | ref  | PRIMARY,insert_date | PRIMARY | 4       | test.dc.player_id |     19 | Using where; Using index        |
+----+-------------+-------+------+---------------------+---------+---------+-------------------+--------+---------------------------------+
2 rows in set (0.00 sec)

Please run ANALYZE TABLE and check if that helps, as it does clearly in my case.
[24 Feb 2013 19:43] Josh H
Works for me as well, thanks.