MySQL Bugs: #43401: Query spends too much time in ``sending data'' state

Bug #43401	Query spends too much time in ``sending data'' state
Submitted:	5 Mar 2009 8:00	Modified:	5 Mar 2009 12:37
Reporter:	Andrey Osenenko	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: MyISAM storage engine	Severity:	S3 (Non-critical)
Version:	5.1.30	OS:	FreeBSD
Assigned to:		CPU Architecture:	Any

Description:
In a myisam table with appr. 1M rows:

Query
select num,subnum from a where trip='!mOWjKAWAII' order by timestamp asc limit 10;
executes almost immediately.

But with a slight modification (asc changed to desc):
select num,subnum from a where trip='!mOWjKAWAII' order by timestamp desc limit 10;
It takes 10 seconds to execute. The problem is that query is spending whole 10 seconds in the ``sending data'' state:
+----------------------+-----------+
| Status               | Duration  |
+----------------------+-----------+
| starting             |  0.000089 |
| checking permissions |  0.000010 |
| Opening tables       |  0.000014 |
| System lock          |  0.000008 |
| Table lock           |  0.000011 |
| init                 |  0.000028 |
| optimizing           |  0.000011 |
| statistics           |  0.000338 |
| preparing            |  0.000019 |
| executing            |  0.000005 |
| Sorting result       |  0.000008 |
| Sending data         | 10.743997 |
| end                  |  0.000017 |
| query end            |  0.000005 |
| freeing items        |  0.000027 |
| logging slow query   |  0.000005 |
| logging slow query   |  0.000004 |
| cleaning up          |  0.000006 |
+----------------------+-----------+

I have not found anything about this in documentation, so I guess it is a bug.

I am running query from client bundled with mysql - /usr/bin/mysql.

Output of explain for both queries is same:
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key             | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+-------------+
|  1 | SIMPLE      | a     | index | trip_index    | timestamp_index | 5       | NULL | 3420 | Using where |
+----+-------------+-------+-------+---------------+-----------------+---------+------+------+-------------+

How to repeat:
Create this table:

CREATE TABLE `a_cut` (
  `num` int(10) unsigned NOT NULL,
  `subnum` int(10) unsigned NOT NULL,
  `timestamp` int(10) unsigned DEFAULT NULL,
  `trip` varchar(32) DEFAULT NULL,

  PRIMARY KEY (`num`,`subnum`),
  KEY `subnum_index` (`subnum`),
  KEY `timestamp_index` (`timestamp`),
  KEY `trip_index` (`trip`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

and feed it with data from here:

http://archive.easymodo.net/a_cut.txt.gz

The file is ~46MB, and ~230 MB when extracted.

Then execute queries:

select num,subnum from a where trip='!mOWjKAWAII' order by timestamp asc limit 10;
select num,subnum from a where trip='!mOWjKAWAII' order by timestamp desc limit 10;

Thank you for the problem report. Please, send the results of:

select count(*) from a;
select count(*) from a where trip='!mOWjKAWAII';
select max(timestamp) from a;
select max(timestamp) from a where trip='!mOWjKAWAII';

My current idea is that rows with trip='!mOWjKAWAII' have timestamp value far from maximum, and you have many rows with "big" timestamp values where trip is NOT '!mOWjKAWAII'. So, with the same plan, a lot more of rows are to be scanned by index on timestamp column before you find those "top 10".

mysql> select count(*) from a;
+----------+
| count(*) |
+----------+
|  9546373 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from a where trip='!mOWjKAWAII';
+----------+
| count(*) |
+----------+
|     7535 |
+----------+
1 row in set (0.05 sec)

mysql> select max(timestamp) from a;
+----------------+
| max(timestamp) |
+----------------+
|     1236222119 |
+----------------+
1 row in set (0.00 sec)

mysql> select max(timestamp) from a where trip='!mOWjKAWAII';
+----------------+
| max(timestamp) |
+----------------+
|     1234378436 |
+----------------+
1 row in set (0.07 sec)

You were right, and query

select num,subnum from (select num,subnum,timestamp from a_cut where trip='!mOWjKAWAII') as x order by timestamp desc limit 10;

that does the same thing really works fast. I was assuming that mysql would choose what indexes to use first optimally. So this is expected behavior and I just should explicitly tell database what index to use by using subqueries?

For optimizer to be able to always choose proper index in cases like this it would need to know a distribution of data over 2 columns.

Even knowing that only 7535 rows has that value you compare to in WHERE is not enough to decide if index on ORDER BY column is bad or good for LIMIT 10. If that rows have largest timestamps accessing them in order, without sorting, can still be the best strategy.

So, I think the problem you had found leads us a to a nice feature request. But this is not a bug formally, as we have no means currently to give the needed information about data to the optimizer. You have to use subqueries or FORCE INDEX hints to pass to optimizer the facts you know about your data.

All right, thank you very much, USE INDEX seems to be a fine solution.