Bug #43401 | Query spends too much time in ``sending data'' state | ||
---|---|---|---|
Submitted: | 5 Mar 2009 8:00 | Modified: | 5 Mar 2009 12:37 |
Reporter: | Andrey Osenenko | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: MyISAM storage engine | Severity: | S3 (Non-critical) |
Version: | 5.1.30 | OS: | FreeBSD |
Assigned to: | CPU Architecture: | Any |
[5 Mar 2009 8:00]
Andrey Osenenko
[5 Mar 2009 8:13]
Valeriy Kravchuk
Thank you for the problem report. Please, send the results of: select count(*) from a; select count(*) from a where trip='!mOWjKAWAII'; select max(timestamp) from a; select max(timestamp) from a where trip='!mOWjKAWAII'; My current idea is that rows with trip='!mOWjKAWAII' have timestamp value far from maximum, and you have many rows with "big" timestamp values where trip is NOT '!mOWjKAWAII'. So, with the same plan, a lot more of rows are to be scanned by index on timestamp column before you find those "top 10".
[5 Mar 2009 8:28]
Andrey Osenenko
mysql> select count(*) from a; +----------+ | count(*) | +----------+ | 9546373 | +----------+ 1 row in set (0.00 sec) mysql> select count(*) from a where trip='!mOWjKAWAII'; +----------+ | count(*) | +----------+ | 7535 | +----------+ 1 row in set (0.05 sec) mysql> select max(timestamp) from a; +----------------+ | max(timestamp) | +----------------+ | 1236222119 | +----------------+ 1 row in set (0.00 sec) mysql> select max(timestamp) from a where trip='!mOWjKAWAII'; +----------------+ | max(timestamp) | +----------------+ | 1234378436 | +----------------+ 1 row in set (0.07 sec) You were right, and query select num,subnum from (select num,subnum,timestamp from a_cut where trip='!mOWjKAWAII') as x order by timestamp desc limit 10; that does the same thing really works fast. I was assuming that mysql would choose what indexes to use first optimally. So this is expected behavior and I just should explicitly tell database what index to use by using subqueries?
[5 Mar 2009 11:57]
Valeriy Kravchuk
For optimizer to be able to always choose proper index in cases like this it would need to know a distribution of data over 2 columns. Even knowing that only 7535 rows has that value you compare to in WHERE is not enough to decide if index on ORDER BY column is bad or good for LIMIT 10. If that rows have largest timestamps accessing them in order, without sorting, can still be the best strategy. So, I think the problem you had found leads us a to a nice feature request. But this is not a bug formally, as we have no means currently to give the needed information about data to the optimizer. You have to use subqueries or FORCE INDEX hints to pass to optimizer the facts you know about your data.
[5 Mar 2009 12:37]
Andrey Osenenko
All right, thank you very much, USE INDEX seems to be a fine solution.