Bug #54225 Optimizer prefers table scan and filesort instead of index scan
Submitted: 4 Jun 2010 8:12 Modified: 13 Jun 2010 12:31
Reporter: Andrew A Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:5.1.37, 5.1.49, 5.5.5-m3 OS:Linux (5.1.37-1ubuntu5.1)
Assigned to: CPU Architecture:Any
Tags: optimizer index filesort
Triage: Triaged: D3 (Medium)

[4 Jun 2010 8:12] Andrew A
Description:
(I am not sure if this is a bug; I considered it a bit strange the optimizer did not make this decision)

An SQL query attempts to find rows matching a WHERE clause and also supplies an ORDER BY and LIMIT.

No indexes are available to meet the WHERE clause, but there is an index available on the ORDER BY column.

EXPLAIN indicates the optimizer is choosing to:
1) Perform a table scan
2) Perform a filesort

In order to resolve this query.

I was wondering whether the optimizer would consider performing an index scan on the ORDER BY index to retrieve possible rows in order, instead of resorting to a filesort later?

---

Example:

Given the following table and index definition from a 3rd party forum software:

--
CREATE TABLE `phpbb_topics` (
 `topic_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
 `forum_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
 `icon_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
 `topic_attachment` tinyint(1) unsigned NOT NULL DEFAULT '0',
 `topic_approved` tinyint(1) unsigned NOT NULL DEFAULT '1',
 `topic_reported` tinyint(1) unsigned NOT NULL DEFAULT '0',
 `topic_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
 `topic_poster` mediumint(8) unsigned NOT NULL DEFAULT '0',
 `topic_time` int(11) unsigned NOT NULL DEFAULT '0',
 `topic_time_limit` int(11) unsigned NOT NULL DEFAULT '0',
 `topic_views` mediumint(8) unsigned NOT NULL DEFAULT '0',
 `topic_replies` mediumint(8) unsigned NOT NULL DEFAULT '0',
 `topic_replies_real` mediumint(8) unsigned NOT NULL DEFAULT '0',
 `topic_status` tinyint(3) NOT NULL DEFAULT '0',
 `topic_type` tinyint(3) NOT NULL DEFAULT '0',
 `topic_first_post_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
 `topic_first_poster_name` varchar(255) COLLATE utf8_bin NOT NULL DEFAULT '',
 `topic_first_poster_colour` varchar(6) COLLATE utf8_bin NOT NULL DEFAULT '',
 `topic_last_post_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
 `topic_last_poster_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
 `topic_last_poster_name` varchar(255) COLLATE utf8_bin NOT NULL DEFAULT '',
 `topic_last_poster_colour` varchar(6) COLLATE utf8_bin NOT NULL DEFAULT '',
 `topic_last_post_subject` varchar(255) COLLATE utf8_bin NOT NULL DEFAULT '',
 `topic_last_post_time` int(11) unsigned NOT NULL DEFAULT '0',
 `topic_last_view_time` int(11) unsigned NOT NULL DEFAULT '0',
 `topic_moved_id` mediumint(8) unsigned NOT NULL DEFAULT '0',
 `topic_bumped` tinyint(1) unsigned NOT NULL DEFAULT '0',
 `topic_bumper` mediumint(8) unsigned NOT NULL DEFAULT '0',
 `poll_title` varchar(255) COLLATE utf8_bin NOT NULL DEFAULT '',
 `poll_start` int(11) unsigned NOT NULL DEFAULT '0',
 `poll_length` int(11) unsigned NOT NULL DEFAULT '0',
 `poll_max_options` tinyint(4) NOT NULL DEFAULT '1',
 `poll_last_vote` int(11) unsigned NOT NULL DEFAULT '0',
 `poll_vote_change` tinyint(1) unsigned NOT NULL DEFAULT '0',
 PRIMARY KEY (`topic_id`),
 KEY `forum_id` (`forum_id`),
 KEY `forum_id_type` (`forum_id`,`topic_type`),
 KEY `last_post_time` (`topic_last_post_time`),
 KEY `topic_approved` (`topic_approved`),
 KEY `forum_appr_last` (`forum_id`,`topic_approved`,`topic_last_post_id`),
 KEY `fid_time_moved` (`forum_id`,`topic_last_post_time`,`topic_moved_id`)
) ENGINE=InnoDB AUTO_INCREMENT=223323 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
--

And the following query:
SELECT DISTINCT t.topic_last_post_time, p.topic_id
FROM forums.phpbb_posts p, forums.phpbb_topics t
WHERE t.topic_replies =0 AND t.topic_moved_id =0 AND p.topic_id = t.topic_id AND p.forum_id NOT IN ( 72, 73 )
ORDER BY t.topic_last_post_time DESC
LIMIT 1001;

mysql> explain SELECT DISTINCT t.topic_last_post_time, p.topic_id FROM forums.phpbb_posts p, forums.phpbb_topics t WHERE t.topic_replies =0 AND t.topic_moved_id =0 AND p.topic_id = t.topic_id AND p.forum_id NOT IN ( 72, 73 ) ORDER BY t.topic_last_post_time DESC LIMIT 1001;
+----+-------------+-------+------+---------------------------------+---------------+---------+-------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys                   | key           | key_len | ref               | rows | Extra                                        |
+----+-------------+-------+------+---------------------------------+---------------+---------+-------------------+------+----------------------------------------------+
|  1 | SIMPLE      | t     | ALL  | PRIMARY                         | NULL          | NULL    | NULL              |    2 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | p     | ref  | forum_id,topic_id,tid_post_time | tid_post_time | 3       | forums.t.topic_id |  123 | Using where                                  |
+----+-------------+-------+------+---------------------------------+---------------+---------+-------------------+------+----------------------------------------------+
2 rows in set (0.00 sec)

--

You can see a table scan and filesort is taking place.

I would have thought the optimizer would consider using (as a last resort due to no WHERE coverage indexes) that it would perform a reverse scan on the t.topic_last_post_time column (which is indexed), and then perform WHERE comparison.

This way it would have avoided a filesort to order the results; and as soon as 1001 results were available it could stop scanning.

The 'posts' (p) table in question has ~2,660,500 rows with topics table at ~150,000.

I was able to reproduce the query plan with tables with only a handful of rows in each (which would be fine for a table scan in that case however).

How to repeat:
Use the attached schema files and perform an EXPLAIN on the following query:

SELECT DISTINCT t.topic_last_post_time, p.topic_id
FROM forums.phpbb_posts p, forums.phpbb_topics t
WHERE t.topic_replies =0 AND t.topic_moved_id =0 AND p.topic_id = t.topic_id AND p.forum_id NOT IN ( 72, 73 )
ORDER BY t.topic_last_post_time DESC
LIMIT 1001;

Suggested fix:
It may be reasonable for the optimizer to perform an index scan on the ORDER BY clause to avoid 'using filesort', when there are no suitable indexes available for the WHERE clause.

This would be a last resort optimization (when really new indexes would help).
[6 Jun 2010 18:42] Valeriy Kravchuk
It can be related to bug #38397, bug #46011 or bug #42094.
[10 Jun 2010 6:53] Valeriy Kravchuk
Verified just as described. Even simpler query does NOT use index for ORDER BY:

mysql> explain SELECT t.topic_last_post_time FROM phpbb_posts p, phpbb_topics t WHERE t.topic_replies =0 AND t.topic_moved_id =0 AND p.topic_id = t.topic_id AND p.forum_id NOT IN ( 72, 73 ) ORDER BY t.topic_last_post_time DESC LIMIT 1001;
+----+-------------+-------+------+---------------------------------+----------+---------+-----------------+------+-----------------------------+
| id | select_type | table | type | possible_keys                   | key      | key_len | ref             | rows | Extra                       |
+----+-------------+-------+------+---------------------------------+----------+---------+-----------------+------+-----------------------------+
|  1 | SIMPLE      | t     | ALL  | PRIMARY                         | NULL     | NULL    | NULL            |    2 | Using where; Using filesort |
|  1 | SIMPLE      | p     | ref  | forum_id,topic_id,tid_post_time | topic_id | 3       | test.t.topic_id | 6144 | Using where                 |
+----+-------------+-------+------+---------------------------------+----------+---------+-----------------+------+-----------------------------+
2 rows in set (0.00 sec)

neither with InnoDB nor with MyISAM storage engine. Index is used when forced though:

mysql> explain SELECT t.topic_last_post_time FROM phpbb_posts p, phpbb_topics t force index(`last_post_time`) WHERE t.topic_replies =0 AND t.topic_moved_id =0 AND p.topic_id = t.topic_id AND p.forum_id NOT IN ( 72, 73 ) ORDER BY t.topic_last_post_time DESC LIMIT 1001;
+----+-------------+-------+-------+---------------------------------+----------------+---------+-----------------+------+-------------+
| id | select_type | table | type  | possible_keys                   | key            | key_len | ref             | rows | Extra       |
+----+-------------+-------+-------+---------------------------------+----------------+---------+-----------------+------+-------------+
|  1 | SIMPLE      | t     | index | NULL                            | last_post_time | 4       | NULL            |    1 | Using where |
|  1 | SIMPLE      | p     | ref   | forum_id,topic_id,tid_post_time | topic_id       | 3       | test.t.topic_id | 6144 | Using where |
+----+-------------+-------+-------+---------------------------------+----------------+---------+-----------------+------+-------------+
2 rows in set (0.00 sec)
[13 Jun 2010 12:31] Andrew A
Hi,

Thank's for taking the time to check this out.

RE: 'Index is used when forced though' I noticed that USE INDEX and FORCE INDEX in my original query does NOT take effect (another possible bug?).

Thanks
[25 Mar 2011 20:02] Rick James
"rows" in the first EXPLAIN is 2, which severely disagrees with
"topics table at ~150,000"

I suggest that if "2" is correct, the optimizer did the right thing by simply doing a table scan.  So, is this an ANALYZE problem?  Or is the table really smaller than stated?