Bug #42094 Optimizer chooses a wrong index when group by, order by and limit is present
Submitted: 13 Jan 2009 23:01 Modified: 23 Mar 2009 6:38
Reporter: Alexander Rubin Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.1 OS:Any
Assigned to: CPU Architecture:Any
Tags: GROUP BY, INDEX, limit, Optimizer, order by, regression
Triage: Triaged: D3 (Medium) / R4 (High) / E4 (High)

[13 Jan 2009 23:01] Alexander Rubin
Description:
the following query:
select groupid, count(wordid) as cnt from t1 where wordid=4703 group by groupid order by cnt desc limit 10

uses an index on groupid and performs a full table scan + filesort, instead of using index on wordid (primary)

(here wordid=4703 is a very frequent ID, which returns 40% of rows)

mysql> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `type` smallint(5) unsigned NOT NULL,
  `groupid` int(10) unsigned NOT NULL,
  `wordid` int(10) unsigned NOT NULL,
  `source` smallint(5) unsigned NOT NULL,
  PRIMARY KEY (`wordid`,`source`,`groupid`,`type`),
  KEY `groupid` (`groupid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> explain select groupid, count(wordid) as cnt from t1 where wordid=4703 group by groupid order by cnt desc limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: PRIMARY
          key: groupid
      key_len: 4
          ref: NULL
         rows: 50
        Extra: Using where; Using index; Using temporary; Using filesort
1 row in set (0.00 sec)

Here MySQL chooses an index on groupid (incorrect). However, we have "order by cnt" clause in the query - in this situation mysql will have to perform a full table scan and filesort (see Extra field).

mysql> explain select groupid, count(wordid) as cnt from t1 where wordid=4703 group by groupid order by cnt desc\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 558642
        Extra: Using where; Using index; Using temporary; Using filesort
1 row in set (0.00 sec)

If we do not have a LIMIT mysql chooses index on PRIMARY (correct)

mysql> explain select groupid, count(wordid) as cnt from t1 where wordid=4703 group by groupid  limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: PRIMARY
          key: groupid
      key_len: 4
          ref: NULL
         rows: 50
        Extra: Using where; Using index
1 row in set (0.00 sec)

if we have no "order by", mysql will use groupid with no filesort and temporary table (correct).

Speed comparison:

mysql>  select groupid, count(wordid) as cnt from t1 where wordid=4703 group by groupid order by cnt desc limit 10;
+---------+-----+
| groupid | cnt |
+---------+-----+
|     350 |  62 |
|     530 |  62 |
|     522 |  60 |
|     361 |  59 |
|     791 |  57 |
|     676 |  57 |
|     242 |  56 |
|     508 |  55 |
|     446 |  55 |
|     289 |  55 |
+---------+-----+
10 rows in set (0.40 sec)

mysql>  select groupid, count(wordid) as cnt from t1 use index (PRIMARY) where wordid=4703 group by groupid order by cnt desc limit 10;
+---------+-----+
| groupid | cnt |
+---------+-----+
|     530 |  62 |
|     350 |  62 |
|     522 |  60 |
|     361 |  59 |
|     676 |  57 |
|     791 |  57 |
|     242 |  56 |
|     446 |  55 |
|     289 |  55 |
|     677 |  55 |
+---------+-----+
10 rows in set (0.28 sec)

How to repeat:
Table data attached
Please see explains and sql queries above.

Suggested fix:
Fix the optimizer, so it will choose the correct index when doing "order by <func>"
[13 Jan 2009 23:12] Alexander Rubin
Uploaded ftp.mysql.com/pub/mysql/upload/t1.sql.gz
[14 Jan 2009 5:16] Valeriy Kravchuk
Thank you for a problem report. 

Why do you think this is a regression? What exact previous version that worked properly you had identified?
[14 Jan 2009 14:51] Alexander Rubin
Valeriy, here is the same query in MySQL 5.0.67:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.0.67 MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> explain select groupid, count(wordid) as cnt from t1 where wordid=4703 group by
    -> groupid order by cnt desc limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 301926
        Extra: Using where; Using index; Using temporary; Using filesort
1 row in set (0.00 sec)

So it chooses the right index in MySQL 5.0 and wrong in MySQL 5.1 for the same query.
[3 Jun 2009 22:40] Josh Orr
I just wanted to mention that I ma having this same exact problem.  In my case, I order by the Primary key and in mysql 5.0 it uses the right index to restrict the result set before ordering the result.  But in 5.1, it now does a full table scan (due to it using the primary key as the index to use) and filtering the result set.  I think it does this so it can read the rows in the order that it needs to sort it by, but the result is it going though 7 million rows instead of 45 rows.  As you can imagine, it is order's of magnitude slower. 

In the examples below, packageID is the primary key, and batchID has an index on it.  On mysql 5.0, the execution plain are the same for both (they both use the batcHID index).

Some basic examples executed with mysql 5.1:

This one takes 13.54  seconds to execute:
EXPLAIN SELECT packageID, Zip FROM packages WHERE batchID = 0 ORDER BY packageID;
                  id: 1
  select_type: SIMPLE
             table: packages
              type: index
possible_keys: batchID
               key: PRIMARY
        key_len: 4
                ref: NULL
            rows: 7763865
            Extra: Using where

This one takes .04 seconds to execute:

EXPLAIN SELECT packageID, Zip FROM packages WHERE batchID = 0 ORDER BY packageID;
                  id: 1
  select_type: SIMPLE
             table: packages
              type: ref
possible_keys: batchID
               key: batchID
        key_len: 8
                ref: const
            rows: 14046
            Extra:
[4 Jun 2009 23:25] Josh Orr
Sorry, I pasted in the wrong query.  The second query should read like this:

EXPLAIN SELECT packageID, Zip FROM packages WHERE batchID = 0;

It's the same query, except without the order by.
[12 Jul 2010 11:01] Tomer Ratz
I ran across the same issue, on support Issue #49091.
The wrong index is chosen, and a full table scan is being performed.
[9 Sep 2010 12:38] Andrii Nikitin
bug #50394 may explain reasons for this bug, or at least related
[8 Nov 2011 13:12] Jochen Riehm
We are frequently experiencing this problem as well. Is there any more global workaround besides adding USE INDEX to every problematic query?