MySQL Bugs: #21174: Index degrades sort performance and optimizer does not honor IGNORE INDEX

Bug #21174	Index degrades sort performance and optimizer does not honor IGNORE INDEX
Submitted:	20 Jul 2006 6:17	Modified:	12 Apr 2007 21:24
Reporter:	John David Duncan	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S2 (Serious)
Version:	5.0.18, 5.1bk	OS:	Linux (linux)
Assigned to:	Georgi Kodinov	CPU Architecture:	Any
Tags:	bfsm_2006_12_07, Q1

Description:
"SELECT col_a, sum(col_b) FROM large_table GROUP BY col_a"  can be much slower if there is an index on col_a than if there is not one. "SELECT col_a, sum(col_b) FROM large_table IGNORE INDEX (index_on_col_a) GROUP BY col_a" does not help because the IGNORE INDEX hint is disregarded. 

How to repeat:
create table t1 (   
  pk int primary key not null auto_increment, 
  category int,
  amount float, 
  misc int  
) engine = innodb;

create table t2 like t1;
alter table t2 add index (category) ; 

create table t3 like t2;
alter table t3 engine = myisam;

DELIMITER //
CREATE PROCEDURE newrows(n int)
BEGIN
 DECLARE j int;
 SET j = 1;
 REPEAT 
  INSERT INTO t1 values (NULL, floor(rand() * 200) , rand() * 100 , 0) ;
  SET j = j + 1 ;
UNTIL j = n
END REPEAT;
END 
//

CALL newrows(250000) ;
insert into t2 select * from t1;
insert into t3 select * from t1;

-- t1 has no problem
explain select category, sum(amount) from t1 group by category ; 

-- t2 has a problem!!  (Index scan instead of table scan )
explain select category, sum(amount) from t2 group by category ; 

-- Also a problem:
explain select category, sum(amount) from t2 IGNORE INDEX (category) group by category;

-- t3 has no problem, so maybe something about innodb triggers the problem with t2.
explain select category, sum(amount) from t3 group by category ;

Verified as described on Linux and both 5.0 and 5.1 BK.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/9948

ChangeSet@1.2227, 2006-08-02 17:18:00+03:00, gkodinov@macbook.gmz +3 -0
  Bug #21174: Index degrades sort performance and 
               optimizer does not honor IGNORE INDEX
   - Allow an index to be used for sorting the table 
     instead of filesort only if it is not disabled by
     IGNORE INDEX.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/10373

ChangeSet@1.2227, 2006-08-14 18:19:29+03:00, gkodinov@macbook.gmz +3 -0
  Bug #21174: Index degrades sort performance and 
               optimizer does not honor IGNORE INDEX
   - Allow an index to be used for sorting the table 
     instead of filesort only if it is not disabled by
     IGNORE INDEX.

Fixed in 5.0.25

Available in 5.0.25.

Fixed in 5.1.12

Noted in 5.0.25, 5.1.12 changelog.

InnoDB did not honor IGNORE INDEX, which prevented using IGNORE INDEX in cases where an index sort would be slower than a filesort.

the fix is wrong, old behaviour was intentional and explicitly documented:
"
`USE INDEX', `IGNORE INDEX', and `FORCE INDEX' affect only which indexes
are used when MySQL decides how to find rows in the table and how to do
the join. They do not affect whether an index is used when resolving an
`ORDER BY' or `GROUP BY'.
"

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/12593

ChangeSet@1.2280, 2006-09-27 12:53:53+03:00, gkodinov@macbook.gmz +3 -0
  Bug #21174: Index degrades sort performance and optimizer does not honor IGNORE INDEX
   - reversed the patch for 5.0 and moved to 5.1

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/12597

ChangeSet@1.2338, 2006-09-27 13:11:00+03:00, gkodinov@macbook.gmz +3 -0
  Bug#21174: Index degrades sort performance and optimizer does not honor IGNORE INDEX
   - fix moved to 5.1

The patch was reverted based on the idea that you could use 
  SELECT SQL_BIG_RESULT ... 
in your query instead of  IGNORE INDEX, but get the same results.

This is documented at http://dev.mysql.com/doc/refman/5.0/en/select.html and unfortunately several of us overlooked that in the process of getting to this point.

Derek, can you confirm whether this solves the issue before we take the next step?

If you try SQL_BIG_RESULT in the test case I originally submitted with this bug, it does not make any difference.

mysql> explain select category, sum(amount) from t2 group by category \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: index
possible_keys: NULL
          key: category
      key_len: 5
          ref: NULL
         rows: 250558
        Extra: 
1 row in set (0.01 sec)

mysql> explain select SQL_BIG_RESULT category, sum(amount) from t2 group by category \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: index
possible_keys: NULL
          key: category
      key_len: 5
          ref: NULL
         rows: 250558
        Extra: 
1 row in set (0.00 sec)

Pushed in 5.0.26/5.1.12

This patch should be removed from 5.1 as it breaks compatiblity with older versions and the current behaviour is fine for a lot of current usage cases
(Especially when using ORDER BY with LIMIT, the current patch is VERY bad)

What needs to done instead in 5.1 or 52, is to complement the IGNORE INDEX syntax they way that Sergei describe earlier in the bug report:

IGNORE INDEX [FOR {JOIN|ORDER BY|GROUP BY}]. If no FOR ... is specified the index will be ignored everywhere (as in the changeset above, except if you start mysqld with --old, in which case if there is no FOR, the index will only be ignored for join.

As a way for the original bug reporter to work around the problem in MySQL:
Add a +0 to the first argument to ORDER BY; This will disable any index optimization for the ORDER BY part.

Implemented (and reviewed) in a separate WorkLog Item (WL#3527).

Implemented in WL#3527.
Pushed in 5.0.40, 5.1.17

Noted in 5.0.40 changelog.

The syntax for index hints has been extended to enable explicit
specification that the hint applies only to join processing.

Noted in 5.1.17 changelog.

The syntax for index hints has been extended to enable more
fine-grained control over the optimizer's selection of an execution
plan for various phases of query processing.

I also added a note to the 5.0.25 changelog entry that that
patch was reverted in 5.0.26.

The updated operation of index hints now is documented in its
own section of the manual:

http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

I'm using MySQL 5.0.45 community-nt-log on Windows XP platform with 256M ram.

I have 3M rows in a 1.2G sized table T, with a compound index meant to increase performance of queries like
Select <KEY fields>, sum(<field) FROM T Group By <Key fields>
without WHERE clause.

There are about 900 key field combinations, inserted in random order. So using the index means 3M random file I/O which is far more expensive than reading 1.2G data in sequential order + 3M lookups in the resultset (which is probably entirely in the memory). On my machine queries that use index for grouping are at least 10 times slower.