Bug #42282 Failure to use index for group-by
Submitted: 22 Jan 2009 22:23 Modified: 22 Feb 2009 22:49
Reporter: Baron Schwartz (Basic Quality Contributor) Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.67 OS:Any
Assigned to: CPU Architecture:Any

[22 Jan 2009 22:23] Baron Schwartz
Description:
The optimizer fails to use an index for fast groupwise min/max when it could.

How to repeat:
mysql> create table demo(a int, b int, key(a, b));
mysql> insert into demo values(1, 3), (1, 5);
mysql> explain select a, max(b) from demo where a = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: ref
possible_keys: a
          key: a
      key_len: 5
          ref: const
         rows: 1
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> explain select a, max(b) from demo group by a\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: index
possible_keys: NULL
          key: a
      key_len: 10
          ref: NULL
         rows: 2
        Extra: Using index
1 row in set (0.00 sec)

In both cases I expect to see the index used for the max() with "Using index for group-by" in Extra as explained here:

http://dev.mysql.com/doc/refman/5.1/en/loose-index-scan.html

It is possible that my trivial test case has too few rows for the optimizer to consider this strategy.  However, I've just come from a call with a customer on a 37GB table where each distinct value of "a" has many many rows, and failure to use the index as it could causes the query never to complete.  On the other hand, rewriting the query as

select a, b from demo where a = 1 order by b desc limit 1

and UNION-ing all the desired values of "a" together if needed, gives basically instantaneous execution time.  In this case, the optimizer ought to be choosing to use the index for the group-by, as far as I can see, and choosing not to is disastrously slow.  A full table scan (forcing it to ignore indexes altogether) is terribly slow, of course, but at least it completes -- letting it use the indexes badly causes so much random I/O the query runs indefinitely until it's killed.
[22 Jan 2009 22:49] Sveta Smirnova
Thank you for the report.

If I modify test case provided so tabel demo has more rows I get next results:

=====bug42282=====
create table demo(a int, b int, key(a, b));
insert into demo values (1,1), (1,2), (1,3), (1,4), (1,5),(2, 2), (2,3), (2,1), (3,1);
explain select a, max(b) from demo where a = 1;
id      1
select_type     SIMPLE
table   demo
type    ref
possible_keys   a
key     a
key_len 5
ref     const
rows    4
Extra   Using where; Using index
explain select a, max(b) from demo group by a;
id      1
select_type     SIMPLE
table   demo
type    range
possible_keys   NULL
key     a
key_len 5
ref     NULL
rows    10
Extra   Using index for group-by

Is it what you want to see?
[23 Feb 2009 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".