MySQL Bugs: #42282: Failure to use index for group-by

Bug #42282	Failure to use index for group-by
Submitted:	22 Jan 2009 22:23	Modified:	22 Feb 2009 22:49
Reporter:	Baron Schwartz (Basic Quality Contributor)	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.0.67	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
The optimizer fails to use an index for fast groupwise min/max when it could.

How to repeat:
mysql> create table demo(a int, b int, key(a, b));
mysql> insert into demo values(1, 3), (1, 5);
mysql> explain select a, max(b) from demo where a = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: ref
possible_keys: a
          key: a
      key_len: 5
          ref: const
         rows: 1
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> explain select a, max(b) from demo group by a\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: demo
         type: index
possible_keys: NULL
          key: a
      key_len: 10
          ref: NULL
         rows: 2
        Extra: Using index
1 row in set (0.00 sec)

In both cases I expect to see the index used for the max() with "Using index for group-by" in Extra as explained here:

http://dev.mysql.com/doc/refman/5.1/en/loose-index-scan.html

It is possible that my trivial test case has too few rows for the optimizer to consider this strategy.  However, I've just come from a call with a customer on a 37GB table where each distinct value of "a" has many many rows, and failure to use the index as it could causes the query never to complete.  On the other hand, rewriting the query as

select a, b from demo where a = 1 order by b desc limit 1

and UNION-ing all the desired values of "a" together if needed, gives basically instantaneous execution time.  In this case, the optimizer ought to be choosing to use the index for the group-by, as far as I can see, and choosing not to is disastrously slow.  A full table scan (forcing it to ignore indexes altogether) is terribly slow, of course, but at least it completes -- letting it use the indexes badly causes so much random I/O the query runs indefinitely until it's killed.

Thank you for the report.

If I modify test case provided so tabel demo has more rows I get next results:

=====bug42282=====
create table demo(a int, b int, key(a, b));
insert into demo values (1,1), (1,2), (1,3), (1,4), (1,5),(2, 2), (2,3), (2,1), (3,1);
explain select a, max(b) from demo where a = 1;
id      1
select_type     SIMPLE
table   demo
type    ref
possible_keys   a
key     a
key_len 5
ref     const
rows    4
Extra   Using where; Using index
explain select a, max(b) from demo group by a;
id      1
select_type     SIMPLE
table   demo
type    range
possible_keys   NULL
key     a
key_len 5
ref     NULL
rows    10
Extra   Using index for group-by

Is it what you want to see?

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".