MySQL Bugs: #29209: Loose index scan is not used if key parts are in different order in the GROUP BY

Bug #29209	Loose index scan is not used if key parts are in different order in the GROUP BY
Submitted:	19 Jun 2007 12:54	Modified:	1 Aug 2007 12:21
Reporter:	Sergey Petrunya	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	5.0.44/5.1BK	OS:	Any
Assigned to:	Martin Hansson	CPU Architecture:	Any
Tags:	GROUP BY, loose index scan

Description:
Loose index scan optimization is not used if GROUP BY clause lists key parts in the "wrong" order.

That is, if there is a key over (kp1,kp2,kp3) and queries 

Q1: explain select min(t1.kp3) from t1 group by kp2, kp1;
Q2: explain select min(t1.kp3) from t1 group by kp1, kp2;

Then Q2 will use loose index scan and Q1 won't.

According to Timour, loose index scan optimization should handle both cases.

How to repeat:
create table t0 (a int);
insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
create table t1 (kp1 int not null, 
                 kp2 int not null,
                 kp3 int not null,
                 filler char(100),
                 key(kp1, kp2, kp3)
                 );
insert into t1 select A.a, B.a, C.a ,'filler' from t0 A, t0 B, t0 C;

mysql> explain select min(t1.kp3) from t1 group by kp2, kp1 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: index
possible_keys: NULL
          key: kp1
      key_len: 12
          ref: NULL
         rows: 1000
        Extra: Using index; Using temporary; Using filesort
1 row in set (0.00 sec)

mysql> explain select min(t1.kp3) from t1 group by kp1, kp2 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: range
possible_keys: NULL
          key: kp1
      key_len: 8
          ref: NULL
         rows: 101
        Extra: Using index for group-by
1 row in set (0.00 sec)

Suggested fix:
Make loose index scan optimizer detect and handle cases like Q2.

Thank you for the bug report. Verified as described.

After thinking more about this problem, I doubt that it can be fixed except for GROUP BY ... ORDER BY NULL, given that GROUP BY is expected to sort according to the fields listed. The result from the two test queries is sorted differently.

After discussion with Sergey Petrunia it was concluded that this is not a bug.

Even if the result is not to be sorted according to the GROUP BY, it probably
still makes sense to exploit loose index scan, if you use a temporary table to
do the sorting. If you disregard the order of the result tuples, the results
of any permutation of the GROUP BY fields are equvalent.

Assume there's a table t with key k over <kp_1, ..., kp_i, ...> and a prefix
p_k = <kp_1, ..., kp_[i-1]>. You can then rewrite any query on the form

SELECT MAX( kp_[i-1] ) 
FROM t 
WHERE <conds> 
GROUP BY <permutation of p_k>

to:

SELECT[ MAX( kp_[i-1] ) |  MIN( kp_[i-1] ) ]
FROM t 
WHERE <conds> 
GROUP BY <p_k> 
ORDER BY <permutation of p_k>

As long as the result of the rewrite (along with <conds>) is accepted for
loose index scan, the original query will be accepted for this treatment.

You can also rewrite

SELECT[ MAX( kp_[i-1] ) |  MIN( kp_[i-1] ) ]
FROM t 
WHERE <conds> 
GROUP BY <permutation of p_k>
ORDER BY <order>

to

SELECT[ MAX( kp_[i-1] ) |  MIN( kp_[i-1] ) ]
FROM t 
WHERE <conds> 
GROUP BY <p_k>
ORDER BY <order>

Furthermore, a query like 

SELECT[ MAX( kp_[i-1] ) |  MIN( kp_[i-1] ) ]
FROM t 
GROUP BY <permutation of p_k> 
ORDER BY NULL

can always be rewritten to

SELECT[ MAX( kp_[i-1] ) |  MIN( kp_[i-1] ) ]
FROM t GROUP BY <p_k>