Bug #18493 "Using index for group-by" not used in subqueries
Submitted: 24 Mar 2006 17:01 Modified: 17 Aug 2007 23:41
Reporter: Björn Steinbrink Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S4 (Feature request)
Version:5.0.18 OS:Linux (Linux)
Assigned to: CPU Architecture:Any
Tags: performance, subquery

[24 Mar 2006 17:01] Björn Steinbrink
Description:
In a subquery, the optimizer seems to never choose "Using index for group-by"
but only "Using index".

How to repeat:
mysql> create table foo (a int, index a (a));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into foo values
(5),(10),(15),(15),(15),(15),(15),(15),(15),(15),(15),(15),(15),(15),(15),(15),(
20),(20),(25);
Query OK, 19 rows affected (0.00 sec)
Records: 19  Duplicates: 0  Warnings: 0

mysql> explain select distinct a from foo;
+----+-------------+-------+-------+---------------+------+---------+------+----
--+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  |
rows | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+--------------------------+
|  1 | SIMPLE      | foo   | range | NULL          | a    | 5       | NULL |  
10 | Using index for group-by |
+----+-------------+-------+-------+---------------+------+---------+------+----
--+--------------------------+
1 row in set (0.00 sec)

mysql> explain select * from (select distinct a from foo) bar;
+----+-------------+------------+-------+---------------+------+---------+------
+------+-------------+
| id | select_type | table      | type  | possible_keys | key  | key_len | ref 
| rows | Extra       |
+----+-------------+------------+-------+---------------+------+---------+------
+------+-------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL | NULL    | NULL
|    5 |             |
|  2 | DERIVED     | foo        | index | NULL          | a    | 5       | NULL
|   19 | Using index |
+----+-------------+------------+-------+---------------+------+---------+------
+------+-------------+
2 rows in set (0.00 sec)

If the table becomes large, the second explain will take quite some time (over
14 seconds here for a 1GB test table with some additional fields), while the first one doesn't slow down (the cardinality of the primary key is the same in the small and the big table, OPTIMIZE TABLE was used prior to testing).

Suggested fix:
The optimizer should choose "Using index for group-by" in subqueries as it does for normal queries.
[24 Mar 2006 17:30] Valeriy Kravchuk
Thank you for a problem report. Verified just as described on 5.0.20-BK. I think, though, that it is a valid feature request for the optimizer.
[29 Nov 2006 20:39] Tom Callahan
This would also appear to be true for the range type in subqueries, and quite possibly others.
[10 Feb 2007 14:13] Björn Steinbrink
This seems to be fixed.

mysql> SELECT VERSION();
+---------------------+
| VERSION()           |
+---------------------+
| 5.0.32-Debian_3-log | 
+---------------------+

mysql> explain select * from (select distinct a from foo) bar;
+----+-------------+------------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table      | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+------------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL | NULL    | NULL |    5 |                          | 
|  2 | DERIVED     | foo        | range | NULL          | a    | 5       | NULL |   10 | Using index for group-by | 
+----+-------------+------------+-------+---------------+------+---------+------+------+--------------------------+
2 rows in set (0.00 sec)

Range type subqueries also seem to work fine.

mysql> explain select * from (select a from foo WHERE a > 5) bar;
+----+-------------+------------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table      | type  | possible_keys | key  | key_len | ref  | rows | Extra                    |
+----+-------------+------------+-------+---------------+------+---------+------+------+--------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL | NULL    | NULL |   18 |                          | 
|  2 | DERIVED     | foo        | range | a             | a    | 5       | NULL |   18 | Using where; Using index | 
+----+-------------+------------+-------+---------------+------+---------+------+------+--------------------------+
2 rows in set (0.00 sec)
[17 Aug 2007 23:41] Igor Babaev
- This problem will be resolved when WL#3485 is implemented (not started yet).

By the above reason I move the bug to 'To be fixed later'.
Product management will decide in what version a fix for this problem appears.