Bug #45373 INDEX FOR GROUP BY is not used if there is a LIMIT clause in the query
Submitted: 8 Jun 2009 13:14 Modified: 19 Jun 2009 4:19
Reporter: Alex Bolenok Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.1.35-community, 5.1.36-bzr OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: cost model, GROUP BY, INDEX, limit

[8 Jun 2009 13:14] Alex Bolenok
Description:
A LIMIT clause appended to a GROUP BY query prevents INDEX FOR GROUP BY from being used.

How to repeat:
CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=Memory;

DELIMITER $$

CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
                INSERT
                INTO    filler
                SELECT  _cnt;
                SET _cnt = _cnt + 1;
        END WHILE;
END
$$

DELIMITER ;
CALL prc_filler(10000);
CREATE TABLE t_latest (
        id INT NOT NULL PRIMARY KEY,
        ip INT NOT NULL,
        ts DATETIME NOT NULL,
        value VARCHAR(50) NOT NULL,
        KEY ix_latest_ip_ts (ip, ts)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT
INTO    t_latest (id, ip, ts, value)
SELECT  id,
        INET_ATON('10.0.0.0') + FLOOR(RAND(20090608) * 0x10),
        CAST('2009-06-08' AS DATETIME) - INTERVAL id SECOND,
        CONCAT('Record ', id)
FROM    filler;

ANALYZE TABLE t_latest;

EXPLAIN
SELECT  INET_NTOA(ip) AS textip, MAX(ts) AS maxts
FROM    t_latest FORCE INDEX FOR GROUP BY (ix_latest_ip_ts)
GROUP BY
        ip
ORDER BY
        maxts DESC;

1, 'SIMPLE', 't_latest', 'range', '', 'ix_latest_ip_ts', '4', '', 23, 'Using index for group-by; Using temporary; Using filesort'

EXPLAIN
SELECT  INET_NTOA(ip) AS textip, MAX(ts) AS maxts
FROM    t_latest FORCE INDEX FOR GROUP BY (ix_latest_ip_ts)
GROUP BY
        ip
ORDER BY
        maxts DESC
LIMIT 10;

1, 'SIMPLE', 't_latest', 'index', '', 'ix_latest_ip_ts', '12', '', 10465, 'Using index; Using temporary; Using filesort'

I expected the INDEX FOR GROUP BY to be used by the second query too. However, the index is not used, despite being forced with FORCE INDEX FOR GROUP BY.

This contradicts documentation which says:

> You can also use FORCE INDEX, which acts like USE INDEX (index_list) but with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the given indexes to find rows in the table. 

There definitely is a way to use the index to find the rows, since the second query is just same as the first query but with LIMIT 10 appended.

Note that this query, which is just a synonym for the second query, does use the index to retrieve all values:

EXPLAIN
SELECT  *
FROM    (
        SELECT  INET_NTOA(ip) AS textip, MAX(ts) AS maxts
        FROM    t_latest FORCE INDEX FOR GROUP BY (ix_latest_ip_ts)
        GROUP BY
        ip
        ) q
ORDER BY
        maxts DESC
LIMIT 10;

1, 'PRIMARY', '<derived2>', 'ALL', '', '', '', '', 16, 'Using filesort'
2, 'DERIVED', 't_latest', 'range', '', 'ix_latest_ip_ts', '4', '', 23, 'Using index for group-by'

Suggested fix:
Fix the optimizer algorithm so that INDEX FOR GROUP BY would be usable and forceable by a GROUP BY query with a LIMIT clause.
[8 Jun 2009 17:37] Valeriy Kravchuk
Thank you for the problem report. Verified just as described with recent 5.1.36 from bzr. 

Index is not used even with LIMIT 1, and this leads to slower execution. Your workaround (where index is used for GROUP BY) allows to get results faster.
[8 Jun 2009 20:03] Alex Bolenok
On my installation of '6.0.11-alpha-community' the queries behave just the same.
[9 Jun 2009 17:35] MySQL Verification Team
Test case has to be changed thoroughly.

* There must not be any FORCE INDEX used

* There must not be any nested or derived query

* There must be ORDER BY NULL, so that index for sorting is not chosen

Hence, there must be a pure case. One in which index is used for GROUP BY, without limit, and the other when it is NOT used for GROUP BY, when LIMIT is present.

I hope that this is clear enough ......
[9 Jun 2009 18:03] Valeriy Kravchuk
The same results with ORDER BY NULL and no FORCE INDEX:

mysql> EXPLAIN SELECT  INET_NTOA(ip) AS textip, MAX(ts) AS maxts FROM    t_latest GROUP BY         ip ORDER BY NULL;
+----+-------------+----------+-------+---------------+-----------------+---------+------+------+--------------------------+
| id | select_type | table    | type  | possible_keys | key             | key_len | ref  | rows | Extra                    |
+----+-------------+----------+-------+---------------+-----------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | t_latest | range | NULL          | ix_latest_ip_ts | 4       | NULL |   23 | Using index for group-by | 
+----+-------------+----------+-------+---------------+-----------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT  INET_NTOA(ip) AS textip, MAX(ts) AS maxts FROM    t_latest GROUP BY         ip ORDER BY NULL LIMIT 1;
+----+-------------+----------+-------+---------------+-----------------+---------+------+-------+-------------+
| id | select_type | table    | type  | possible_keys | key             | key_len | ref  | rows  | Extra       |
+----+-------------+----------+-------+---------------+-----------------+---------+------+-------+-------------+
|  1 | SIMPLE      | t_latest | index | NULL          | ix_latest_ip_ts | 12      | NULL | 10465 | Using index | 
+----+-------------+----------+-------+---------------+-----------------+---------+------+-------+-------------+
1 row in set (0.00 sec)
[4 Mar 2010 9:37] Manyi Lu
See also BUG#49111.
[16 Mar 2010 9:42] Sveta Smirnova
Bug #52081 was marked as duplicate of this one.