MySQL Bugs: #50843: Optimizer ignores clustered index for GROUP BY even when it improves performance

Bug #50843	Optimizer ignores clustered index for GROUP BY even when it improves performance
Submitted:	2 Feb 2010 18:12	Modified:	20 Jun 2010 22:33
Reporter:	Chris Calender	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S2 (Serious)
Version:	5.1.42	OS:	Any
Assigned to:	Evgeny Potemkin	CPU Architecture:	Any
Tags:	clustered index, GROUP BY, Optimizer, regression

Description:
The optimizer in MySQL 5.1 ignores the clustered index (of a PRIMARY KEY) for GROUP BY even when it would improve performance of the query.

For an example, execute the following query in 5.1 and 5.0 to compare (note I am using the `world` database, with the tables converted to InnoDB):

EXPLAIN SELECT country.* 
FROM country 
LEFT JOIN countrylanguage 
ON country.code=countrylanguage.countrycode 
GROUP BY country.code;

In 5.0, you will see that the PRIMARY KEY for table `country` is used to resolve this.  However, it is not used in 5.1, which leads to "Using temporary; Using filesort" in order to resolve the query.

How to repeat:
1. Load the `world` database.
2. ALTER `country` and `countrylanguage` to be InnoDB tables

ALTER TABLE `country` ENGINE=InnoDB;
ALTER TABLE `countrylanguage` ENGINE=InnoDB;

3. Run the following query:

EXPLAIN SELECT country.* 
FROM country 
LEFT JOIN countrylanguage 
ON country.code=countrylanguage.countrycode 
GROUP BY country.code;

Output from 5.1.42:

mysql> EXPLAIN SELECT country.* FROM country LEFT JOIN countrylanguage ON country.code=countrylanguage.countrycode GROUP BY country.code;
+----+-------------+-----------------+------+---------------+---------+---------+--------------------+------+---------------------------------+
| id | select_type | table           | type | possible_keys | key     | key_len | ref                | rows | Extra                           |
+----+-------------+-----------------+------+---------------+---------+---------+--------------------+------+---------------------------------+
|  1 | SIMPLE      | country         | ALL  | NULL          | NULL    | NULL    | NULL               |  218 | Using temporary; Using filesort |
|  1 | SIMPLE      | countrylanguage | ref  | PRIMARY       | PRIMARY | 3       | world.country.Code |    2 | Using index                     |
+----+-------------+-----------------+------+---------------+---------+---------+--------------------+------+---------------------------------+
2 rows in set (0.00 sec)

Output from 5.0.89:

mysql> EXPLAIN SELECT country.* FROM country LEFT JOIN countrylanguage ON country.code=countrylanguage.countrycode GROUP BY country.code;
+----+-------------+-----------------+-------+---------------+---------+---------+--------------------+------+-------------+
| id | select_type | table           | type  | possible_keys | key     | key_len | ref                | rows | Extra       |
+----+-------------+-----------------+-------+---------------+---------+---------+--------------------+------+-------------+
|  1 | SIMPLE      | country         | index | NULL          | PRIMARY | 3       | NULL               |  237 |             |
|  1 | SIMPLE      | countrylanguage | ref   | PRIMARY       | PRIMARY | 3       | world.country.Code |    2 | Using index |
+----+-------------+-----------------+-------+---------------+---------+---------+--------------------+------+-------------+
2 rows in set (0.02 sec)

Suggested fix:
Allow the optimizer to use the clustered index when the GROUP BY is on the Primary Key (i.e., the clustered index).

This also fails in 5.1.35.

This bug is not a regression of the fix for the bug #45828. The reasons are explained in the comments of the bug #46011.

This also fails in 5.1.36.

A diagnosis of the bug is posted in the bug #46011.

Bug #46011 contains info on my final testing of the patch.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/101247

3354 Evgeny Potemkin	2010-02-23
      Bug#50843: Filesort used instead of clustered index led to
      performance degradation.
      
      Filesort + join cache combination is preferred to full index scan because it
      is usually faster. But it's not the case when the index is clustered one.
      
      Now test_if_skip_sort_order function prefers filesort only if index isn't
      clustered.
     @ mysql-test/r/innodb_mysql.result
        Added a test case for the bug#50843.
     @ mysql-test/t/innodb_mysql.test
        Added a test case for the bug#50843.
     @ sql/sql_select.cc
        Bug#50843: Filesort used instead of clustered index led to
        performance degradation.
        Now test_if_skip_sort_order function prefers filesort only if index isn't
        clustered.

I would prefer to apply DeMorgan to the last part of the predicate,
I find this slightly more readable:

+    if ((select_limit >= table_records) &&
+        (tab->type == JT_ALL &&
+         tab->join->tables > tab->join->const_tables + 1) &&
+        ((unsigned) best_key != table->s->primary_key ||
+         !table->file->primary_key_is_clustered()))

I fully concur with Tor ... improves readability ...

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/101581

3354 Evgeny Potemkin	2010-02-26
      Bug#50843: Filesort used instead of clustered index led to
      performance degradation.
      
      Filesort + join cache combination is preferred to full index scan because it
      is usually faster. But it's not the case when the index is clustered one.
      
      Now test_if_skip_sort_order function prefers filesort only if index isn't
      clustered.
     @ mysql-test/r/innodb_mysql.result
        Added a test case for the bug#50843.
     @ mysql-test/t/innodb_mysql.test
        Added a test case for the bug#50843.
     @ sql/sql_select.cc
        Bug#50843: Filesort used instead of clustered index led to
        performance degradation.
        Now test_if_skip_sort_order function prefers filesort only if index isn't
        clustered.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/101585

3358 Evgeny Potemkin	2010-02-26 [merge]
      Auto-merged fox for the bug#50843.

Pushed into 5.1.45 (revid:joro@sun.com-20100301083827-xnimmrjg6bh33o1o) (version source revid:sergey.glukhov@sun.com-20100226121905-cadcd8ayvcka9c6y) (merge vers: 5.1.45) (pib:16)

Pushed into 6.0.14-alpha (revid:alik@sun.com-20100302142746-u1gxdf5yk2bjrq3e) (version source revid:alik@sun.com-20100301095421-4cz64ibem1h2quve) (merge vers: 6.0.14-alpha) (pib:16)

Pushed into 5.5.3-m2 (revid:alik@sun.com-20100302072233-t3uqgjzdukt1pyhe) (version source revid:alexey.kopytov@sun.com-20100226130631-8czhisohzf6jyo2x) (merge vers: 5.5.3-m2) (pib:16)

Pushed into mysql-next-mr (revid:alik@sun.com-20100302072432-k8xvfkgcggkwgi94) (version source revid:alik@sun.com-20100301093944-a4rvrmqqco6c0qao) (pib:16)

Noted in 5.1.45, 5.5.3, 6.0.14 changelogs.

Use of filesort plus the join cache normally is preferred to a full
index scan. But it was used even if the index is clustered, in which
case, the clustered index scan can be faster.

Also see bug #52361.

Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:martin.skold@mysql.com-20100609140708-52rvuyq4q500sxkq) (merge vers: 5.1.45-ndb-6.2.19) (pib:16)

Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)