MySQL Bugs: #50736: GROUP BY does not use index on ndbcluster tables

Bug #50736	GROUP BY does not use index on ndbcluster tables
Submitted:	29 Jan 2010 15:55	Modified:	27 Apr 2010 14:26
Reporter:	Casey Brown	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	mysql-5.1-telco-7.0	OS:	Linux
Assigned to:	Jonas Oreland	CPU Architecture:	Any
Tags:	5.1.32-ndb-7.0.5-cluster-gpl, 7.0.9, force index, GROUP BY, MySQL Cluster, ndb

Description:
It appears that ndb tables will not use an index for GROUP BY operations, even if FORCE INDEX is used.

This results in temporary tables and filesort for even simple aggregate queries.

How to repeat:
CREATE TABLE `tester` (
`id` INT( 11 ) NOT NULL AUTO_INCREMENT ,
`fk` INT( 11 ) NOT NULL ,
`data` VARCHAR( 11 ) NOT NULL ,
PRIMARY KEY ( `id` ) ,
INDEX ( `fk` )
) ENGINE = NDBCLUSTER;
 
INSERT INTO `tester` (`id` ,`fk` ,`data`)
VALUES (NULL , '1', 'test'), (NULL , '2', 'test'), (NULL , '1', 'test'), (NULL , '2', 'test'), (NULL , '3', 'test'), (NULL , '4', 'test'), (NULL , '5', 'test'), (NULL , '6', 'test'), (NULL , '1', 'test'), (NULL , '1', 'test');

mysql> explain select count(fk) from tester group by fk;
+----+-------------+--------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+--------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | tester | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using temporary; Using filesort |
+----+-------------+--------+------+---------------+------+---------+------+------+---------------------------------+
1 row in set (0.00 sec)

mysql> explain select count(fk) from tester FORCE INDEX FOR GROUP BY (`fk`) group by fk;
+----+-------------+--------+-------+---------------+------+---------+------+------+-------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+--------+-------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | tester | index | NULL          | fk   | 4       | NULL |   10 |       |
+----+-------------+--------+-------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)

Notice the missing "Using Index" from the Extra column.

Suggested fix:
Here is the same query run on an innoDB table.

+----+-------------+--------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+-------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | tester | index | NULL          | fk   | 4       | NULL |   10 |   100.00 | Using index |
+----+-------------+--------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.02 sec)

GROUP BY, DISTINCT and similar should use available indexes.

Thank you for the report.

Verified as described.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/101940

3113 Jonas Oreland	2010-03-01
      ndb - bug#50736 - impl. handler::keys_to_use_for_scanning for more index usage in group/order by

Hi,

'The missing "Using Index" from the Extra column.' does not mean that an index
is not used. It only means that mysqld does not use the "key-read" function-sequence (which ndb does not implement). But, that is not super relevant for ndb
as the data and indexes are in memory, and we use T-trees, which in them self are wo/ data. So type = index as shown when using "force index" is fine.

Patch, to this bug report, changes so that you don't need to perform "force index" to get this behavior.

I would like very much to get feedback if it gives any noticeable performance difference.

/Jonas

pushed to 6.3.33, 7.0.14 and 7.1.2

Documented as follows in the NDB-6.3.33, 7.0.14, and 7.1.2 changelogs:

      GROUP BY queries against NDB tables appeared not to use any indexes.

Closed.

Following discussion with Jonas, revised changelog entry to read as follows:

        A GROUP BY query against an NDB table sometimes did not use
        any indexes unless the query included a FORCE INDEX option. With 
        this fix, indexes are used by such queries (where otherwise 
        possible) even when FORCE INDEX is not specified.

Already closed; status unchanged.

Tested with 7.1.3 on better hardware:

No performance increase.