Bug #50736 GROUP BY does not use index on ndbcluster tables
Submitted: 29 Jan 2010 15:55 Modified: 27 Apr 2010 14:26
Reporter: Casey Brown Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S1 (Critical)
Version:mysql-5.1-telco-7.0 OS:Linux
Assigned to: Jonas Oreland CPU Architecture:Any
Tags: 5.1.32-ndb-7.0.5-cluster-gpl, 7.0.9, force index, GROUP BY, MySQL Cluster, ndb

[29 Jan 2010 15:55] Casey Brown
Description:
It appears that ndb tables will not use an index for GROUP BY operations, even if FORCE INDEX is used.

This results in temporary tables and filesort for even simple aggregate queries.

How to repeat:
CREATE TABLE `tester` (
`id` INT( 11 ) NOT NULL AUTO_INCREMENT ,
`fk` INT( 11 ) NOT NULL ,
`data` VARCHAR( 11 ) NOT NULL ,
PRIMARY KEY ( `id` ) ,
INDEX ( `fk` )
) ENGINE = NDBCLUSTER;
 
INSERT INTO `tester` (`id` ,`fk` ,`data`)
VALUES (NULL , '1', 'test'), (NULL , '2', 'test'), (NULL , '1', 'test'), (NULL , '2', 'test'), (NULL , '3', 'test'), (NULL , '4', 'test'), (NULL , '5', 'test'), (NULL , '6', 'test'), (NULL , '1', 'test'), (NULL , '1', 'test');

mysql> explain select count(fk) from tester group by fk;
+----+-------------+--------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra                           |
+----+-------------+--------+------+---------------+------+---------+------+------+---------------------------------+
|  1 | SIMPLE      | tester | ALL  | NULL          | NULL | NULL    | NULL |   10 | Using temporary; Using filesort |
+----+-------------+--------+------+---------------+------+---------+------+------+---------------------------------+
1 row in set (0.00 sec)

mysql> explain select count(fk) from tester FORCE INDEX FOR GROUP BY (`fk`) group by fk;
+----+-------------+--------+-------+---------------+------+---------+------+------+-------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra |
+----+-------------+--------+-------+---------------+------+---------+------+------+-------+
|  1 | SIMPLE      | tester | index | NULL          | fk   | 4       | NULL |   10 |       |
+----+-------------+--------+-------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)

Notice the missing "Using Index" from the Extra column.

Suggested fix:
Here is the same query run on an innoDB table.

+----+-------------+--------+-------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+-------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | tester | index | NULL          | fk   | 4       | NULL |   10 |   100.00 | Using index |
+----+-------------+--------+-------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.02 sec)

GROUP BY, DISTINCT and similar should use available indexes.
[1 Feb 2010 9:28] Sveta Smirnova
Thank you for the report.

Verified as described.
[1 Mar 2010 19:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/101940

3113 Jonas Oreland	2010-03-01
      ndb - bug#50736 - impl. handler::keys_to_use_for_scanning for more index usage in group/order by
[1 Mar 2010 19:50] Jonas Oreland
Hi,

'The missing "Using Index" from the Extra column.' does not mean that an index
is not used. It only means that mysqld does not use the "key-read" function-sequence (which ndb does not implement). But, that is not super relevant for ndb
as the data and indexes are in memory, and we use T-trees, which in them self are wo/ data. So type = index as shown when using "force index" is fine.

Patch, to this bug report, changes so that you don't need to perform "force index" to get this behavior.

I would like very much to get feedback if it gives any noticeable performance difference.

/Jonas
[1 Mar 2010 20:39] Jonas Oreland
pushed to 6.3.33, 7.0.14 and 7.1.2
[3 Mar 2010 5:24] Jon Stephens
Documented as follows in the NDB-6.3.33, 7.0.14, and 7.1.2 changelogs:

      GROUP BY queries against NDB tables appeared not to use any indexes.

Closed.
[3 Mar 2010 10:52] Jon Stephens
Following discussion with Jonas, revised changelog entry to read as follows:

        A GROUP BY query against an NDB table sometimes did not use
        any indexes unless the query included a FORCE INDEX option. With 
        this fix, indexes are used by such queries (where otherwise 
        possible) even when FORCE INDEX is not specified.

Already closed; status unchanged.
[27 Apr 2010 14:26] Casey Brown
Tested with 7.1.3 on better hardware:

No performance increase.