Bug #89870 Group by optimization not used with partitioned tables
Submitted: 1 Mar 14:21 Modified: 1 Mar 17:22
Reporter: Arnaud Adant Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Partitions Severity:S4 (Feature request)
Version:5.7.18 OS:Any
Assigned to: CPU Architecture:Any

[1 Mar 14:21] Arnaud Adant
Description:
using index for group-by should work when one partition partition is full scanned

It works on the full partitioned table :

MySQL [test]> explain select  d from t group by d;
+----+-------------+-------+---------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions    | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+---------------+-------+---------------+------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | t     | p1,p2,p3,pmax | range | d             | d    | 4       | NULL |   16 |   100.00 | Using index for group-by |
+----+-------------+-------+---------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

When a partition is selected :

MySQL [test]> explain select  d from t partition(p2) group by d;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | t     | p2         | range | d             | d    | 4       | NULL |    6 |   100.00 | Using index for group-by |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

But not when a condition leading to full partition selection is passed :

MySQL [test]> explain select  d from t where c >= 1 and c < 2 group by d;
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | t     | p2         | index | d             | d    | 4       | NULL | 19420 |    11.11 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

How to repeat:
drop table if exists t;
CREATE TABLE t (
id int NOT NULL auto_increment,
c int not null,
d int not null,
PRIMARY KEY (id,c),
key(d)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY RANGE COLUMNS(c)
(
PARTITION p1 VALUES LESS THAN (1),
PARTITION p2 VALUES LESS THAN (2),
PARTITION p3 VALUES LESS THAN (3),
PARTITION pmax VALUES LESS THAN (MAXVALUE)
);

insert into t(c,d) values(rand()*3, rand()*4);
insert into t(c,d) values(rand()*3, rand()*4);
insert into t(c,d) values(rand()*3, rand()*4);

replace into t(c,d) select rand()*3, rand()*4 from t t1, t t2, t t3, t t4, t t5, t t6, t t7, t t8, t t9, t t10;
analyze table t;
explain select  d from t group by d;
explain select  d from t where c >= 1 and c < 2 group by d;
explain select  d from t partition(p2) group by d;

Suggested fix:
Detect that the where clause is the same as selection the full partition.
[1 Mar 15:50] Richard Stracke
For using group by optimization, the range column must be included in the index order.

With

CREATE TABLE t (
id int NOT NULL auto_increment,
c int not null,
d int not null,
PRIMARY KEY (id,c),
key(d,c)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY RANGE COLUMNS(c)
(
PARTITION p1 VALUES LESS THAN (1),
PARTITION p2 VALUES LESS THAN (2),
PARTITION p3 VALUES LESS THAN (3),
PARTITION pmax VALUES LESS THAN (MAXVALUE)
);

explain select  d from t where c >= 1 and c < 2 group by d,c;

give

MariaDB [forum]> explain select  d from t where c >= 1 and c < 2 group by d,c;
+------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------+
| id   | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                                 |
+------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------+
|    1 | SIMPLE      | t     | range | NULL          | d    | 8       | NULL |   10 | Using where; Using index for group-by |
+------+-------------+-------+-------+---------------+------+---------+------+------+---------------------------------------+
[1 Mar 17:22] Arnaud Adant
Agreed, if it is documented, then it is a feature request at best.

One use case is when you have an existing index on a large table and you want to use it as it is without adding another index containing the partitioning key.
[5 Mar 10:30] Valeriy Kravchuk
I see no good reason for 

select  d from t where c >= 1 and c < 2 group by d;

to work differently than:

select  d from t partition(p2) group by d;

At least, I do not see this reason clearly documented at https://dev.mysql.com/doc/refman/5.7/en/group-by-optimization.html that says nothing about partitioned tables at all.
[7 Mar 13:26] Richard Stracke
The same happened without partition.

I removed partition , and 

CREATE TABLE t (
id int NOT NULL auto_increment,
c int not null,
d int not null,
PRIMARY KEY (id,c),
key(d)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
;

insert into t(c,d) values(rand()*3,1);
insert into t(c,d) values(rand()*3,1);
insert into t(c,d) values(rand()*3,1);

replace into t(c,d) select rand()*3, rand()*4 from t t1, t t2, t t3, t t4, t t5, t t6, t t7, t t8, t t9, t t10;
analyze table t;

The explain result are the same.

In the current stage, group by optimization not work without adding the index.

The only solution would be, that the optimizer is able to check, if the condition in the where clause include the whole table (or partition) and in this case use group by optimization.