Bug #81388 min/max optimization ignores group by
Submitted: 12 May 2016 6:32 Modified: 13 May 2016 15:32
Reporter: Trey Raymond Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6.29 OS:Any
Assigned to: CPU Architecture:Any

[12 May 2016 6:32] Trey Raymond
Description:
MySQL optimizer can optimize MIN and MAX by looking at the first and last records in a given part of an index.  You can specify a column in the GROUP BY clause to split this into multiple ranges.  However, when there's no column name in the GROUP BY, even if there is one specified, it's completely ignored.  This would happen in a variety of circumstances, the "group by user var" is one that is very well known, so I'm using it as an example here.

Consider the case, you need to perform nontransactional bulk updates/deletes on a large set of records in batches, based on a certain condition.

In this example, the conditions are simplified for ease of testing:
- delete records
- only rows with id<1024
- use batches of 100 rows

* The following works fine, but is bulky, and can lead to very large sql (and large group concat space) with a large chunk size *

mysql> explain extended
    -> select concat(if(@row_counter := @row_counter + 1,'',''),"delete from test where id in (",group_concat(id),"); select sleep(0.1);")
    -> from test
    -> inner join (SELECT @row_counter := 0) sel_row_counter
    -> where id<1024
    -> group by floor(@row_counter/100);
+----+-------------+------------+--------+---------------+---------+---------+------+------+----------+---------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref  | rows | filtered | Extra                           |
+----+-------------+------------+--------+---------------+---------+---------+------+------+----------+---------------------------------+
|  1 | PRIMARY     | <derived2> | system | NULL          | NULL    | NULL    | NULL |    1 |   100.00 | Using temporary; Using filesort |
|  1 | PRIMARY     | test       | range  | PRIMARY       | PRIMARY | 8       | NULL | 1040 |   100.00 | Using where; Using index        |
|  2 | DERIVED     | NULL       | NULL   | NULL          | NULL    | NULL    | NULL | NULL |     NULL | No tables used                  |
+----+-------------+------------+--------+---------------+---------+---------+------+------+----------+---------------------------------+
3 rows in set, 1 warning (0.00 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select concat(if((@row_counter:=((@`row_counter`) + 1)),'',''),'delete from test where id in (',group_concat(`trey`.`test`.`id` separator ','),'); select sleep(0.1);') AS `concat(if(@row_counter := @row_counter + 1,'',''),"delete from test where id in (",group_concat(id),"); select sleep(0.1);")` from `trey`.`test` where (`trey`.`test`.`id` < 1024) group by floor(((@`row_counter`) / 100))
1 row in set (0.00 sec)

mysql> select concat(if(@row_counter := @row_counter + 1,'',''),"delete from test where id in (",group_concat(id),"); select sleep(0.1);")                                                                 -> from test
    -> inner join (SELECT @row_counter := 0) sel_row_counter
    -> where id<1024
    -> group by floor(@row_counter/100)\G
*************************** 1. row ***************************
concat(if(@row_counter := @row_counter + 1,'',''),"delete from test where id in (",group_concat(id),"); select sleep(0.1);"): delete from test where id in (30,77,140,41,88,151,13,67,130,31,78,141,42,89,152,14,68,131,32,79,142,43,90,153,15,69,132,33,80,143,1,59,122,154,16,70,133,34,81,144,2,60,123,155,17,71,134,35,82,145,3,61,124,156,18,72,135,36,83,146,4,62,125,157,19,73,136,37,84,147,6,63,126,20,74,137,38,85,148,7,64,127,28,75,138,39,86,149,8,65,128,29,76,139,40,87,150,9,66,129); select sleep(0.1);
.....
*************************** 6. row ***************************
concat(if(@row_counter := @row_counter + 1,'',''),"delete from test where id in (",group_concat(id),"); select sleep(0.1);"): delete from test where id in (757,1023,758,748,759,749,1015,750,1016,751,1017,752,1018,753,1019,754,1020,755,1021,756,1022); select sleep(0.1);
6 rows in set (0.00 sec)

* The following does not work as expected - it completely ignores the group by, and uses min/max on the full filtered row set *

mysql> explain extended
    -> select concat(if(@row_counter := @row_counter + 1,'',''),"delete from test where id between ",min(id)," and ",max(id),"; select sleep(0.1);")
    -> from test
    -> inner join (SELECT @row_counter := 0) sel_row_counter
    -> where id<1024
    -> group by floor(@row_counter/100);
+----+-------------+------------+--------+---------------+---------+---------+------+------+----------+---------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref  | rows | filtered | Extra                           |
+----+-------------+------------+--------+---------------+---------+---------+------+------+----------+---------------------------------+
|  1 | PRIMARY     | <derived2> | system | NULL          | NULL    | NULL    | NULL |    1 |   100.00 | Using temporary; Using filesort |
|  1 | PRIMARY     | test       | range  | PRIMARY       | PRIMARY | 8       | NULL | 1040 |   100.00 | Using where; Using index        |
|  2 | DERIVED     | NULL       | NULL   | NULL          | NULL    | NULL    | NULL | NULL |     NULL | No tables used                  |
+----+-------------+------------+--------+---------------+---------+---------+------+------+----------+---------------------------------+
3 rows in set, 1 warning (0.00 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select concat(if((@row_counter:=((@`row_counter`) + 1)),'',''),'delete from test where id between ',min(`trey`.`test`.`id`),' and ',max(`trey`.`test`.`id`),'; select sleep(0.1);') AS `concat(if(@row_counter := @row_counter + 1,'',''),"delete from test where id between ",min(id)," and ",max(id),"; select sleep(0.1);")` from `trey`.`test` where (`trey`.`test`.`id` < 1024) group by floor(((@`row_counter`) / 100))
1 row in set (0.00 sec)

mysql> select concat(if(@row_counter := @row_counter + 1,'',''),"delete from test where id between ",min(id)," and ",max(id),"; select sleep(0.1);")
    -> from test
    -> inner join (SELECT @row_counter := 0) sel_row_counter
    -> where id<1024
    -> group by floor(@row_counter/100);
+----------------------------------------------------------------------------------------------------------------------------------------+
| concat(if(@row_counter := @row_counter + 1,'',''),"delete from test where id between ",min(id)," and ",max(id),"; select sleep(0.1);") |
+----------------------------------------------------------------------------------------------------------------------------------------+
| delete from test where id between 1 and 1023; select sleep(0.1);                                                                       |
+----------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

How to repeat:
CREATE TABLE `test` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `payload` binary(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

insert into test set payload='x';
insert into test (payload) select 'x' from test;
insert into test (payload) select 'x' from test;
insert into test (payload) select 'x' from test;
... get a couple k rows

-- optionally set group_concat_max_len if you have a low value (default is 1024)

select concat(if(@row_counter := @row_counter + 1,'',''),"delete from test where id in (",group_concat(id),");")
from test
inner join (SELECT @row_counter := 0) sel_row_counter
where id<1024
group by floor(@row_counter/100);

select concat(if(@row_counter := @row_counter + 1,'',''),"delete from test where id between ",min(id)," and ",max(id),";")
from test
inner join (SELECT @row_counter := 0) sel_row_counter
where id<1024
group by floor(@row_counter/100);

CONTROL (with col name in group by, ignores chunk size and just goes by numeric range, so chunk size will vary greatly):

select concat(if(@row_counter := @row_counter + 1,'',''),"delete from test where id between ",min(id)," and ",max(id),";")
from test
inner join (SELECT @row_counter := 0) sel_row_counter
where id<1024
group by floor(id/100);

Suggested fix:
Enforce any "group by" clause before deciding to use min/max optimization, don't just test for column references.
[13 May 2016 15:32] MySQL Verification Team
Hi Trey,

Thank you very much for your bug report, but that is not a bug.

I would like to direct you to two facts. First, read about, ONLY_FULL_GROUP_BY option, which is by default ON in 5.7.

Second, as it is documented, expressions, including floor(...), can not be optimized in any manner. A solution for this problem is implemented in 5.7, when you can materialize generated virtual columns.

So, not a bug !!!