Bug #101838 Adding GROUP BY slows down a query.
Submitted: 2 Dec 2020 19:17 Modified: 20 Jul 2021 16:11
Reporter: Jean-François Gagné Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0 OS:Any
Assigned to: Chaithra Gopala Reddy CPU Architecture:Any

[2 Dec 2020 19:17] Jean-François Gagné
Description:
Hi,

I have the following query that looks for the min value of c when p and a are fixed.  The primary key of the table is "p, a, c", without any other columns, and all columns are int (or bigint) unsigned.

select MIN(c) from p where p = @p and a = @a group by p, a;

The problem with above query is that it is slower than I expect, and that the query below is faster when I believe they are equivalent from the SQL point of view (the primary key being "p, a, c").

select MIN(c) from p where p = @p and a = @a;

So adding GROUP BY is making the query slower.  Details in How to repeat.

Thanks for looking into that,

Jean-François Gagné

How to repeat:
mysql [localhost:5731] {msandbox} (test_jfg) > select version();
+-----------+
| version() |
+-----------+
| 5.7.31    |
+-----------+
1 row in set (0.00 sec)

mysql [localhost:5731] {msandbox} (test_jfg) > show create table p\G
*************************** 1. row ***************************
       Table: p
Create Table: CREATE TABLE `p` (
  `p` int(10) unsigned NOT NULL,
  `a` int(10) unsigned NOT NULL,
  `c` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`p`,`a`,`c`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii COLLATE=ascii_bin
1 row in set (0.00 sec)

mysql [localhost:5731] {msandbox} (test_jfg) > pager cat > /dev/null
PAGER set to 'cat > /dev/null'
mysql [localhost:5731] {msandbox} (test_jfg) > select MIN(c) from p where p = @p and a = @a group by p, a;
1 row in set (0.13 sec)

mysql [localhost:5731] {msandbox} (test_jfg) > select MIN(c) from p where p = @p and a = @a;
1 row in set (0.00 sec)

-- And it is not a cache effect, as if I run the queries again, they still take the same time.

mysql [localhost:5731] {msandbox} (test_jfg) > select MIN(c) from p where p = @p and a = @a group by p, a;
1 row in set (0.13 sec)

mysql [localhost:5731] {msandbox} (test_jfg) > select MIN(c) from p where p = @p and a = @a;
1 row in set (0.00 sec)

-- More details showing that the slower query scans more data than it should.

mysql [localhost:5731] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select MIN(c) from p where p = @p and a = @a group by p, a; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.00 sec)

PAGER set to 'cat > /dev/null'
1 row in set (0.11 sec)

Default pager wasn't set, using stdout.
+-----------------------+--------+
| Variable_name         | Value  |
+-----------------------+--------+
| Handler_read_first    | 0      |
| Handler_read_key      | 1      |
| Handler_read_last     | 0      |
| Handler_read_next     | 631283 |
| Handler_read_prev     | 0      |
| Handler_read_rnd      | 0      |
| Handler_read_rnd_next | 0      |
+-----------------------+--------+
7 rows in set (0.00 sec)

mysql [localhost:5731] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select MIN(c) from p where p = @p and a = @a; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.00 sec)

PAGER set to 'cat > /dev/null'
1 row in set (0.00 sec)

Default pager wasn't set, using stdout.
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)
[2 Dec 2020 20:29] Jean-François Gagné
Hi again,

note that in the more general case, removing the ORDER BY is not enough for making the query fast.  In below, I have a modified version of the query where I return more data, but this data is completely deterministic as they are from the WHERE clause.  When fixing this bug, please also make sure that a query returning columns from the WHERE clause is fast.

mysql [localhost:5731] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select p, a, MIN(c) from p where p = @p and a = @a group by p, a; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.00 sec)

PAGER set to 'cat > /dev/null'
1 row in set (0.11 sec)

Default pager wasn't set, using stdout.
+-----------------------+--------+
| Variable_name         | Value  |
+-----------------------+--------+
| Handler_read_first    | 0      |
| Handler_read_key      | 1      |
| Handler_read_last     | 0      |
| Handler_read_next     | 631283 |
| Handler_read_prev     | 0      |
| Handler_read_rnd      | 0      |
| Handler_read_rnd_next | 0      |
+-----------------------+--------+
7 rows in set (0.00 sec)

mysql [localhost:5731] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select p, a, MIN(c) from p where p = @p and a = @a; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.00 sec)

PAGER set to 'cat > /dev/null'
1 row in set (0.10 sec)

Default pager wasn't set, using stdout.
+-----------------------+--------+
| Variable_name         | Value  |
+-----------------------+--------+
| Handler_read_first    | 0      |
| Handler_read_key      | 2      |
| Handler_read_last     | 0      |
| Handler_read_next     | 631283 |
| Handler_read_prev     | 0      |
| Handler_read_rnd      | 0      |
| Handler_read_rnd_next | 0      |
+-----------------------+--------+
7 rows in set (0.01 sec)

mysql [localhost:5731] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select MIN(c) from p where p = @p and a = @a; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.00 sec)

PAGER set to 'cat > /dev/null'
1 row in set (0.00 sec)

Default pager wasn't set, using stdout.
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)

Thanks, JFG
[2 Dec 2020 20:30] Jean-François Gagné
Hi again another time,

note that my original problematic query is not what I give in the report.  It is on a little more complicated query, using a secondary index instead of the primary key, and involving a VARCHAR, but I was able to simplify it to what I reported, and I think a more simple test-case is better.  Just for completeness, here is the original problematic query.

mysql [localhost:5731] {msandbox} (test_jfg) > show create table p\G
*************************** 1. row ***************************
       Table: p
Create Table: CREATE TABLE `p` (
  `p` int(10) unsigned NOT NULL,
  `a` int(10) unsigned NOT NULL,
  `s` varchar(20) CHARACTER SET utf8mb4 NOT NULL,
  `t` varchar(20) CHARACTER SET utf8mb4 NOT NULL,
  `i` int(10) unsigned DEFAULT NULL,
  `c` bigint(20) unsigned NOT NULL,
  PRIMARY KEY (`p`,`a`,`c`),
  KEY `t` (`t`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii COLLATE=ascii_bin
1 row in set (0.01 sec)

mysql [localhost:5731] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select t, p, a, MIN(c) from p where p = @p and a = @a and t = @t group by t, p, a; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.00 sec)

PAGER set to 'cat > /dev/null'
1 row in set (0.09 sec)

Default pager wasn't set, using stdout.
+-----------------------+--------+
| Variable_name         | Value  |
+-----------------------+--------+
| Handler_read_first    | 0      |
| Handler_read_key      | 1      |
| Handler_read_last     | 0      |
| Handler_read_next     | 318142 |
| Handler_read_prev     | 0      |
| Handler_read_rnd      | 0      |
| Handler_read_rnd_next | 0      |
+-----------------------+--------+
7 rows in set (0.00 sec)

mysql [localhost:5731] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select t, p, a, MIN(c) from p where p = @p and a = @a and t = @t; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.00 sec)

PAGER set to 'cat > /dev/null'
1 row in set (0.08 sec)

Default pager wasn't set, using stdout.
+-----------------------+--------+
| Variable_name         | Value  |
+-----------------------+--------+
| Handler_read_first    | 0      |
| Handler_read_key      | 2      |
| Handler_read_last     | 0      |
| Handler_read_next     | 318142 |
| Handler_read_prev     | 0      |
| Handler_read_rnd      | 0      |
| Handler_read_rnd_next | 0      |
+-----------------------+--------+
7 rows in set (0.00 sec)

mysql [localhost:5731] {msandbox} (test_jfg) > flush status; pager cat > /dev/null; select MIN(c) from p where p = @p and a = @a and t = @t; pager; show session status like "Handler_read%";
Query OK, 0 rows affected (0.00 sec)

PAGER set to 'cat > /dev/null'
1 row in set (0.00 sec)

Default pager wasn't set, using stdout.
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Handler_read_first    | 0     |
| Handler_read_key      | 1     |
| Handler_read_last     | 0     |
| Handler_read_next     | 0     |
| Handler_read_prev     | 0     |
| Handler_read_rnd      | 0     |
| Handler_read_rnd_next | 0     |
+-----------------------+-------+
7 rows in set (0.00 sec)

Thanks,  JFG
[3 Dec 2020 7:53] MySQL Verification Team
From the first testcase can we see the following outputs ?

set @p:=????? ;     # set accordingly
set @a:=????? ;     # set accordingly

select count(*), 
count(distinct p),
count(distinct a),
count(distinct c),
min(p),
max(p),
min(a),
max(a),
min(c),
max(c),
sum(p=@p),
sum(p<>@p),
sum(a=@a),
sum(a<>@a),
sum(p=@p and a=@a)
from p \G

explain select MIN(c) from p where p = @p and a = @a group by p, a;
explain analyze select MIN(c) from p where p = @p and a = @a group by p, a;
-- -----
explain select MIN(c) from p where p = @p and a = @a;
explain analyze select MIN(c) from p where p = @p and a = @a;
[3 Dec 2020 14:07] Jean-François Gagné
Hi MySQL Verification Team,

> From the first testcase can we see the following outputs ?

I am not sure I understand this question correctly.  I am using session variables to avoid sharing data about this system (same for pager cat > /dev/null), and I think you can guess that my original table structure is a little more descriptive than what I share here.

> set @p:=????? ;     # set accordingly
> set @a:=????? ;     # set accordingly

Yes, these are set, but I will not share the exact data publicly.

> select count(*), [...]

mysql [localhost:5731] {msandbox} (test_jfg) > select count(*),  count(distinct
p), count(distinct a), count(distinct c), sum(p=@p), sum(p<>@p), sum(a=@a), sum(a<>@a), sum(p=@p and a=@a) from p\G
*************************** 1. row ***************************
          count(*): 10859545
 count(distinct p): 4521116
 count(distinct a): 293
 count(distinct c): 8389543
         sum(p=@p): 631401
        sum(p<>@p): 10228144
         sum(a=@a): 631283
        sum(a<>@a): 10228262
sum(p=@p and a=@a): 631283
1 row in set (15.21 sec)

> explain [...]

I cannot run explain analyze on 5.7.31, below are the requested explains on 5.7.31.

mysql [localhost:5731] {msandbox} (test_jfg) > explain select MIN(c) from p where p = @p and a = @a group by p, a;
+----+-------------+-------+------------+------+---------------+---------+---------+-------------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_len | ref         | rows    | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+---------+---------+-------------+---------+----------+-------------+
|  1 | SIMPLE      | p     | NULL       | ref  | PRIMARY       | PRIMARY | 8       | const,const | 1321606 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+---------+---------+-------------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:5731] {msandbox} (test_jfg) > explain select MIN(c) from p where p = @p and a = @a;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

And the four explains on 8.0.22.

mysql [localhost:8022] {msandbox} (test_jfg) > explain select MIN(c) from p where p = @p and a = @a group by p, a;
+----+-------------+-------+------------+------+---------------+---------+---------+-------------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_len | ref         | rows    | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+---------+---------+-------------+---------+----------+-------------+
|  1 | SIMPLE      | p     | NULL       | ref  | PRIMARY       | PRIMARY | 8       | const,const | 1321606 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+---------+---------+-------------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8022] {msandbox} (test_jfg) > explain analyze select MIN(c) from p where p = @p and a = @a group by p, a;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                              |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Group aggregate: min(p.c)  (actual time=204.201..204.201 rows=1 loops=1)
    -> Index lookup on p using PRIMARY (p=(@p), a=(@a))  (cost=137214.90 rows=1321606) (actual time=0.030..155.410 rows=631283 loops=1)
 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.21 sec)

mysql [localhost:8022] {msandbox} (test_jfg) > explain select MIN(c) from p where p = @p and a = @a;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8022] {msandbox} (test_jfg) > explain analyze select MIN(c) from p where p = @p and a = @a;
+-------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                         |
+-------------------------------------------------------------------------------------------------+
| -> Rows fetched before execution  (cost=0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
 |
+-------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Note that I did not test with 5.7.32 as I am doing my tests on MacOS, but I can try reproducing on Linux on 5.7.32 if you want.
[6 Dec 2020 18:54] Jean-François Gagné
I originally reported this issue as "Adding ORDER BY slows down a query", I should have named it "Adding GROUP BY slows down a query".  Combined with this comment, I updated the title.

I also tested with 5.7.32 and I was able to reproduce this issue (running query twice below to show no cache effect).

mysql [localhost:5732] {msandbox} (test_jfg) > select version();
+-----------+
| version() |
+-----------+
| 5.7.32    |
+-----------+
1 row in set (0.01 sec)

mysql [localhost:5732] {msandbox} (test_jfg) > pager cat > /dev/null
PAGER set to 'cat > /dev/null'
mysql [localhost:5732] {msandbox} (test_jfg) > select MIN(c) from p where p = @p and a = @a group by p, a;select MIN(c)
from p where p = @p and a = @a group by p, a;
1 row in set (0.24 sec)

1 row in set (0.30 sec)

mysql [localhost:5732] {msandbox} (test_jfg) > select MIN(c) from p where p = @p and a = @a;select MIN(c) from p where p = @p and a = @a;
1 row in set (0.00 sec)

1 row in set (0.00 sec)
[9 Dec 2020 13:52] MySQL Verification Team
Hi Jean-François Gagné,

Thank you for your bug report.

However, we do not think that this is a bug.

Aggregated queries can never have the same speed as non-aggregated queries. You can increase a speed a bit, by adding ORDER BY NULL. However, it will still be much slower than the same query that does not have to be aggregated. Aggregated queries require so much more processing. What is even worse, aggregated queries have their own path of execution. This is described in some SQL textbooks. Also, look carefully at the optimiser trace and explain, including the one with ANALYZE.

A way could be found, perhaps, to make those queries execute faster, but never as fast as straight, filtering-only queries.

If you find a way to make aggregated queries, on a sizeable data set, as fast or faster than non-aggregated queries, then we would happily make this a feature request.

But, so far, this is not a bug.
[11 Dec 2020 13:11] Steinar Gunderson
For the old optimizer, it should probably have been fixed by

    Bug #18109609: LOOSE INDEX SCAN IS NOT USED WHEN IT SHOULD

I'm assigning this to Chaithra for further analysis.
[11 Dec 2020 14:39] Jean-François Gagné
Thanks for your actions Steinar that made this report progress from "Not a Bug"
 to "Verified".

> For the old optimizer, it should probably have been fixed by
>
>    Bug #18109609: LOOSE INDEX SCAN IS NOT USED WHEN IT SHOULD

I initially reported this for versions 5.7.31, 5.7.32 and 8.0.22 with reproductions steps for each of these versions.  This has been changed to only 8.0. I am not sure to understand the two lines quoted above, and I cannot see Bug#18109609.  Does this means it is fixed in the next 5.7 version ?

Thanks again for reconsidering this report.
[20 Jul 2021 15:42] Jon Stephens
Documented fix in the MySQL 8.0.27 changelog as follows:

    While evaluating a loose index scan as a possible option for
    performing grouping and finding the minimum value, the cost
    calculation did not reflect the fact that the query looked at
    one group only, due to the equality predicates on the grouping
    attributes. This resulted in examination of additional rows
    since grouping is performed after reading the rows from the
    index.

    We fix this by determining whether a query produces only one
    group by checking for the presence of equality predicates on
    grouping attributes and using these for calculating the cost.
    This causes the optimizer to pick loose index scan for such
    cases when doing so is found to be beneficial.

Closed.
[20 Jul 2021 16:11] Jean-François Gagné
Thanks for the fix Jon.

I understand this is fixed in 8.0, but the initial reproduction was with 5.7 (I do not know how the version ended-up to be 8.0), can we hope for a fix in 5.7 ?  Thanks.
[21 Jul 2021 11:34] MySQL Verification Team
Thank you, Jon ........