Bug #103869 Duplicate computation of aggregate function when referred in HAVING clause
Submitted: 31 May 2021 14:29 Modified: 2 Jun 2021 17:08
Reporter: Øystein Grøvlen Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0.24, 8.0.25 OS:Any
Assigned to: CPU Architecture:Any

[31 May 2021 14:29] Øystein Grøvlen
Description:
If an aggregate function is referred both in the SELECT list and in the HAVING clause, it is computed twice:

mysql [localhost:8024] {msandbox} (world) > explain format=tree select countrycode, sum(population) from city group by countrycode having sum(population) > 10000000\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (sum(city.Population) > 10000000)
    -> Group aggregate: sum(city.Population), sum(city.Population)
        -> Index scan on city using CountryCode  (cost=414.15 rows=4079)

Note the line form "Group aggregate".  This does not happen for ORDER BY:

mysql [localhost:8024] {msandbox} (world) > explain format=tree select countrycode, sum(population) from city group by countrycode order by  sum(population)\G
*************************** 1. row ***************************
EXPLAIN: -> Sort: `sum(population)`
    -> Stream results
        -> Group aggregate: sum(city.Population)
            -> Index scan on city using CountryCode  (cost=414.15 rows=4079)

For the following variant of TPC-H Q18 (scale factor 10), this leads to a 20% increase in execution time:

SELECT c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, l2.qty
FROM customer, orders,
     ( SELECT l_orderkey, SUM(l_quantity) qty
       FROM lineitem
       GROUP BY l_orderkey
       HAVING SUM(l_quantity) > 312
     ) l2
WHERE o_orderkey = l2.l_orderkey AND c_custkey = o_custkey
GROUP BY c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
ORDER BY o_totalprice DESC, o_orderdate
LIMIT 100;

compared to

SELECT c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, l2.qty
FROM customer, orders,
     ( SELECT l_orderkey, SUM(l_quantity) qty
       FROM lineitem
       GROUP BY l_orderkey
       HAVING qty > 312
     ) l2
WHERE o_orderkey = l2.l_orderkey AND c_custkey = o_custkey
GROUP BY c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
ORDER BY o_totalprice DESC, o_orderdate
LIMIT 100;

How to repeat:
explain format=tree select countrycode, sum(population) from city group by countrycode having sum(population) > 10000000\G

explain format=tree select countrycode, sum(population) from city group by countrycode order by sum(population)\G
[1 Jun 2021 12:39] MySQL Verification Team
Hi Mr. Øystein Grøvlen,

Thank you very much for your bug report.

It does sound like a very interesting performance improvement report, but we must be able to repeat that behaviour, with 8.0.25.

Can you be so kind and send us a repeatable test case ???

Thanks a lot in advance.
[1 Jun 2021 12:57] Øystein Grøvlen
AFAICT, the only difference between 8.0.24 and 8.0.25 is 

    Bug#32717969: Wrong result of execute prepare on 'smallint'

Do you expect that to have fixed this issue?
[1 Jun 2021 13:13] MySQL Verification Team
Hi,

It is very hard to say, so a test case is truly important.
[1 Jun 2021 14:28] Øystein Grøvlen
I can confirm that the issue also exists in 8.0.25
[2 Jun 2021 12:19] MySQL Verification Team
Hi Mr. Grovlen,

We truly believe you, but we do require a reproducible test case.
[2 Jun 2021 15:14] Øystein Grøvlen
So the original test case in "How-to-repeat" is not not sufficient?  EXPLAIN shows that the sum function is computed twice.
[2 Jun 2021 17:07] Steinar Gunderson
The test case is sufficient.