Bug #105526 Query result is incompatible with the SQL standard
Submitted: 11 Nov 2021 9:00 Modified: 15 Nov 2021 15:14
Reporter: Yukun Liang Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any

[11 Nov 2021 9:00] Yukun Liang
Description:
In SQL standard, the execution order of query should be:
1. from clause
2. joined table
3. where clause
4. group by clause
5. having clause
6. query specification (ie. SELECT)

but in MySQL server 8.0.18, when I execute
"explain format=tree select distinct sum(c)*sum(b) from t1 group by a having sum(a) > 1;"
I got the result:
-> Filter: (sum(t1.a) > 1)
    -> Table scan on <temporary>
        -> Temporary table with deduplication # process DISTINCT here
            -> Table scan on <temporary>
                -> Aggregate using temporary table
                    -> Table scan on t1  (cost=0.65 rows=4)
seems MySQL apply filter after SELECT DISTINCT

How to repeat:
create table t1 (a int, b int, c int);
insert into t1 values(1, 7, 101);
insert into t1 values(2, 7, 101);
insert into t1 values(3, 11, 102);
insert into t1 values(4, 11, 102);
select distinct sum(c)*sum(b) from t1 group by a having sum(a) > 1;

Result is:
+---------------+
| sum(c)*sum(b) |
+---------------+
|          1122 |
+---------------+

If we rewrite query to:
"select distinct col from (select sum(c)*sum(b) as col from t1 group by a having sum(a) > 1) tmp;

The result is:
+------+
| col  |
+------+
|  707 |
| 1122 |
+------+
"

Suggested fix:
Change plan to:
-> Table scan on <temporary>
    -> Temporary table with deduplication
        -> Filter: (sum(t1.a) > 1) # Move filter to here
            -> Table scan on <temporary>
                -> Aggregate using temporary table
                    -> Table scan on t1  (cost=0.65 rows=4)
[11 Nov 2021 13:44] MySQL Verification Team
Hi Mr. Liang,

Thank you for your bug report.

However, this is not a bug.

First of all, your quoting of the SQL standard is correct, but you did not apply fully SQL standards to your examples.

What you have , in both cases, is HAVING clause that does not conform to SQL standard, since it is referencing an expression that is not one of the aggregating columns / expressions. 

Next, one of the queries has a nested query, while the other one does not. Last, but not least, applying 'DISTINCT` reserved word on the query, changes the order of the evaluation. It also slows down, totally unnecessary, the entire query since the aggregated queries are, by definition and as per SQL standard, distinct queries.

Not a bug.
[11 Nov 2021 21:08] Roy Lyseng
The reporter's interpretation of the SQL standard is correct.
The HAVING clause contains an aggregation of columns from the source tables, which is allowed.
The bug verifies on 5.7 up to 8.0.27.
[15 Nov 2021 15:14] MySQL Verification Team
Hi Mr. Liang,

We had some long internal discussions, after which we concluded that your report should be verified as a bug. There are still some discussions on how to fix this problem, but there is no doubt that this is a bug.

Verified as reported.