Bug #89389 wrong result: select (select aggregate(t1.c) from t2) from t1
Submitted: 24 Jan 2018 19:10 Modified: 26 Jan 2018 7:26
Reporter: Shu Lin Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.5/5.6/5.7/8.0 OS:Any
Assigned to: CPU Architecture:Any

[24 Jan 2018 19:10] Shu Lin
Description:
wrong result due to incorrect aggregate-function-pull-up from subquery to the outer query.

For example
create table test1(c1 int);
insert into test1 values (1), (2);

create table test2(c1 int);
insert into test2 values (10);

select (select max(test1.c1) from test2) from test1

+-----------------------------------+
| (select max(test1.c1) from test2) |
+-----------------------------------+
|                                 2 |
+-----------------------------------+
1 row in set (0.00 sec)

The above query result is wrong. The correct answer should be two rows. Like this: 
|    1 |
|    2 |

It appears that MySQL optimizer incorrectly pulls up the max() processing from the subquery to the outer query.

As a reference, please compare with the following queries (these are correct results).

--------------
select max((select test1.c1 from test2)) from test1
--------------

+-----------------------------------+
| max((select test1.c1 from test2)) |
+-----------------------------------+
|                                 2 |
+-----------------------------------+
1 row in set (0.00 sec)

--------------
select (select max(test1.c1+test2.c1) from test2) from test1
--------------

+--------------------------------------------+
| (select max(test1.c1+test2.c1) from test2) |
+--------------------------------------------+
|                                         11 |
|                                         12 |
+--------------------------------------------+

The problem is not restricted to max(). It appears that all aggrgate functions suffer the same issue.

How to repeat:
create table test1(c1 int);
insert into test1 values (1), (2);

create table test2(c1 int);
insert into test2 values (10);

select (select max(test1.c1) from test2) from test1;
select (select count(test1.c1) from test2) from test1;
[24 Jan 2018 22:03] MySQL Verification Team
Thank you for the bug report. Version affected: 5.5/5.6/5.7/8.0.
[25 Jan 2018 7:59] Knut Anders Hatlen
Posted by developer:
 
FWIW, PostgreSQL seems to agree with MySQL and return one row for these queries:

psql (9.6.6)
Type "help" for help.

db=> create table test1(c1 int);
CREATE TABLE
db=> insert into test1 values (1), (2);
INSERT 0 2
db=> create table test2(c1 int);
CREATE TABLE
db=> insert into test2 values (10);
INSERT 0 1
db=> select (select max(test1.c1) from test2) from test1;
 max 
-----
   2
(1 row)

db=> select (select count(test1.c1) from test2) from test1;
 count 
-------
     2
(1 row)
[25 Jan 2018 9:00] Roy Lyseng
This is correct according to the SQL standard.

The standard says that the aggregation query of a set function specification is generally
the innermost qualifying query of the aggregated column references of the set function.

In this query:

  select (select max(test1.c1) from test2) from test1;

The qualifying query of max(test1.c1) is the outer query block, so t is correct to aggregate the outer query block. The inner query block is thus a non-aggregated, scalar subquery and must return zero or one rows.

In this query:

  select (select max(test1.c1+test2.c1) from test2) from test1;

The set function MAX has two column references from different qualifying query blocks. The standard says that aggregation should occur in the innermost of these, thus aggregation occurs in the inner query block. The aggregation is performed for every row of the outer query block.
[25 Jan 2018 9:02] Guilhem Bichot
MySQL's result is correct, there's no bug.
Considering the reported query:
select (select max(test1.c1) from test2) from test1 ;
in the words of the SQL standard:
- the qualifying query of c1 is, by definition, the query where t1 is in FROM clause: so it's the outer query
- the aggregation query of c1 is, by definition, the innermost qualifying query of c1; as the outer query is the only qualifying query of c1, it's the aggregation query of c1
- so the outer query is aggregated and returns one row; for this row, the subquery is evaluated, and it returns the max.
[25 Jan 2018 14:57] Shu Lin
screen-shot from Oracle online demo database

Attachment: Capture.PNG (image/png, text), 21.29 KiB.

[25 Jan 2018 14:58] Shu Lin
select (select max(test1.c1) from test2) from test1 

From the perspective of the subquery, test1.c1 is an outer reference. Outer reference is treated as constant during the evaluation of the subquery.

The subquery appears at the select clause, that is, it is a "projection".

By SQL standard, since there is no where clause in the outer query, for every row from test1, (select max(test1.c1) from test2) must be evaluated.

Just because max(...) only references test1, doesn't mean that you can take it out of the subquery.

Also, I tried it on the oracle online demo database, and attached the screen shot.
https://livesql.oracle.com/apex/livesql/file/index.html
[25 Jan 2018 15:16] Shu Lin
Based on my comments above, I will change the status from "Not a Bug" to "Open" to initiate further evaluation.
[25 Jan 2018 17:10] Shu Lin
I found a good article that explains well how aggregate functions with outer references are resolved, and it provides some work-around.

http://www.itprotoday.com/aggregates-outer-reference

I will attribute the fact that MySQL and Oracle databases provide difference answers to the same query as "dialect of different DBMS"

As such, I agree this is not a bug of MySQL.