MySQL Bugs: #89389: wrong result: select (select aggregate(t1.c) from t2) from t1

Bug #89389	wrong result: select (select aggregate(t1.c) from t2) from t1
Submitted:	24 Jan 2018 19:10	Modified:	26 Jan 2018 7:26
Reporter:	Shu Lin	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S2 (Serious)
Version:	5.5/5.6/5.7/8.0	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
wrong result due to incorrect aggregate-function-pull-up from subquery to the outer query.

For example
create table test1(c1 int);
insert into test1 values (1), (2);

create table test2(c1 int);
insert into test2 values (10);

select (select max(test1.c1) from test2) from test1

+-----------------------------------+
| (select max(test1.c1) from test2) |
+-----------------------------------+
|                                 2 |
+-----------------------------------+
1 row in set (0.00 sec)

The above query result is wrong. The correct answer should be two rows. Like this: 
|    1 |
|    2 |

It appears that MySQL optimizer incorrectly pulls up the max() processing from the subquery to the outer query.

As a reference, please compare with the following queries (these are correct results).

--------------
select max((select test1.c1 from test2)) from test1
--------------

+-----------------------------------+
| max((select test1.c1 from test2)) |
+-----------------------------------+
|                                 2 |
+-----------------------------------+
1 row in set (0.00 sec)

--------------
select (select max(test1.c1+test2.c1) from test2) from test1
--------------

+--------------------------------------------+
| (select max(test1.c1+test2.c1) from test2) |
+--------------------------------------------+
|                                         11 |
|                                         12 |
+--------------------------------------------+

The problem is not restricted to max(). It appears that all aggrgate functions suffer the same issue.

How to repeat:
create table test1(c1 int);
insert into test1 values (1), (2);

create table test2(c1 int);
insert into test2 values (10);

select (select max(test1.c1) from test2) from test1;
select (select count(test1.c1) from test2) from test1;

Thank you for the bug report. Version affected: 5.5/5.6/5.7/8.0.

Posted by developer:
 
FWIW, PostgreSQL seems to agree with MySQL and return one row for these queries:

psql (9.6.6)
Type "help" for help.

db=> create table test1(c1 int);
CREATE TABLE
db=> insert into test1 values (1), (2);
INSERT 0 2
db=> create table test2(c1 int);
CREATE TABLE
db=> insert into test2 values (10);
INSERT 0 1
db=> select (select max(test1.c1) from test2) from test1;
 max 
-----
   2
(1 row)

db=> select (select count(test1.c1) from test2) from test1;
 count 
-------
     2
(1 row)

This is correct according to the SQL standard.

The standard says that the aggregation query of a set function specification is generally
the innermost qualifying query of the aggregated column references of the set function.

In this query:

  select (select max(test1.c1) from test2) from test1;

The qualifying query of max(test1.c1) is the outer query block, so t is correct to aggregate the outer query block. The inner query block is thus a non-aggregated, scalar subquery and must return zero or one rows.

In this query:

  select (select max(test1.c1+test2.c1) from test2) from test1;

The set function MAX has two column references from different qualifying query blocks. The standard says that aggregation should occur in the innermost of these, thus aggregation occurs in the inner query block. The aggregation is performed for every row of the outer query block.

MySQL's result is correct, there's no bug.
Considering the reported query:
select (select max(test1.c1) from test2) from test1 ;
in the words of the SQL standard:
- the qualifying query of c1 is, by definition, the query where t1 is in FROM clause: so it's the outer query
- the aggregation query of c1 is, by definition, the innermost qualifying query of c1; as the outer query is the only qualifying query of c1, it's the aggregation query of c1
- so the outer query is aggregated and returns one row; for this row, the subquery is evaluated, and it returns the max.

screen-shot from Oracle online demo database

Attachment: Capture.PNG (image/png, text), 21.29 KiB.

select (select max(test1.c1) from test2) from test1 

From the perspective of the subquery, test1.c1 is an outer reference. Outer reference is treated as constant during the evaluation of the subquery.

The subquery appears at the select clause, that is, it is a "projection".

By SQL standard, since there is no where clause in the outer query, for every row from test1, (select max(test1.c1) from test2) must be evaluated.

Just because max(...) only references test1, doesn't mean that you can take it out of the subquery.

Also, I tried it on the oracle online demo database, and attached the screen shot.
https://livesql.oracle.com/apex/livesql/file/index.html

Based on my comments above, I will change the status from "Not a Bug" to "Open" to initiate further evaluation.

I found a good article that explains well how aggregate functions with outer references are resolved, and it provides some work-around.

http://www.itprotoday.com/aggregates-outer-reference

I will attribute the fact that MySQL and Oracle databases provide difference answers to the same query as "dialect of different DBMS"

As such, I agree this is not a bug of MySQL.