MySQL Bugs: #96579: Unexpected error message when using an outer reference in HAVING

Bug #96579	Unexpected error message when using an outer reference in HAVING
Submitted:	18 Aug 2019 19:59	Modified:	19 Aug 2019 22:27
Reporter:	Igor Babaev	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Errors	Severity:	S3 (Non-critical)
Version:	5.6/5.7/8.0	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
The query
select a from t1 where t1.a=t1.b group by a having exists (select * from t2 where t1.b=t2.d);
returns an unexpected error message:

mysql> select a from t1 where t1.a=t1.b group by a having exists (select * from t2 where t1.b=t2.d);
ERROR 1054 (42S22): Unknown column 'test.t1.b' in 'where clause'

though a similar query returns the expected result set

mysql> select a,b from t1 where t1.a=t1.b group by a having exists (select * from t2 where t1.b=t2.d);
+------+------+
| a    | b    |
+------+------+
|    1 |    1 |
+------+------+

Also a different error message is expected for this query:

mysql> select a from t1 group by a having exists (select * from t2 where t1.b=t2.d);
ERROR 1054 (42S22): Unknown column 'test.t1.b' in 'where clause'

I would expect here the  error with the same number as for the query

mysql> select a,b from t1 group by a having exists (select * from t2 where t1.b=t2.d);
ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'test.t1.b' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

How to repeat:
Create and populate tab

create table t1 (a int, b int);
create table t2 (c int, d int);
insert into t1 values (1,1), (1,3), (2,1);
insert into t2 values (5,2), (1,3), (5,1);

Run the queries:
select a from t1 where t1.a=t1.b group by a having exists (select * from t2 where t1.b=t2.d);
select a from t1 group by a having exists (select * from t2 where t1.b=t2.d);

Thank you for the bug report.

Posted by developer:
 
This is not a bug. In the first query,

select a from t1 where t1.a=t1.b group by a having exists (select * from t2
where t1.b=t2.d);

The HAVING clause is applied on the table produced by the GROUP BY clause.
That table contains a single column, a, and hence the inner query refers to a
column t1.b that does not exist.

In the second query, the aggregated table does contain columns a and b. This
is usually not allowed under the sql mode only_full_group_by, because b is
neither aggregated (i.e. in the GROUP BY clause) nor the argument of a set
function (such as SUM, AVG, MAX, MIN, etc.) But here the optimizer realized
that b is functionally dependent on a, via the condition t1.a=t1.b. In the
last query, this condition is removed, hence the error.

Hope this helps.

Martin,
Let's a simpler query:
mysql> select a from t1 where t1.a=t1.b group by a having t1.b > t2.d;
ERROR 1054 (42S22): Unknown column 't1.b' in 'having clause'

SQL Standard says:
<having clause> ::= HAVING <search condition>
Syntax Rules:
...
3) Each column reference directly contained in the <search condition> shall be one of the following:
  a) A unambiguous reference to a column that is functionally dependent on column references in GROUP BY clause.
...
t1.b is functionally dependent on t1.a (due to the equality in the WHERE  clause), t1.a is contained in the GROUP BY clause.

Martin,

In my previous comment I wanted to enter:

mysql> select a from t1 where t1.a=t1.b group by a having t1.b > 1;
ERROR 1054 (42S22): Unknown column 't1.b' in 'having clause'