Bug #102844 Correlated Subquery - Ambiguous column used from outer scope
Submitted: 7 Mar 2021 5:29 Modified: 8 Mar 2021 13:42
Reporter: Madhur Bhaiya Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:8.0.23 OS:Any
Assigned to: CPU Architecture:Any
Tags: correlated, subquery

[7 Mar 2021 5:29] Madhur Bhaiya
Description:
When using Correlated subquery in a WHERE clause, it is using outer scope column, even if there is no explicit reference (tablename/alias) provided in the query.

Documentation on Correlated subqueries: https://dev.mysql.com/doc/refman/8.0/en/correlated-subqueries.html does not mention this behaviour as well. Every example there uses specific tablename/alias reference with a column, in the WHERE clause.

How to repeat:
Create two tables. Table t1 has column: col1, while Table t2 has column: col2. SO both the tables have different columns.

create table t1 (id int, col1 int);
insert into t1 (id, col1) values (1,2020), (2, 2021);

create table t2 (id int, col2 int);
insert into t2 (id, col2) values (1,2020), (2, 2021);

Now, when we run the following subquery:

select * from t1 
where not exists (select 1 
                  from t2 
                  where col1 = t1.col1 
                  and col1 = 2023)

It "should" have failed, because there is no col1 in the table t2, and we are not explicitly referencing col1 from t1. On right hand side, there is one explicit reference, which is perfectly fine.

This "bug" repeats with using materialized Derived Tables / JSON_TABLE etc in case of an explicit table t2.

Suggested fix:
I encountered this issue by chance, when an old (and long) reporting query using a Derived table, was optimized to remove few columns. These columns were then used in a Correlated subquery. However, the query did not fail, despite removing those columns. Results definitely got wrong.

col1 = t.col1 becomes a useless condition which results in true always (when t2 has no col1)

What happens if col1 exists in t2. Are we sure that in that case, it will always use the col1 from t2 first ? And, then look in outer scope ?

I am not sure of what ANSI SQL standard says about this, but here is my suggested fix:

Use columns from outer scope ONLY WHEN there is an explicit reference to outer scope tablename/alias.

If not, then documentation must explicitly specify this edge case as a warning to developer, so that he/she always specifies a tablename/alias while referring columns in the WHERE condition of a Correlated subquery.
[7 Mar 2021 5:33] Madhur Bhaiya
Here is db-fiddle reproducing the case: https://www.db-fiddle.com/f/fD1496SNZJkcsx6mNicWmR/1
[8 Mar 2021 13:42] MySQL Verification Team
Hi Mr. Bhaiya,

Thank you for your bug report.

However, this is not a bug. SQL standard clearly specifies this behaviour in sub-chapters 4.15, 6.3 and in the chapter 7. Also, all other major SQL databases have the same behaviour.

Last, but not least, we maintain a Reference Manual, which is not a replacement for SQL handbook, User's Manual or Entity and Query design handbooks.

Not a bug.