Bug #97742 bad item ref from correlated subquery to outer distinct table
Submitted: 22 Nov 2019 8:22 Modified: 3 Dec 2019 13:11
Reporter: song zhibai (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.7, 8.0 OS:Any
Assigned to: CPU Architecture:x86

[22 Nov 2019 8:22] song zhibai
Description:
explain format=tree SELECT DISTINCT f3 FROM t1 HAVING (SELECT 1 FROM t2 HAVING f2 LIMIT 1);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                    |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (0 <> (select #2))
    -> Table scan on <temporary>
        -> Temporary table with deduplication
            -> Table scan on t1  (cost=0.45 rows=2)
    -> Select #2 (subquery in condition; dependent)
        -> Limit: 1 row(s)
            -> Filter: (0 <> t1.f2)
                -> Index scan on t2 using PRIMARY  (cost=1.10 rows=1)
 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (1.85 sec)

in here select #2 should not ref t1.f2 field from Filter inside the query to outer distinct t1 sql.

because t1.f2 is inaccessible from select #2 , because in this correlated subquery , select #2 can only access <temporary> table

actually in the resolve_field code , there is a if condition and a comment about the if condition it says :

/*
  Check table fields only if the subquery is used somewhere out of
  HAVING or the outer SELECT does not use grouping (i.e. tables are
  accessible).
  TODO:
  Here we could first find the field anyway, and then test this
  condition, so that we can give a better error message -
  ER_WRONG_FIELD_WITH_GROUP, instead of the less informative
  ER_BAD_FIELD_ERROR which we produce now.
*/

but in if condition , it neglects the fact that distinct can make the table inaccessible , so i think a proper fix is :

-             (!select->with_sum_func && select->group_list.elements == 0))) {
+             (!select->with_sum_func && !select->is_distinct() && select->group_list.elements == 0))) {

in item.cc

How to repeat:
DROP TABLE IF EXISTS t1,t2;

CREATE TABLE t1 (f1 INT, f2 TEXT, f3 INT,
                 PRIMARY KEY(f1));

INSERT INTO t1 VALUES(1,'1', 3);
INSERT INTO t1 VALUES(2,'2', 2);
INSERT INTO t1 VALUES(3,'3', 2);

create table t2 like t1;

explain format=tree SELECT DISTINCT f3 FROM t1 HAVING (SELECT 1 FROM t2 HAVING f2 LIMIT 1) ORDER BY f3;

Suggested fix:
-             (!select->with_sum_func && select->group_list.elements == 0))) {
+             (!select->with_sum_func && !select->is_distinct() && select->group_list.elements == 0))) {
[22 Nov 2019 13:27] Sinisa Milivojevic
Hi Mr. zhibai,

Thank you for your bug report.

It seems to me that it makes lot's of sense. However, I do have a couple of questions regarding your valuable contribution:

* Have you tried using some optimiser hint or similar to get the same effect as the change of code

* What is a difference in speed between the unchanged and changed optimiser code

* When you apply your patch to the code, do all our tests pass without any problem ??? I am enquiring here, both, about our MTR tests and unit tests

Thank you very much in advance ......
[25 Nov 2019 9:42] song zhibai
Hi Milivojevic 
this issue is not a problem of performance , it is a bug of sql validate in optimize stage, here instead of return explain , should return an field not found error.

in this sql :
explain format=tree SELECT DISTINCT f3 FROM t1 HAVING (SELECT 1 FROM t2 HAVING f2 LIMIT 1);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                    |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (0 <> (select #2))
    -> Table scan on <temporary>
        -> Temporary table with deduplication
            -> Table scan on t1  (cost=0.45 rows=2)
    -> Select #2 (subquery in condition; dependent)
        -> Limit: 1 row(s)
            -> Filter: (0 <> t1.f2)
                -> Index scan on t2 using PRIMARY  (cost=1.10 rows=1)
 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (1.85 sec)

the sql validate process wrongly ref the subquery filter to the table that is not visible to itself , the bug can be also found in the code comment , because in Item_ref:fix_fields in item.cc.

the comment says that only the table(outer t1 table) is accessible to subquery , then the field ref is legal , otherwise , should return field not found error, and distinct in this senario is same with group  .
[25 Nov 2019 14:21] Sinisa Milivojevic
Hi Mr. zhibai,

I have understood what you wrote about this report.

However, according to SQL standard all the columns from the tables in the outer queries are visible to inner queries.

Hence, I do not see your problem.
[26 Nov 2019 2:23] song zhibai
Hi 
I think you misunderstand my issue. 
I don't mean visible in here , actually in dealing with this ref fix fields situation , mysql source code has comment :

Check table fields only if the subquery is used somewhere out of HAVING or the outer SELECT does not use grouping (i.e. tables are accessible).

here mysql is using word : "accessible" 

in this situation , this sql :
 explain format=tree SELECT DISTINCT f3 FROM t1 HAVING (SELECT 1 FROM t2 HAVING f2 LIMIT 1);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                    |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (0 <> (select #2))
    -> Table scan on <temporary>
        -> Temporary table with deduplication
            -> Table scan on t1  (cost=0.45 rows=2)
    -> Select #2 (subquery in condition; dependent)
        -> Limit: 1 row(s)
            -> Filter: (0 <> t1.f2)
                -> Index scan on t2 using PRIMARY  (cost=1.10 rows=1)
 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (1.85 sec)

this t1.f2 ref from subselect inside is inaccessible , because the outer select is using distinct and this distinct is producing a temporary table and this will make the original table t1 unaccessible (not invisible!) to subselect 

due to the mysql souce code comment , I think it is missing the distinct without group situation and it will only happen in mysql.
[26 Nov 2019 13:32] Sinisa Milivojevic
HI,

First of all, yes, SQL standard stipulates that you can not have HAVING clause without the aggregate query. However, MySQL has always had this extension that HAVING can be used in some situation, even when a query is not the aggregated one.

On the other hand, whether temporary table is used is irrelevant regarding the visibility and accessibility. That is why temporary tables sometimes carry additional columns.
[27 Nov 2019 2:03] song zhibai
Hi 
From this sentence : 
On the other hand, whether temporary table is used is irrelevant regarding the visibility and accessibility. That is why temporary tables sometimes carry additional columns.

I totally understand temporary tables sometimes carry additional columns, and I think it's understandable in Mysql , but this sql , the problem is the ref is not point to the temporary table 
as you can see from the explain tree , the ref is pointing to the original table 

explain format=tree SELECT DISTINCT f3 FROM t1 HAVING (SELECT 1 FROM t2 HAVING f2 LIMIT 1);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                    |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (0 <> (select #2))
    -> Table scan on <temporary>
        -> Temporary table with deduplication
            -> Table scan on t1  (cost=0.45 rows=2)
    -> Select #2 (subquery in condition; dependent)
        -> Limit: 1 row(s)
            -> Filter: (0 <> t1.f2)
                -> Index scan on t2 using PRIMARY  (cost=1.10 rows=1)
 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (1.85 sec)

The normal situation for mysql iteself should be :
mysql> explain format=tree select f3 from t1 having (select f3 from t2 having f2 limit 1);
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (0 <> (select #2))
    -> Table scan on t1  (cost=1.20 rows=2)
    -> Select #2 (subquery in condition; dependent)
        -> Limit: 1 row(s)
            -> Filter: (0 <> t1.f2)
                -> Table scan on t2  (cost=0.55 rows=3)
 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
[27 Nov 2019 13:07] Sinisa Milivojevic
Hi,

Once again, all outer queries are seen by their inner nested queries.

Second, your query has no need for temporary table. Last, but not least, many of the aggregated or distinct queries are resolved without a temporary table.

Hence, for the final time: Not a bug.
[28 Nov 2019 1:31] Øystein Grøvlen
Hi Sinisa!

I agree with you that all "outer queries are seen by their subqueries" and that whether temporary tables are used should not matter.

However, we need also to take into account that not all columns a table may be referred to in the HAVING clause.  For example, using the schema of this bug report, the following query will give an error:

mysql> explain select f3 from t1 having f2;
ERROR 1054 (42S22): Unknown column 'f2' in 'having clause'

This is in my opinion correct since column not used in GROUP BY or select list can not be referred in HAVING clause.

So why should it be possible to refer to f2 in HAVING clause of the outer query by "hiding" it in a subquery:

mysql> EXPLAIN SELECT f3 FROM t1 HAVING (SELECT 1 FROM t2 HAVING f2 LIMIT 1);
+----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type        | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | PRIMARY            | t1    | NULL       | ALL   | NULL          | NULL    | NULL    | NULL |    3 |   100.00 | NULL        |
|  2 | DEPENDENT SUBQUERY | t2    | NULL       | index | NULL          | PRIMARY | 4       | NULL |    1 |   100.00 | Using index |
+----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)

Note (Code 1276): Field or reference 'f2' of SELECT #2 was resolved in SELECT #1
Note (Code 1003): /* select#1 */ select `test`.`t1`.`f3` AS `f3` from `test`.`t1` having (0 <> (/* select#2 */ select 1 from `test`.`t2` having (0 <> `test`.`t1`.`f2`) limit 1))

Note that EXPLAIN says that it has resolved f2 as f2 of select#1.  However, this is inside the HAVING clause of select#1 where according to previous query, f2 is not known.
[28 Nov 2019 13:18] Sinisa Milivojevic
Hi Øystein Grøvlen,

How are you ??? Long time no see ....

I have learned about SQL standard and the nested queries from the expert. One of the things that I was taught was about dependent nested queries. If you read that part of the standard, the answer will come to you by itself.

Have a nice time.
[3 Dec 2019 0:26] Øystein Grøvlen
Hi Sinisa! 

If the expert you are referring to, is the one I suspect, he has written a book called "SQL-99 Complete, Really", where it says:

"If HAVING includes a subquery, it can't include outer Column references unless those
references are to grouping Columns or are used with a set function."

My point is that this issue is not about dependent nested subqueries in general, but about which columns may be referenced in a HAVING clause.
[3 Dec 2019 13:11] Sinisa Milivojevic
Hi Oystein,

You are right about the expert, that was teacher of mine. Things changed between the standards and I do not know what it is like now.

Anyway, this is a VERY low priority bug, and I will leave it to our SQL standards gurus to see whether it should be fixed or not.

Verified as reported.
[13 Dec 2019 3:41] song zhibai
git diff for fixing

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: diff.txt (text/plain), 1.68 KiB.

[13 Dec 2019 13:09] Sinisa Milivojevic
Thank you for your contribution.
[18 Jan 19:19] Valeriy Kravchuk
Why we do not see 5.7.x among affected versions? based on comments and these outputs:

mysql> EXPLAIN SELECT f3 FROM t1 HAVING (SELECT 1 FROM t2 HAVING f2 LIMIT 1);
+----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type        | table | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | PRIMARY            | t1    | NULL       | ALL   | NULL          | NULL    | NULL    | NULL |    3 |   100.00 | NULL        |
|  2 | DEPENDENT SUBQUERY | t2    | NULL       | index | NULL          | PRIMARY | 4       | NULL |    1 |   100.00 | Using index |
+----+--------------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
2 rows in set, 2 warnings (0,00 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1276
Message: Field or reference 'f2' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `test`.`t1`.`f3` AS `f3` from `test`.`t1` having (/* select#2 */ select 1 from `test`.`t2` having `test`.`t1`.`f2` limit 1)
2 rows in set (0,00 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.7.29    |
+-----------+
1 row in set (0,00 sec)

5.7 suffers from the same problem, essentially.
[20 Jan 12:53] Sinisa Milivojevic
True, only the fix will be much different.