Bug #111189 Scalar correlated subquery with group by clause mistakenly transformed
Submitted: 29 May 2023 15:04 Modified: 9 Aug 2023 0:40
Reporter: chen jiang Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: derived table, Scalar subquery

[29 May 2023 15:04] chen jiang
Description:
When the subquery_to_derived parameter is set to on, the following SQL1 will be transformed into SQL2. 

SQL1:
SELECT * FROM t1 WHERE (SELECT COUNT(a) FROM t3 WHERE t3.a = t1.a GROUP BY b) > 0;

==>SQL2:
SELECT `t1`.`a` AS `a`
FROM `t1`
	JOIN (
		SELECT count(`t3`.`a`) AS `COUNT(a)`, `t3`.`a` AS `a`
		FROM `t3`
		GROUP BY `t3`.`b`, `t3`.`a`
		HAVING count(`t3`.`a`) > 0
	) `derived_1_2`
WHERE `derived_1_2`.`a` = `t1`.`a`

If we execute SQL1 and column b has multiple NDV values, the subquery will have multiple rows of results, SQL will throw an error:"Subquery returns more than 1 row". However, when we execute the transformed SQL2, the statement will run normally without capturing any errors.

How to repeat:
CREATE TABLE t1(a INT);
CREATE TABLE t3(a INT, b INT);
INSERT INTO t1 VALUES(1), (2), (3), (4);
INSERT INTO t3 VALUES(1, 3), (2, 3), (1, 4);

SET optimizer_switch = 'subquery_to_derived=on';
SELECT * FROM t1 WHERE (SELECT COUNT(a) FROM t3 WHERE t3.a = t1.a GROUP BY b) > 0;
+------+
| a    |
+------+
|    1 |
|    1 |
|    2 |
+------+
SET optimizer_switch = 'subquery_to_derived=off';
SELECT * FROM t1 WHERE (SELECT COUNT(a) FROM t3 WHERE t3.a = t1.a GROUP BY b) > 0;
ERROR 1242 (21000): Subquery returns more than 1 row
[29 May 2023 15:32] MySQL Verification Team
Hi Mr. jiang,

Thank you for your bug report.

However, this is not a bug. Our Reference Manual has a whole chapter devoted to this option. 

The main problem is that you have not designed your query properly.

It is designed to compare whether a single tuple has a value greater then 0 (zero).

Hence, when it is done as a join (with derived table), you get multiple rows, so the optimiser just adds that condition to the JOIN.

When it is not a join, then you get multiple values to compare with 0 (zero).

Not a bug.
[7 Jun 2023 12:30] MySQL Verification Team
Hi Mr. Jiang,

After an internal discussion, we concluded that this is indeed a bug.

Verified as reported.
[9 Aug 2023 0:40] Jon Stephens
Documented fix as follows in the MySQL 8.2.0 changelog:

    A case was found that was not handled by work done in MySQL
    8.0.24 for transforming correlated scalar subqueries; when the
    scalar subquery is grouped in addition to being correlated, the
    transformation needs to check that--for each partition of the
    result set, as partitioned by the inner expression (columns)
    being added to the group by--there is at most one row in the
    derived table so constructed.

Closed.
[9 Aug 2023 12:50] MySQL Verification Team
Thank you, Jon.