Bug #113319 unnesting scalar subquery containing count and having cond produces wrong result
Submitted: 4 Dec 2023 6:58 Modified: 25 Jul 2024 16:34
Reporter: chen jiang Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:mysql-8.0.33, 8.0.35, 8.1.0, 8.2.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: count, Scalar subquery, unnest, wrong result

[4 Dec 2023 6:58] chen jiang
Description:
In the rule for unnesting scalar subqueries, if the subquery contains a count aggregate func and a having condition, an incorrect rewrite has occurred, resulting in an erroneous result set.
e.g.
set optimizer_switch='subquery_to_derived=on';
select (select count(1) as cnt from t2 where t2.a = t1.a  having cnt > 0) from t1;
=>
select coalesce(`derived_1_2`.`Name_exp_1`,0) AS `(select count(1) as cnt from t2 where t2.a = t1.a  having cnt > 0)` from `test`.`t1` left join (/* select#2 */ select count(1) AS `Name_exp_1`,`test`.`t2`.`a` AS `Name_exp_2` from `test`.`t2` group by `Name_exp_2` having (`Name_exp_1` > 0)) `derived_1_2` on((`derived_1_2`.`Name_exp_2` = `test`.`t1`.`a`))

MySQL cannot distinguish whether the NULL value of the input parameter for coalesce is produced by the having condition or is filled in when a left join fails to match. Simplistically turning it into 0 results in an incorrect result set.

How to repeat:
CREATE TABLE t1(a INT);
CREATE TABLE t2(a INT);
INSERT INTO t1 VALUES(1), (2), (3), (4);
INSERT INTO t2 VALUES(1), (2);

set optimizer_switch='subquery_to_derived=off';
select (select count(1) as cnt from t2 where t2.a = t1.a  having cnt > 0) from t1;
(select count(1) as cnt from t2 where t2.a = t1.a  having cnt > 0)
1
1
NULL
NULL
set optimizer_switch='subquery_to_derived=on';
select (select count(1) as cnt from t2 where t2.a = t1.a  having cnt > 0) from t1;
(select count(1) as cnt from t2 where t2.a = t1.a  having cnt > 0)
1
1
0
0
[4 Dec 2023 7:24] MySQL Verification Team
Hello chen jiang,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[25 Jul 2024 16:34] Jon Stephens
Documented fix as follows in the MySQL 9.0.0 changelog:

    A query such as the following:

SELECT 
( 
SELECT COUNT(1) AS cnt
FROM t2
WHERE t2.a = t1.a
HAVING cnt > 0
) 
FROM t1;

    was transformed to this:

SELECT COALESCE(derived_1_2.cnt,0) AS cnt
FROM t1
LEFT JOIN
(
SELECT COUNT(1) AS cnt,
t2.a AS a from t2
GROUP BY t2.a
HAVING (cnt > 0)
) AS derived_1_2
ON derived_1_2.a = t1.a;

    The presence of a false HAVING condition in the subquery should
    semantically change the correct result of the scalar subquery
    from zero to NULL, which happened as expected for the original
    query, but not in the transformed case.

Closed.