Bug #114435 Incorrect query results caused by the subquery optimization.
Submitted: 21 Mar 6:56 Modified: 24 Mar 7:48
Reporter: John Jove Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:8.0.36 OS:Any
Assigned to: CPU Architecture:Any

[21 Mar 6:56] John Jove
Description:
I run the following test case, in which the SELECT statement is expected to return an error, since subquery returns more than 1 row. Instead, a query result is returned.

How to repeat:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 INT);
INSERT INTO t1 VALUES (1),(2);
SELECT c1 FROM t1 WHERE !((1 IS UNKNOWN) AND (((SELECT c1 FROM t1) NOT BETWEEN 1 AND 2) IN (SELECT c1 FROM t1))); -- actual: {1,2}, expected: error
[21 Mar 7:34] MySQL Verification Team
Hello John,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[22 Mar 7:17] Roy Lyseng
Posted by developer:
 
Not a bug.
The optimizer sees that the predicate (1 IS UNKNOWN) is false,
hence it does not have to evaluate the second part of the AND,
and thus it will never see that the subquery evaluation should
lead to an error.
[24 Mar 7:48] John Jove
Thanks for the detailed explanation.

However, such evaluation order seems not to be deterministic. It means that the error part of AND clause can be evaluated first sometimes despite the FALSE operand, and an error is retuned.

I construct two interesting test cases as follows.
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 INT);
INSERT INTO t1 VALUES (1); -- only one row
SELECT c1 FROM t1 WHERE ((SELECT c1 FROM t1) NOT BETWEEN 1 AND 2) NOT IN (SELECT c1 FROM t1); -- {1}

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 INT);
INSERT INTO t1 VALUES (1),(2); -- more than one row
SELECT c1 FROM t1 WHERE ((SELECT c1 FROM t1) NOT BETWEEN 1 AND 2) NOT IN (SELECT c1 FROM t1); -- [21000][1242] Subquery returns more than 1 row

From the above results, this kind of error seems to be checked in the runtime, not statically or grammar-level. Is it correct?
[3 Apr 6:18] Roy Lyseng
Since the execution plan is laid out by the optimizer, it is quite true that whether to issue an error or not may seem non-deterministic. It is entirely up to the optimizer to determine whether a sub-plan needs to be executed or not, and in which order - thus the term "optimizer".

And the execution is also obviously dependent on the data in the table - one (or zero) rows mean success, more than one row means failure.

Notice also that reporting an error here may be taken as an evidence of bad database design, or unwashed data. Usually a normalized database will use a unique index or similar to ensure that max. one row qualifies to the scalar subquery. Or it will be a set or rows that is aggregated into one row.