MySQL Bugs: #114435: Incorrect query results caused by the subquery optimization.

Bug #114435	Incorrect query results caused by the subquery optimization.
Submitted:	21 Mar 2024 6:56	Modified:	24 Mar 2024 7:48
Reporter:	John Jove	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S1 (Critical)
Version:	8.0.36	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
I run the following test case, in which the SELECT statement is expected to return an error, since subquery returns more than 1 row. Instead, a query result is returned.

How to repeat:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 INT);
INSERT INTO t1 VALUES (1),(2);
SELECT c1 FROM t1 WHERE !((1 IS UNKNOWN) AND (((SELECT c1 FROM t1) NOT BETWEEN 1 AND 2) IN (SELECT c1 FROM t1))); -- actual: {1,2}, expected: error

Hello John,

Thank you for the report and test case.
Verified as described.

regards,
Umesh

Posted by developer:
 
Not a bug.
The optimizer sees that the predicate (1 IS UNKNOWN) is false,
hence it does not have to evaluate the second part of the AND,
and thus it will never see that the subquery evaluation should
lead to an error.

Thanks for the detailed explanation.

However, such evaluation order seems not to be deterministic. It means that the error part of AND clause can be evaluated first sometimes despite the FALSE operand, and an error is retuned.

I construct two interesting test cases as follows.
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 INT);
INSERT INTO t1 VALUES (1); -- only one row
SELECT c1 FROM t1 WHERE ((SELECT c1 FROM t1) NOT BETWEEN 1 AND 2) NOT IN (SELECT c1 FROM t1); -- {1}

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (c1 INT);
INSERT INTO t1 VALUES (1),(2); -- more than one row
SELECT c1 FROM t1 WHERE ((SELECT c1 FROM t1) NOT BETWEEN 1 AND 2) NOT IN (SELECT c1 FROM t1); -- [21000][1242] Subquery returns more than 1 row

From the above results, this kind of error seems to be checked in the runtime, not statically or grammar-level. Is it correct?

Since the execution plan is laid out by the optimizer, it is quite true that whether to issue an error or not may seem non-deterministic. It is entirely up to the optimizer to determine whether a sub-plan needs to be executed or not, and in which order - thus the term "optimizer".

And the execution is also obviously dependent on the data in the table - one (or zero) rows mean success, more than one row means failure.

Notice also that reporting an error here may be taken as an evidence of bad database design, or unwashed data. Usually a normalized database will use a unique index or similar to ensure that max. one row qualifies to the scalar subquery. Or it will be a set or rows that is aggregated into one row.