Bug #13874 Incorrect subquery handling in "exists"
Submitted: 9 Oct 2005 16:23 Modified: 27 Oct 2006 0:42
Reporter: Martin Friebe (Gold Quality Contributor) (OCA) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:4.1.14, 5.0.13, 5.0.15-BK OS:Linux (Linux, Windows, FreeBSD)
Assigned to: Igor Babaev CPU Architecture:Any
Tags: qc

[9 Oct 2005 16:23] Martin Friebe
Description:
I believe the following is a bug. (It might be considered documentation, see "suggested fix")

In the sql the most inner subquery, returns 2 rows for the field f1. This should cause an error in the subquery.

From http://dev.mysql.com/doc/mysql/en/subquery-errors.html
"It's okay to use a subquery that returns multiple columns, if the purpose is comparison. See Section 13.2.8.5, “Row Subqueries”. But in other contexts, the subquery must be a scalar operand."

But besause "exists" optimizes the field list away (unless forced by a refernce to the filed, as in the 2nd example), the subquery will not be executed and does not cause the error. (explain will even without having, claim that it gets executed)

the error also occurs using real tables, instead of DUAL. It also appears, with exists beeing part of a where clause, rather than the result set.

How to repeat:
select exists (select (select 1 union select 2) as f1 from DUAL ) from DUAL;
+------------------------------------------------------------+
| exists (select (select 1 union select 2) as f1 from DUAL ) |
+------------------------------------------------------------+
|                                                          1 |
+------------------------------------------------------------+

select exists (select (select 1 union select 2) as f1 from (select 0) x having f1 = 0 ) from DUAL;
ERROR 1242 (21000): Subquery returns more than 1 row

explain select exists (select (select 1 union select 2) as f1 from DUAL ) from DUAL;
+----+--------------+------------+------+---------------+------+---------+------+------+----------------+
| id | select_type  | table      | type | possible_keys | key  | key_len | ref  | rows | Extra          |
+----+--------------+------------+------+---------------+------+---------+------+------+----------------+
|  1 | PRIMARY      | NULL       | NULL | NULL          | NULL |    NULL | NULL | NULL | No tables used |
|  2 | SUBQUERY     | NULL       | NULL | NULL          | NULL |    NULL | NULL | NULL | No tables used |
|  3 | SUBQUERY     | NULL       | NULL | NULL          | NULL |    NULL | NULL | NULL | No tables used |
|  4 | UNION        | NULL       | NULL | NULL          | NULL |    NULL | NULL | NULL | No tables used |
|NULL | UNION RESULT | <union3,4> | ALL  | NULL          | NULL |    NULL | NULL | NULL |                |
+----+--------------+------------+------+---------------+------+---------+------+------+----------------+
5 rows in set (0.00 sec)

Suggested fix:
The current documentation about "exists" suggests that the inner subquery is ignored and not executed.
the explain suggests that only its result is ignored.

If the subquery is indeed executed, I believe the error should be reported.

If the subquery is not executed, but optimized away, I believe it should be documented, as an exception from the error. (maybe unless in strict sql mode?).
in the qoute above "if the purpose is comparison, or the resultset to be ignored"

There might be a few application, which could benefit from an error here, but most applications will probably benefit from an optimized execution time.
[11 Oct 2005 13:51] Valeriy Kravchuk
I see no problems at all with your first query. There is a separate section of the manual explaining EXISTS behaviour (http://dev.mysql.com/doc/mysql/en/exists-and-not-exists-subqueries.html):

"If a subquery returns any rows at all, then EXISTS subquery is TRUE, and NOT EXISTS subquery is FALSE. For example:

SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);

Traditionally, an EXISTS subquery starts with SELECT *, but it could begin with SELECT 5 or SELECT column1 or anything at all. MySQL ignores the SELECT list in such a subquery, so it doesn't matter."

So, subquery in exists clause can return any number of rows, and it is documented. What do you think about these?
[11 Oct 2005 14:13] Martin Friebe
Well in first I quoted the wrong bit of the manual about subqueries returning multiple columns, and subqueries returning multiple rows.

But here is about the issue itself.

select (select 1 union select 2) as f1 from DUAL;
returns an error about multiple columns returned for the field f1. This is correct

This Query does not return rows. And it does not return "no rows" neither. It fails.

If I use this "invalid" query as a subquery, in a more complex statement, I would expect the whole statement to fail? (f1 still contains more than one row)

select exists ( select (select 1 union select 2) as f1 from DUAL ) from DUAL;

However, this statements hides the error, and returns a result.

The issue can be made more clear, if you use a teble in the middle query so you can add a having clause.

create table t select 1 as a;
 select exists ( select (select 1 union select 2) as f1 from t ) from DUAL;
+----------------------------------------------------------+
| exists ( select (select 1 union select 2) as f1 from t ) |
+----------------------------------------------------------+
|                                                        1 |
+----------------------------------------------------------+

# same query, with "having f1 = 1"
select exists ( select (select 1 union select 2) as f1 from t having f1 = 1) from DUAL;
ERROR 1242 (21000): Subquery returns more than 1 row
[11 Oct 2005 14:19] Martin Friebe
Just to add:
Multiple columsn (instead of multiply rows) inthe embedded subquery, correctly fails the outer query

 select exists ( select (select 1,2) as f1 from t ) from DUAL;
ERROR 1241 (21000): Operand should contain 1 column(s)

Note, that in both cases "exists" works on a query, that contains a further subquery.

and in both cases the query in the "exists" fails, if executed alone.
[11 Oct 2005 15:02] Valeriy Kravchuk
I noted your "wrong quote" from the manual, but I gave you a correct one to use in this case, so yours simply does not metter.

So, the simplest way to demonstrate the problem as you described it is:

mysql> create table t select 1 as a;
Query OK, 1 row affected (0,29 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from t;
+---+
| a |
+---+
| 1 |
+---+
1 row in set (0,01 sec)

mysql> select (select 1 union select 2) as f1 from t;
ERROR 1242 (21000): Subquery returns more than 1 row

That is a described and expected behaviour: select 1 union select 2 returns 2 rows, and is used in plase of column (scalar subquery), hence the error message. But:

mysql> select exists (select (select 1 union select 2) as f1 from t);
+--------------------------------------------------------+
| exists (select (select 1 union select 2) as f1 from t) |
+--------------------------------------------------------+
|                                                      1 |
+--------------------------------------------------------+
1 row in set (0,01 sec)

That is a bug. I do not think that any documentation notes about this behaviour will help. It's just a wrong "optimization"... 

This one is surely OK:

mysql> select exists(select 1 union select 2) as f1 from t;
+----+
| f1 |
+----+
|  1 |
+----+
1 row in set (0,01 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.0.15-rc |
+-----------+
1 row in set (0,00 sec)

Verified on 5.0.15-rc, ChangeSet@1.2028.1.1, 2005-10-10 21:42:14+02:00, lars@mysql.com, on Linux. 

Changed synopsis accordingly - it is not a lack of documentation, but a bug in optimizer, from my point of view.
[13 Oct 2005 22:09] Martin Friebe
just to add, the same thing happens with "DO"

do (select 1 uninon select 2) = 1;
gives no error.

DO is described as a "select that does not return th result to the client", the above fails if done in a select
[13 Oct 2005 22:27] Martin Friebe
Even more global, apparently anywhere, where the result is discarded

 select if(1, 2,(select 1 union select 2) ) ;
 select ifnull(1,(select 1 union select 2) ) ;
[27 Oct 2006 0:42] Igor Babaev
Valeriy,

I talked with Peter Gulutzan on this issue. Here's the transcript of our conversation on IRC.
<igor> hi peter!
<peter> igor: good morning!
<igor> I have a question for you:
<igor> there are two queries
<<igor> 1. select (exists (select(*) from t))from dual.
<igor> 2. select (exists(select (<expr>) from t)) from dual.
igor> should they always return the same result or not?
<igor> the problem is not with the case col_name IS NULL.
<igor> NULL or not NULL still exists.
<igor> the problem is with the case when calculation of the <expr> return a dynamic error.
<igor> here is a typical example:
<igor> select exists (select (select 1 union select 2) as f1 from t);
<igor> we return here:
<igor> mysql> select exists (select (select 1 union select 2) as f1 from t);
<igor> +--------------------------------------------------------+
<igor> | exists (select (select 1 union select 2) as f1 from t) |
<igor> +--------------------------------------------------------+
<igor> |                                                      1 |
<igor> +--------------------------------------------------------+
<<peter> since there is no reason to do the calculation, the failure of <expr> (and any side effects caused by <expr> such as changes to @variables) should not matter. the standard says only that what matters is whether at least one row is returned.
igor> What I can infer from what you said above this is a correct behaviour.
<igor> Am I right?
<peter> yes.
<igor> thank you very much.
<peter> no problem whatever. see you later. eof.

So I mark this case as 'Not a Bug'.