MySQL Bugs: #39069: <row constructor> IN <table-subquery> seriously messed up

Bug #39069	<row constructor> IN <table-subquery> seriously messed up
Submitted:	27 Aug 2008 11:32	Modified:	11 Feb 2009 3:07
Reporter:	Roland Bouman	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S1 (Critical)
Version:	5.1.26-rc; 5.0.66a; 6.0.6	OS:	Linux (Ubuntu)
Assigned to:	Gleb Shchepa	CPU Architecture:	Any
Tags:	regression

Description:
Almost all expressions of the form

<row-constructor> IN (<query-expression>) return incorrect results. Regardless of the operands the result is always true.

The only exception is where the <row-constructor> has a NULL value in at least one column, in which case the result is NULL (as expected)

This is a serious bug, because the results are clearly wrong.

How to repeat:
SELECT (NULL, NULL) IN (SELECT NULL, NULL)

(Correct: NULL)

mysql> SELECT (1,2) IN (SELECT NULL, NULL);
+------------------------------+
| (1,2) IN (SELECT NULL, NULL) |
+------------------------------+
|                            1 |
+------------------------------+

(WRONG! Should have been NULL....)

mysql> SELECT (1,2) IN (SELECT 1, NULL);
+---------------------------+
| (1,2) IN (SELECT 1, NULL) |
+---------------------------+
|                         1 |
+---------------------------+
1 row in set (0.00 sec)

(WRONG! Should have been NULL....)

mysql> SELECT (1,2) IN (SELECT NULL, 2);
+---------------------------+
| (1,2) IN (SELECT NULL, 2) |
+---------------------------+
|                         1 |
+---------------------------+

(WRONG! Should have been NULL....)

mysql> SELECT (1,2) IN (SELECT NULL, 1);
+---------------------------+
| (1,2) IN (SELECT NULL, 1) |
+---------------------------+
|                         1 |
+---------------------------+

(WRONG! Should have been 0. Rationale is that 2 != 1 so we can be sure (1,2) is not in (null, 1) despite the occurrence of NULL)

mysql> SELECT (1,2) IN (SELECT 1, 1);
+------------------------+
| (1,2) IN (SELECT 1, 1) |
+------------------------+
|                      1 |
+------------------------+
(WRONG! Should have been 0.)

Suggested fix:
Please return correct results

Thank you for a bug report. Verified just as described:

mysql> select (1,2) in (select 2,1);
+-----------------------+
| (1,2) in (select 2,1) |
+-----------------------+
|                     1 |
+-----------------------+
1 row in set (0.38 sec)

mysql> select (1,2) in (select 3,1);
+-----------------------+
| (1,2) in (select 3,1) |
+-----------------------+
|                     1 |
+-----------------------+
1 row in set (0.00 sec)

mysql> select (1,2) in (select null, null);
+------------------------------+
| (1,2) in (select null, null) |
+------------------------------+
|                            1 |
+------------------------------+
1 row in set (0.00 sec)

For single value server works as expected:

mysql> select (1) in (select null);
+----------------------+
| (1) in (select null) |
+----------------------+
|                 NULL |
+----------------------+
1 row in set (0.00 sec)

mysql> select (1) in (select 1);
+-------------------+
| (1) in (select 1) |
+-------------------+
|                 1 |
+-------------------+
1 row in set (0.00 sec)

mysql> select (1) in (select 2);
+-------------------+
| (1) in (select 2) |
+-------------------+
|                 0 |
+-------------------+
1 row in set (0.00 sec)

mysql> select version();
+-----------------------+
| version()             |
+-----------------------+
| 6.0.6-alpha-community |
+-----------------------+
1 row in set (0.02 sec)

5.1 and 5.0 are also affected:

C:\Program Files\MySQL\MySQL Server 6.0\bin>mysql -uroot -proot -P3310 test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.26-rc-community-log MySQL Community Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select (2,2) in (select 2,1);
+-----------------------+
| (2,2) in (select 2,1) |
+-----------------------+
|                     1 |
+-----------------------+
1 row in set (0.05 sec)

mysql> select (2,2) in (select null,null);
+-----------------------------+
| (2,2) in (select null,null) |
+-----------------------------+
|                           1 |
+-----------------------------+
1 row in set (0.00 sec)

mysql> exit
Bye

C:\Program Files\MySQL\MySQL Server 6.0\bin>mysql -uroot -proot -P3308 test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.0.66a-enterprise-gpl-nt MySQL Enterprise Server (GPL)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select (2,2) in (select 2,1);
+-----------------------+
| (2,2) in (select 2,1) |
+-----------------------+
|                     1 |
+-----------------------+
1 row in set (0.08 sec)

mysql> select (2,2) in (select null,null);
+-----------------------------+
| (2,2) in (select null,null) |
+-----------------------------+
|                           1 |
+-----------------------------+
1 row in set (0.00 sec)

mysql> exit
Bye

4.1.22 worked more correctly:

C:\Program Files\MySQL\MySQL Server 6.0\bin>mysql -uroot -proot -P3306 test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 4.1.22-community-nt-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> select (2,2) in (select null,null);
+-----------------------------+
| (2,2) in (select null,null) |
+-----------------------------+
|                           1 |
+-----------------------------+
1 row in set (0.00 sec)

Shoukd be NULL actually, but this is correct:

mysql> select (2,2) in (select 2,1);
+-----------------------+
| (2,2) in (select 2,1) |
+-----------------------+
|                     0 |
+-----------------------+
1 row in set (0.00 sec)

mysql> select (2,2) in (select 2,2);
+-----------------------+
| (2,2) in (select 2,2) |
+-----------------------+
|                     1 |
+-----------------------+
1 row in set (0.00 sec)

So, this is a regression of a kind in versions >= 5.0.

The problem seems to occur only in case the right hand operand includes an expression of the form 

SELECT <literal>, <literal>

Test data:

mysql> SELECT Id, Name From world.City WHERE ID IN (1,2);
+----+----------+
| Id | Name     |
+----+----------+
|  1 | Kabul    |
|  2 | Qandahar |
+----+----------+

Now, for example:

SELECT (SELECT ID, Name FROM world.City WHERE ID = 1) 
    IN (SELECT ID, Name FROM world.City WHERE ID = 2) 

(Correct: returns 0)

SELECT (SELECT 1, 'Kabul')
    IN (SELECT ID, Name FROM world.City WHERE ID = 2);

(Correct: returns 0)

SELECT (SELECT 1, 'Kabul')
    IN (SELECT 2, 'Quandahar');

(WRONG! returns 1, should have been 0)

Interesting tidbit: 

SELECT   (SELECT 1, 'Kabul')      
      IN (SELECT ID, Name FROM world.City WHERE ID = 2 
UNION ALL SELECT 2, 'Qandahar');

(WRONG! returns 1, should have been 0)

(same wrong result when the UNION ALL operands are reversed)

This bug and bug #35311 are duplicates.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/64206

2725 Gleb Shchepa	2009-01-28
      Bug #39069: <row constructor> IN <table-subquery> seriously messed up
      
      "ROW(...) IN (SELECT ... FROM DUAL)" always returned TRUE.
      
      Item_in_subselect::row_value_transformer rewrites "ROW(...) IN SELECT"
      conditions into the "EXISTS (SELECT ... HAVING ...)" form.
      For a subquery from the DUAL pseudotable resulting HAVING condition
      is an expression on constant values, so further transformation with
      optimize_cond() eliminates this HAVING condition and resets
      JOIN::having to NULL.
      Then JOIN::exec treated that NULL as an always-true-HAVING and that
      caused a bug.
      
      To distinguish an optimized out "HAVING TRUE" clause from 
      "HAVING FALSE" we already have the JOIN::having_value flag. 
      However, JOIN::exec() ignored JOIN::having_value as described abouve 
      as if it always set to COND_TRUE.
      
      The JOIN::exec method has been modified to take into account the value
      of the JOIN::having_value field.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/64256

2725 Gleb Shchepa	2009-01-28
      Bug #39069: <row constructor> IN <table-subquery> seriously 
                  messed up
      
      "ROW(...) IN (SELECT ... FROM DUAL)" always returned TRUE.
      
      Item_in_subselect::row_value_transformer rewrites "ROW(...) 
      IN SELECT" conditions into the "EXISTS (SELECT ... HAVING ...)" 
      form.
      For a subquery from the DUAL pseudotable resulting HAVING 
      condition is an expression on constant values, so further 
      transformation with optimize_cond() eliminates this HAVING 
      condition and resets JOIN::having to NULL.
      Then JOIN::exec treated that NULL as an always-true-HAVING 
      and that caused a bug.
      
      To distinguish an optimized out "HAVING TRUE" clause from
      "HAVING FALSE" we already have the JOIN::having_value flag.
      However, JOIN::exec() ignored JOIN::having_value as described 
      above as if it always set to COND_TRUE.
      
      The JOIN::exec method has been modified to take into account 
      the value of the JOIN::having_value field.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/64383

2732 Gleb Shchepa	2009-01-28
      Bug #39069: <row constructor> IN <table-subquery> seriously
                  messed up
      
      "ROW(...) IN (SELECT ... FROM DUAL)" always returned TRUE.
      
      Item_in_subselect::row_value_transformer rewrites "ROW(...)
      IN SELECT" conditions into the "EXISTS (SELECT ... HAVING ...)"
      form.
      For a subquery from the DUAL pseudotable resulting HAVING
      condition is an expression on constant values, so further
      transformation with optimize_cond() eliminates this HAVING
      condition and resets JOIN::having to NULL.
      Then JOIN::exec treated that NULL as an always-true-HAVING
      and that caused a bug.
      
      To distinguish an optimized out "HAVING TRUE" clause from
      "HAVING FALSE" we already have the JOIN::having_value flag.
      However, JOIN::exec() ignored JOIN::having_value as described
      above as if it always set to COND_TRUE.
      
      The JOIN::exec method has been modified to take into account
      the value of the JOIN::having_value field.

Pushed into 5.0.78 (revid:joro@sun.com-20090203090422-v91rh3gmx2ulhbu9) (version source revid:gshchepa@mysql.com-20090128184645-ghabfh1x0fz04beg) (merge vers: 5.0.77) (pib:6)

Pushed into 5.1.32 (revid:joro@sun.com-20090203090549-gos3v4320vimrzg6) (version source revid:sergefp@mysql.com-20090128192114-jm5r3gezr59oki5x) (merge vers: 5.1.32) (pib:6)

Pushed into 6.0.10-alpha (revid:kostja@sun.com-20090204104420-mw1i2u9lum4bxjo6) (version source revid:davi.arnaut@sun.com-20090129184519-mgp2vr1rna9nk8li) (merge vers: 6.0.10-alpha) (pib:6)

Noted in 5.0.78, 5.1.32, 6.0.10 changelogs.

The expression ROW(...) IN (SELECT ... FROM DUAL) always returned
TRUE.

Pushed into 5.1.32-ndb-6.3.23 (revid:tomas.ulin@sun.com-20090217131017-6u8qz1edkjfiobef) (version source revid:tomas.ulin@sun.com-20090203133556-9rclp06ol19bmzs4) (merge vers: 5.1.32-ndb-6.3.22) (pib:6)

Pushed into 5.1.32-ndb-6.4.3 (revid:tomas.ulin@sun.com-20090217134419-5ha6xg4dpedrbmau) (version source revid:tomas.ulin@sun.com-20090203133556-9rclp06ol19bmzs4) (merge vers: 5.1.32-ndb-6.3.22) (pib:6)

Pushed into 5.1.32-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090217134216-5699eq74ws4oxa0j) (version source revid:tomas.ulin@sun.com-20090202111723-1zzwax187rtls913) (merge vers: 5.1.32-ndb-6.2.17) (pib:6)

This patch for this bug contains wrong result:
Instead of the following result:

+# 2nd and 3rd columns should be same for t1.a == 11 only
+SELECT a AS x, ROW(11, 12) = (SELECT MAX(x), 12), ROW(11, 12) IN (SELECT MAX(x), 12) FROM t1;
+x	ROW(11, 12) = (SELECT MAX(x), 12)	ROW(11, 12) IN (SELECT MAX(x), 12)
+1	0	1
+2	0	1
+11	1	1

The result should be:
1  0 0
2  0 0
11 1 1

Therefore the patch is either wrong or incomplete.

Timour,

Thank you for report.

Yes, the patch doesn't cover aggregate functions (and you cited "regression tests" part of the test suite for aggregates).

Created a new bug #56711 entry for this issue.

Timour,

The problem you pointed is not related to this bug.
The root of the problem is:

mysql> SELECT a AS x, (SELECT MAX(x)) FROM t1;
+------+-----------------+
| x    | (SELECT MAX(x)) |
+------+-----------------+
|    1 |               1 |
|    2 |               2 |
|   11 |              11 |
+------+-----------------+
3 rows in set (0.00 sec)

mysql> SELECT a, (SELECT MAX(a)) FROM t1;
+------+-----------------+
| a    | (SELECT MAX(a)) |
+------+-----------------+
|    1 |              11 |
+------+-----------------+
1 row in set (0.00 sec)

Obviously, only second result is correct.

Well, I mainly pointed out that the previous patch recorded an incorrect result.
It could be a different bug, or an incomplete fix, hard to say.

I don't agree with your conclusion, because
SELECT a AS x, (SELECT MAX(x)) FROM t1;
is like
SELECT a AS x, (SELECT MAX(x) from DUAL) FROM t1;
so 'x' should be bound to each result row of the outer query,
in this case each row of 't1'.

I suggest that you ask Peter Gulutzan about both queries, and the
expected result.