Bug #56690 Wrong results with subquery with GROUP BY inside < ANY clause
Submitted: 9 Sep 2010 13:30 Modified: 24 Mar 2011 22:51
Reporter: John Embretsen Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0, 5.1, 5.5 OS:Any
Assigned to: Ole John Aske CPU Architecture:Any

[9 Sep 2010 13:30] John Embretsen
Description:
When executing a query which uses a subquery that is using GROUP BY inside a "< ANY()" construct, no rows are returned when there should be. Example query:

SELECT * FROM t1 WHERE t1.number < ANY(SELECT number FROM t2 GROUP BY number);

The above query returns no results, where as this query:

SELECT * FROM t1 WHERE t1.number < ANY(SELECT number FROM t2);

returns appropriate rows.

Example client session:

mysql> SELECT * FROM t1;
+----+--------+
| pk | number |
+----+--------+
|  8 |      8 |
+----+--------+
1 row in set (0.00 sec)

mysql> SELECT * FROM t2;
+----+--------+
| pk | number |
+----+--------+
|  1 |      2 |
|  2 |      8 |
|  3 |   NULL |
|  4 |    166 |
+----+--------+
4 rows in set (0.00 sec)

mysql> # Subquery by itself:
mysql> SELECT number FROM t2 GROUP BY number;
+--------+
| number |
+--------+
|   NULL |
|      2 |
|      8 |
|    166 |
+--------+
4 rows in set (0.00 sec)

mysql> # Subquery used within ANY (the bug):
mysql> SELECT * FROM t1 WHERE t1.number < ANY(SELECT number FROM t2 GROUP BY number);
Empty set (0.00 sec)

The issue goes away if any of the following is true:

 - t2 does not contain a NULL value in the "number" column.
 - the subquery does not contain GROUP BY.
 - the subquery includes a valid WHERE clause on the GROUP BY column
   (e.g "WHERE number < 9999").
 - "> ANY" is used instead of "< ANY".

How to repeat:
DROP TABLE IF EXISTS t1;

CREATE TABLE t1 (
 pk INT NOT NULL PRIMARY KEY,
 number INT,
 KEY key_number (number)
);

INSERT INTO t1 VALUES (8,8);

DROP TABLE IF EXISTS t2;

CREATE TABLE t2 (
 pk INT NOT NULL PRIMARY KEY,
 number INT,
 KEY key_number (number)
);

INSERT INTO t2 VALUES (1,2);
INSERT INTO t2 VALUES (2,8);
INSERT INTO t2 VALUES (3,NULL);
INSERT INTO t2 VALUES (4,166);

SELECT * FROM t1 WHERE t1.number < ANY(SELECT number FROM t2 GROUP BY number);
SELECT * FROM t1 WHERE t1.number < ANY(SELECT number FROM t2);
[9 Sep 2010 13:34] John Embretsen
EXPLAIN output indicates that the optimizer thinks it is an impossible WHERE clause:

mysql> EXPLAIN SELECT * FROM t1 WHERE t1.number < ANY(SELECT number FROM t2 GROUP BY number)\G;
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Impossible WHERE noticed after reading const tables
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: t2
         type: index
possible_keys: NULL
          key: key_number
      key_len: 5
          ref: NULL
         rows: 4
        Extra: Using index
2 rows in set (0.00 sec)
[9 Sep 2010 14:00] John Embretsen
QA note:

Detected with the Random Query Generator's Transformer validator, specifically the ConvertSubqueriesToViews transformer and the optimizer_subquery.yy grammar.

It was detected because the issue goes away if a view is created as a replacement for the subquery and "SELECT * FROM view" is then used in the ANY clause instead of the subquery. The actual queries produced by the RQG hitting this issue were of course much more complex, but they too included "< ANY" with "GROUP BY".

---

Verified against:

5.0.86
5.1.51 (mysql-5.1-bugteam as of 2010-09-07)
5.5.7 (mysql-5.5 as of 2010-09-07)
5.6.x (mysql-next-mr-bugfixing as of 2010-09-07)

Impacts QA (results must be filtered out and coverage is reduced while using workarounds).
[9 Dec 2010 13:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/126417

3477 Ole John Aske	2010-12-09
      Fix for bug#56690 Wrong results with subquery with GROUP BY inside < ANY clause
      
      A subquery of the form :
      
      1) SELECT ... WHERE <column> < ANY(select <column>...)
      
      Is transformed into the form by the optimizer:
      
      2) SELECT ... WHERE <column> < (select MAX(<column>)...)
      
      The Min/Max aggregation of subquery columns is implemented in 
      'class select_max_min_finder_subselect'. 
      
      The handling of NULL values in this implementation was incorrect as
      they was interpreted as a NULL-value being '>' than any other value being
      compared. This is incorrect wrt. SQL semantics which specifies that
      NULL values are 'undefined' and should be removed as soon as a non-NULL
      value is encountered.
      
      This fix changes implementation of all 
      select_max_min_finder_subselect::cmp_<type>() methods to follow the
      correct SQL semantics as described above. (Which also simplifies the
      logic IMHO)
      
      It also changes the methods to be 'private' within 
      class select_max_min_finder_subselect.
[17 Jan 2011 12:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/128960

3504 Ole John Aske	2011-01-17
      Updated fix for bug#56690 'Wrong results with subquery with GROUP BY inside < ANY clause'
            
      Addition testcases has been added according to suggestion from Evgeny P.
      
      A subquery of the form :
            
       1) SELECT ... WHERE <column> < ANY(select <column>...)
            
      Is transformed into the form by the optimizer:
            
       2) SELECT ... WHERE <column> < (select MAX(<column>)...)
            
      The Min/Max aggregation of subquery columns is implemented in 
      'class select_max_min_finder_subselect'. 
            
      The handling of NULL values in this implementation was incorrect as
      they was interpreted as a NULL-value being '>' than any other value being
      compared. This is incorrect wrt. SQL semantics which specifies that
      NULL values are 'undefined' and should be removed as soon as a non-NULL
      value is encountered.
            
      This fix changes implementation of all 
      select_max_min_finder_subselect::cmp_<type>() methods to follow the
      correct SQL semantics as described above. (Which also simplifies the
      logic IMHO)
            
      It also changes the methods to be 'private' within 
      class select_max_min_finder_subselect.
[2 Feb 2011 9:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130184

3588 Ole John Aske	2011-02-02
      Fix for bug#56690 'Wrong results with subquery with 
                         GROUP BY inside < ANY clause'
            
      A subquery of the form :
      
       1) SELECT ... WHERE <column> < ANY(select <column>...)
      
      Is transformed into the form by the optimizer:
      
        2) SELECT ... WHERE <column> < (select MAX(<column>)...)
      
      The Min/Max aggregation of subquery columns is implemented in 
      'class select_max_min_finder_subselect'. 
      
      The handling of NULL values in this implementation was incorrect as
      they was interpreted as a NULL-value being '>' than any other value being
      compared. This is incorrect wrt. SQL semantics which specifies that
      NULL values are 'undefined' and should be removed as soon as a non-NULL
      value is encountered.
                  
      This fix changes implementation of all 
      select_max_min_finder_subselect::cmp_<type>() methods to follow the
      correct SQL semantics as described above. (Which also simplifies the
      logic IMHO)
                  
      It also changes the methods to be 'private' within 
      class select_max_min_finder_subselect.
[2 Feb 2011 9:05] Bugs System
Pushed into mysql-trunk 5.6.2 (revid:ole.john.aske@oracle.com-20110202090455-99sq3a34nhkkq66n) (version source revid:ole.john.aske@oracle.com-20110202090455-99sq3a34nhkkq66n) (merge vers: 5.6.2) (pib:24)
[2 Feb 2011 9:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130186

4164 Ole John Aske	2011-02-02
      Fix for Bug#56690: Wrong results with subquery with GROUP BY inside < ANY clause.
      
      Cherry picked into mysql-5.1-telco-7.0
      
      Se original commit comments in http://lists.mysql.com/commits/130184
[2 Feb 2011 9:16] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.22 (revid:ole.john.aske@oracle.com-20110202091556-umif4kvy3wvcqosw) (version source revid:ole.john.aske@oracle.com-20110202091556-umif4kvy3wvcqosw) (merge vers: 5.1.51-ndb-7.0.22) (pib:24)
[2 Feb 2011 9:19] Ole John Aske
Fix has been pushed to mysql-trunk, and cherry picked for backporting into mysql-5.1-telco-7.0
[24 Mar 2011 23:00] Paul DuBois
Noted in 5.1.51-ndb-7.0.22, 5.6.2 changelogs.

For a query that used a subquery that included GROUP BY inside a <
ANY() construct, no rows were returned when there should have been.

CHANGESET - http://lists.mysql.com/commits/130186