Bug #27532 Incorrect results when grouping with similar IN and NOT IN conditions
Submitted: 29 Mar 2007 18:19 Modified: 11 Apr 2007 2:18
Reporter: David Walker Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.0.37, 4.1, 5.1, 5.2 OS:Linux (Linux)
Assigned to: Igor Babaev CPU Architecture:Any

[29 Mar 2007 18:19] David Walker
Description:
When using a GROUP BY that contains a NOT IN condition, in certain circumstances the expression is incorrectly optimized away.

How to repeat:
create table if not exists bug_test (col int primary key not null);

insert into bug_test values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

*EXPECTED BEHAVIOR*
select if (col in (1,2), col, '') as x1, if (col not in (1,2), col, '') as x2 from bug_test group by 1, 2;
+----+----+
| x1 | x2 |
+----+----+
|    | 10 |
|    | 3  |
|    | 4  |
|    | 5  |
|    | 6  |
|    | 7  |
|    | 8  |
|    | 9  |
| 1  |    |
| 2  |    |
+----+----+
10 rows in set (0.09 sec)

*INCORRECT RESULT*
select if (col in (1,2), col, '') as x1, if (col not in (1,2), col, '') as x2 from bug_test group by 1, if (col not in (1,2), col, '');
+----+----+
| x1 | x2 |
+----+----+
|    | 3  |
| 1  |    |
| 2  |    |
+----+----+
3 rows in set (0.10 sec)
[30 Mar 2007 9:13] Sveta Smirnova
Thank you for the report.

Verified as described.
[3 Apr 2007 19:26] Igor Babaev
This problem can be demostrated with ORDER BY queries as well:

mysql> SELECT VERSION();
+--------------+
| VERSION()    |
+--------------+
| 4.1.23-debug |
+--------------+
1 row in set (0.00 sec)

mysql> CREATE TABLE t1 (a int);
Query OK, 0 rows affected (0.07 sec)

mysql> INSERT INTO t1 VALUES (3), (2), (4), (1);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT a, IF(a IN (2,3), a, a+10) FROM t1
    ->   ORDER BY IF(a IN (2,3), a, a+10);
+------+-------------------------+
| a    | IF(a IN (2,3), a, a+10) |
+------+-------------------------+
|    2 |                       2 |
|    3 |                       3 |
|    1 |                      11 |
|    4 |                      14 |
+------+-------------------------+
4 rows in set (0.00 sec)

mysql> SELECT a, IF(a NOT IN (2,3), a, a+10) FROM t1
    ->   ORDER BY IF(a NOT IN (2,3), a, a+10) ;
+------+-----------------------------+
| a    | IF(a NOT IN (2,3), a, a+10) |
+------+-----------------------------+
|    1 |                           1 |
|    4 |                           4 |
|    2 |                          12 |
|    3 |                          13 |
+------+-----------------------------+
4 rows in set (0.00 sec)

mysql> SELECT a, IF(a IN (2,3), a, a+10)FROM t1
    ->   ORDER BY IF(a NOT IN (2,3), a, a+10) ;
+------+-------------------------+
| a    | IF(a IN (2,3), a, a+10) |
+------+-------------------------+
|    2 |                       2 |
|    3 |                       3 |
|    1 |                      11 |
|    4 |                      14 |
+------+-------------------------+
4 rows in set (0.00 sec)

Here the expected order of rows is the same as in the previous query.

The problem can be observed when the sorting expression with the IN predicate
in ORDER BY differs from an expression in SELECT only by an additional NOT
before IN.
In the following query we have more differences and the result is correct:

mysql> SELECT a, IF(a IN (2,3), a, a+100) FROM t1
    ->   ORDER BY IF(a NOT IN (2,3), a, a+10) ;
+------+--------------------------+
| a    | IF(a IN (2,3), a, a+100) |
+------+--------------------------+
|    1 |                      101 |
|    4 |                      104 |
|    2 |                        2 |
|    3 |                        3 |
+------+--------------------------+
4 rows in set (0.00 sec)

A similar problem exists for BETWEEN:

mysql> SELECT a, IF(a BETWEEN 2 AND 3, a, a+10) FROM t1
    ->   ORDER BY IF(a BETWEEN 2 AND 3, a, a+10);
+------+--------------------------------+
| a    | IF(a BETWEEN 2 AND 3, a, a+10) |
+------+--------------------------------+
|    2 |                              2 |
|    3 |                              3 |
|    1 |                             11 |
|    4 |                             14 |
+------+--------------------------------+
4 rows in set (0.00 sec)

mysql> SELECT a, IF(a NOT BETWEEN 2 AND 3, a, a+10) FROM t1
    ->   ORDER BY IF(a NOT BETWEEN 2 AND 3, a, a+10) ;
+------+------------------------------------+
| a    | IF(a NOT BETWEEN 2 AND 3, a, a+10) |
+------+------------------------------------+
|    1 |                                  1 |
|    4 |                                  4 |
|    2 |                                 12 |
|    3 |                                 13 |
+------+------------------------------------+
4 rows in set (0.00 sec)

mysql> SELECT a, IF(a BETWEEN 2 AND 3, a, a+10) FROM t1
    ->   ORDER BY IF(a NOT BETWEEN 2 AND 3, a, a+10);
+------+--------------------------------+
| a    | IF(a BETWEEN 2 AND 3, a, a+10) |
+------+--------------------------------+
|    2 |                              2 |
|    3 |                              3 |
|    1 |                             11 |
|    4 |                             14 |
+------+--------------------------------+
[3 Apr 2007 20:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/23714

ChangeSet@1.2632, 2007-04-03 13:00:29-07:00, igor@olga.mysql.com +4 -0
  Fixed bug #27532: wrong results with ORDER/GROUP BY queries containing
  IN/BETWEEN predicates in sorting expressions.
  Wrong results may occur when the select list contains an expression
  with IN/BETWEEN predicate that differs from a sorting expression by
  an additional NOT only.
   
  Added the method Item_func_opt_neg::eq to compare correctly expressions
  containing [NOT] IN/BETWEEN.
  The eq method inherited from the Item_func returns TRUE when comparing
  'a IN (1,2)' with 'a NOT IN (1,2)' that is not, of course, correct.
[3 Apr 2007 21:02] Sergey Petrunya
Approved with comments on  irc
[3 Apr 2007 21:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/23717

ChangeSet@1.2632, 2007-04-03 14:32:16-07:00, igor@olga.mysql.com +4 -0
  Fixed bug #27532: wrong results with ORDER/GROUP BY queries containing
  IN/BETWEEN predicates in sorting expressions.
  Wrong results may occur when the select list contains an expression
  with IN/BETWEEN predicate that differs from a sorting expression by
  an additional NOT only.
   
  Added the method Item_func_opt_neg::eq to compare correctly expressions
  containing [NOT] IN/BETWEEN.
  The eq method inherited from the Item_func returns TRUE when comparing
  'a IN (1,2)' with 'a NOT IN (1,2)' that is not, of course, correct.
[9 Apr 2007 12:41] Bugs System
Pushed into 5.1.18-beta
[9 Apr 2007 12:43] Bugs System
Pushed into 5.0.40
[9 Apr 2007 12:45] Bugs System
Pushed into 4.1.23
[11 Apr 2007 2:13] Paul DuBois
Noted in 4.1.23, 5.0.40, 5.1.18 changelogs.

Incorrect results could be returned for some queries that contained a
select list expression with IN or BETWEEN together with an ORDER BY
or GROUP BY on the same expression using NOT IN or NOT BETWEEN.