Bug #71097 Wrong results for a simple query with GROUP BY
Submitted: 5 Dec 2013 18:26 Modified: 6 Jun 2014 2:12
Reporter: Valeriy Kravchuk Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.1.70, 5.5.33, 5.5.35, 5.6.14, 5.6.15 OS:Any
Assigned to: CPU Architecture:Any
Tags: GROUP BY

[5 Dec 2013 18:26] Valeriy Kravchuk
Description:
It seems optimizer does something wrong when "Using index for group-by":

openxs@ao756:~/dbs/5.5$ bin/mysql --no-defaults -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.5.35-debug MySQL Community Server (GPL)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE ti (
    -> `id` int(10) unsigned NOT NULL,
    -> `ns_id` int(10) unsigned NOT NULL DEFAULT '0',
    -> PRIMARY KEY (`ns_id`,`id`)) engine=InnoDB;
insert into ti values(1,1), (2,2), (1042000,501087), (5435626,504005);
select ns_id, id from ti where (ns_id = 501087 and id = 1042000)
  or (ns_id = 504005 and id = 5435626) group by ns_id;
select ns_id, max(id) from ti where (ns_id = 501087 and id = 1042000)
  or (ns_id = 504005 and id = 5435626) group by ns_id;
select ns_id, max(id) from ti group by ns_id;
Query OK, 0 rows affected (0.10 sec)

mysql> insert into ti values(1,1), (2,2), (1042000,501087), (5435626,504005);
Query OK, 4 rows affected (0.04 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select ns_id, id from ti where (ns_id = 501087 and id = 1042000)
    ->   or (ns_id = 504005 and id = 5435626) group by ns_id;
+--------+---------+
| ns_id  | id      |
+--------+---------+
| 501087 | 1042000 |
| 504005 | 5435626 |
+--------+---------+
2 rows in set (0.00 sec)

mysql> select ns_id, max(id) from ti where (ns_id = 501087 and id = 1042000)
    ->   or (ns_id = 504005 and id = 5435626) group by ns_id;
+--------+---------+
| ns_id  | max(id) |
+--------+---------+
| 501087 | 1042000 |
+--------+---------+
1 row in set (0.01 sec)

mysql> select ns_id, max(id) from ti group by ns_id;
+--------+---------+
| ns_id  | max(id) |
+--------+---------+
|      1 |       1 |
|      2 |       2 |
| 501087 | 1042000 |
| 504005 | 5435626 |
+--------+---------+
4 rows in set (0.00 sec)

mysql> select * from ti where (ns_id = 501087 and id = 1042000)    or (ns_id = 504005 and id = 5435626);
+---------+--------+
| id      | ns_id  |
+---------+--------+
| 1042000 | 501087 |
| 5435626 | 504005 |
+---------+--------+
2 rows in set (0.00 sec)

mysql> explain select ns_id, max(id) from ti where (ns_id = 501087 and id = 1042000)    or (ns_id = 504005 and id = 5435626) group by ns_id;
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                                 |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
|  1 | SIMPLE      | ti    | range | PRIMARY       | PRIMARY | 8       | NULL |    2 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
1 row in set (0.00 sec)

Same with older 5.5.x (down to 5.5.31) and 5.6.14 also.

How to repeat:
CREATE TABLE ti (
`id` int(10) unsigned NOT NULL,
`ns_id` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`ns_id`,`id`)) engine=InnoDB;
insert into ti values(1,1), (2,2), (1042000,501087), (5435626,504005);
select ns_id, id from ti where (ns_id = 501087 and id = 1042000) 
  or (ns_id = 504005 and id = 5435626) group by ns_id;
select ns_id, max(id) from ti where (ns_id = 501087 and id = 1042000) 
  or (ns_id = 504005 and id = 5435626) group by ns_id;
select ns_id, max(id) from ti group by ns_id;

Suggested fix:
Add some more basic tests to the MTR regression test suite maybe, to catch this earlier?
[5 Dec 2013 18:38] Valeriy Kravchuk
From what I see, with 5.1.70 and MyISAM the result is the same - wrong:

openxs@ao756:~/dbs/5.1$ bin/mysql --no-defaults -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.70 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE ti ( `id` int(10) unsigned NOT NULL, `ns_id` int(10) unsigned NOT NULL DEFAULT '0', PRIMARY KEY (`ns_id`,`id`)) engine=InnoDB;
Query OK, 0 rows affected, 2 warnings (0.08 sec)

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1286
Message: Unknown table engine 'InnoDB'
*************************** 2. row ***************************
  Level: Warning
   Code: 1266
Message: Using storage engine MyISAM for table 'ti'
2 rows in set (0.00 sec)

mysql> insert into ti values(1,1), (2,2), (1042000,501087), (5435626,504005);   Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from ti where (ns_id = 501087 and id = 1042000)    or (ns_id = 504005 and id = 5435626);
+---------+--------+
| id      | ns_id  |
+---------+--------+
| 1042000 | 501087 |
| 5435626 | 504005 |
+---------+--------+
2 rows in set (0.00 sec)

mysql> select ns_id, max(id) from ti where (ns_id = 501087 and id = 1042000)    or (ns_id = 504005 and id = 5435626) group by ns_id;
+--------+---------+
| ns_id  | max(id) |
+--------+---------+
| 501087 | 1042000 |
+--------+---------+
1 row in set (0.00 sec)

mysql> explain select ns_id, max(id) from ti where (ns_id = 501087 and id = 1042000)    or (ns_id = 504005 and id = 5435626) group by ns_id;
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra                                 |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
|  1 | SIMPLE      | ti    | range | PRIMARY       | PRIMARY | 8       | NULL |    2 | Using where; Using index for group-by |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
1 row in set (0.00 sec)

So, does not looks like a recent regression at least.
[6 Dec 2013 10:57] MySQL Verification Team
Hello Valeriy,

Thank you for the bug report and test case.
Verified as described.

Thanks,
Umesh
[6 Dec 2013 11:24] Valeriy Kravchuk
Corrected typo in Synopsis.
[6 Jun 2014 2:12] Paul DuBois
Noted in 5.5.39, 5.6.20, 5.7.5 changelogs.

If there was a predicate on a column referenced by MIN() or MAX() and
that predicate was not present in all the disjunctions on key parts
earlier in the compound index, Loose Index Scan returned an incorrect
result.
[1 Aug 2014 15:55] Laurynas Biveinis
5.5 $ bzr log -r 4635
------------------------------------------------------------
revno: 4635
committer: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
branch nick: mysql-5.5
timestamp: Wed 2014-05-07 14:59:23 +0530
message:
  Bug#17909656  - WRONG RESULTS FOR A SIMPLE QUERY WITH GROUP BY
  
  Problem:
  If there is a predicate on a column referenced by MIN/MAX and
  that predicate is not present in all the disjunctions on
  keyparts earlier in the compound index, Loose Index Scan will
  not return correct result.
  
  Analysis:
  When loose index scan is chosen, range optimizer currently
  groups all the predicates that contain group parts separately
  and minmax parts separately. It therefore applies all the
  conditions on the group parts first to the fetched row.
  Then in the call to next_max, it processes the conditions
  which have min/max keypart.
  
  For ex in the following query:
  Select f1, max(f2) from t1 where (f1 = 10 and f2 = 13) or
  (f1 = 3) group by f1;
  Condition (f2 = 13) would be applied even for rows that
  satisfy (f1 = 3) thereby giving wrong results.
  
  Solution:
  Do not choose loose_index_scan for such cases. So a new rule
  WA2 is introduced to take care of the same.
  
  WA2: "If there are predicates on C, these predicates must
  be in conjuction to all predicates on all earlier keyparts
  in I."
  
  Todo the same, fix reuses the function get_constant_key_infix().
  Since this funciton will fail for all multi-range conditions, it
  is re-written to recognize that if the sub-conditions are
  equivalent across the disjuncts: it will now succeed.
  And to achieve this a new helper function is introduced called
  all_same().
  
  The fix also moves the test of NGA3 up to the former only
  caller, get_constant_key_infix().
[1 Aug 2014 15:55] Laurynas Biveinis
5.5 laurynas$ bzr log -r 4636
------------------------------------------------------------
revno: 4636
committer: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
branch nick: mysql-5.5
timestamp: Wed 2014-05-07 16:55:03 +0530
message:
  Fixing compilation error. Post push fix for Bug#17909656