Bug #42955 Wrong results returned by join queries with group by/order by when BKA is used
Submitted: 18 Feb 5:07 Modified: 23 Apr 3:29
Reporter: Igor Babaev
Status: Closed
Category:Server: Optimizer Severity:S2 (Serious)
Version:6.0-bzr OS:Any
Assigned to: Target Version:5.4+
Triage: Triaged: D2 (Serious)

[18 Feb 5:07] Igor Babaev
Description:
A two-way join query with group by/order by may return wrong results when rows of the
first table are accessed by an index compatible with the group by/order by list while the
second table is joined using the BKA algorithm.  

How to repeat:
Create and populate two tables with by following commands:

create table t1 (d int, id1 int, index idx1 (d, id1));
insert into t1 values
  (3, 20), (2, 40), (3, 10), (1, 10), (3, 20), (1, 40), (2, 30), (3, 30);

create table t2 (id1 int, id2 int, index idx2 (id1));
insert into t2 values 
  (20, 100), (30, 400), (20, 400), (30, 200), (10, 300), (10, 200), (40, 100),
  (40, 200), (30, 300), (10, 400), (20, 200), (20, 300);   

Force using join buffers whenever a table is joined through an index:
set join_cache_level=6;

The queries  
select t1.id1, sum(t2.id2) from t1 join t2 on t1.id1=t2.id1 
  where t1.d=3 group by t1.id1
and
select t1.id1, t2.id2 from t1 join t2 on t1.id1=t2.id1 
 where t1.d=3 order by t1.id1
whose execution plans employ join buffers return wrong results:

mysql> explain select t1.id1, sum(t2.id2) from t1 join t2 on t1.id1=t2.id1
    ->   where t1.d=3 group by t1.id1;
+----+-------------+-------+------+---------------+------+---------+-------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref         | rows |
Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------------+------+--------------------------+
|  1 | SIMPLE      | t1    | ref  | idx1          | idx1 | 5       | const       |    4 |
Using where; Using index |
|  1 | SIMPLE      | t2    | ref  | idx2          | idx2 | 5       | test.t1.id1 |    2 |
Using join buffer        |
+----+-------------+-------+------+---------------+------+---------+-------------+------+--------------------------+
2 rows in set (0.01 sec)

mysql> select t1.id1, sum(t2.id2) from t1 join t2 on t1.id1=t2.id1    where t1.d=3 group
by t1.id1;
+------+-------------+
| id1  | sum(t2.id2) |
+------+-------------+
|   20 |         200 |
|   30 |         400 |
|   20 |         800 |
|   30 |         200 |
|   10 |         500 |
|   30 |         300 |
|   10 |         400 |
|   20 |        1000 |
+------+-------------+
8 rows in set (0.01 sec)

mysql> explain select t1.id1, t2.id2 from t1 join t2 on t1.id1=t2.id1
    ->  where t1.d=3 order by t1.id1;
+----+-------------+-------+------+---------------+------+---------+-------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref         | rows |
Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------------+------+--------------------------+
|  1 | SIMPLE      | t1    | ref  | idx1          | idx1 | 5       | const       |    4 |
Using where; Using index |
|  1 | SIMPLE      | t2    | ref  | idx2          | idx2 | 5       | test.t1.id1 |    2 |
Using join buffer        |
+----+-------------+-------+------+---------------+------+---------+-------------+------+--------------------------+
2 rows in set (0.01 sec)

mysql> select t1.id1, t2.id2 from t1 join t2 on t1.id1=t2.id1   where t1.d=3 order by
t1.id1;
+------+------+
| id1  | id2  |
+------+------+
|   20 |  100 |
|   20 |  100 |
|   30 |  400 |
|   20 |  400 |
|   20 |  400 |
|   30 |  200 |
|   10 |  300 |
|   10 |  200 |
|   30 |  300 |
|   10 |  400 |
|   20 |  200 |
|   20 |  200 |
|   20 |  300 |
|   20 |  300 |
+------+------+
14 rows in set (0.01 sec)
[18 Feb 5:13] Igor Babaev
This bug originally was reported to me by Gene Pang from Google.
Gene used a different database schema with 5 InnoDB tables. He demonstrated the problem
with a slightly modified code to force using the BKA algorithm when optimizer did not use
it at all.
[18 Feb 7:44] Sveta Smirnova
Thank you for the report.

Verified as described.
[5 Mar 6:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/68302

2727 Igor Babaev	2009-03-04
      Fixed bug #42955.
      If a join buffer is employed to join a table through BNL or
      BKA algorithm for a query with a ORDER BY / GROUP BY clause
      then the result set has always to be sorted unless the clause
      can be optimized away.
[10 Mar 2:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/68726

2727 Igor Babaev	2009-03-09
      Fixed bug #42955.
      If a join buffer is employed to join a table through BNL or
      BKA algorithm for a query with a ORDER BY / GROUP BY clause
      then the result set has always to be sorted unless the clause
      can be optimized away.
[20 Apr 17:47] Bugs System
Pushed into 6.0.11-alpha (revid:sergefp@mysql.com-20090417211236-fy28y9o1w8p4ic9m)
(version source revid:igor@mysql.com-20090310013411-xxe9x3d9kqxuo6ar) (merge vers:
6.0.11-alpha) (pib:6)
[23 Apr 3:29] Paul DuBois
Noted in 6.0.11 changelog.

A two-way join query with a GROUP BY or ORDER BY clause could produce
incorrect results when rows of the first table are accessed by an
index compatible with the GROUP BY or ORDER BY list while the second
table is joined using the Batched Key Access algorithm.
[15 Jul 17:43] Philip Stoev
This fix appears to have caused this regression - bug #45267