Bug #70657 SELECT DISTINCT...GROUP BY returns wrong results in some cases
Submitted: 18 Oct 2013 16:55 Modified: 7 Feb 2014 17:50
Reporter: Tim McLaughlin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:5.6.14 OS:Linux
Assigned to: CPU Architecture:Any
Tags: GROUP BY, select distinct

[18 Oct 2013 16:55] Tim McLaughlin
Description:
Wrong results may be returned in MySQL 5.6.14 if you include a constant in the SELECT clause of a SELECT DISTINCT...GROUP BY query.

How to repeat:
This is a very contrived example but demonstrates the gist of the bug.

DROP TABLE IF EXISTS `foo`;
CREATE TABLE `foo` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `foo` 
VALUES (1,1,1),(1,2,1),(1,3,0),(1,4,0),(2,1,1),(2,2,1),(2,3,0),(2,4,0);

Query #1:
SELECT * FROM foo ORDER BY a, c, b;

Results:
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 |    3 |    0 |
|    1 |    4 |    0 |
|    1 |    1 |    1 |
|    1 |    2 |    1 |
|    2 |    3 |    0 |
|    2 |    4 |    0 |
|    2 |    1 |    1 |
|    2 |    2 |    1 |
+------+------+------+

Query #2:
SELECT DISTINCT a, b
FROM (SELECT * FROM foo ORDER BY a, b) AS f 
GROUP BY a,c;

MySQL 5.6.14 returns the expected results:
+------+------+
| a    | b    |
+------+------+
|    1 |    3 |
|    1 |    1 |
|    2 |    3 |
|    2 |    1 |
+------+------+

Query #3 (this is the same as query #2 except we add a constant to the SELECT clause):
SELECT DISTINCT a, b, 0 AS d
FROM (SELECT * FROM foo ORDER BY a, b) AS f 
GROUP BY a,c;

Results from MySQL 5.6.14:
+------+------+---+
| a    | b    | d |
+------+------+---+
|    1 |    3 | 0 |
|    1 |    1 | 0 |
+------+------+---+

Expected results:
+------+------+---+
| a    | b    | d |
+------+------+---+
|    1 |    3 | 0 |
|    1 |    1 | 0 |
|    2 |    3 | 0 |
|    2 |    1 | 0 |
+------+------+---+

Note: when I run query #3 in MySQL 5.5.30 I get the expected results.

Suggested fix:
Return the correct results.
[18 Oct 2013 19:59] Godofredo Miguel Solorzano
Thank you for the bug report.

[16:52:32][pochita:]~ miguel$ mysql -uroot test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 13
Server version: 5.6.14 MySQL Community Server (GPL)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> set @@sql_mode :='ONLY_FULL_GROUP_BY';
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE `foo` (
    ->   `a` int(11) DEFAULT NULL,
    ->   `b` int(11) DEFAULT NULL,
    ->   `c` int(11) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.09 sec)

mysql>
mysql> INSERT INTO `foo`
    -> VALUES (1,1,1),(1,2,1),(1,3,0),(1,4,0),(2,1,1),(2,2,1),(2,3,0),(2,4,0);
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> SELECT DISTINCT a, b, 0 AS d
    -> FROM (SELECT * FROM foo ORDER BY a, b) AS f
    -> GROUP BY a,c;
ERROR 1055 (42000): 'f.b' isn't in GROUP BY

https://dev.mysql.com/doc/refman/5.6/en/group-by-extensions.html

"MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. "
[21 Oct 2013 12:12] Tim McLaughlin
> https://dev.mysql.com/doc/refman/5.6/en/group-by-extensions.html
> "However, this is useful primarily when all values in each nonaggregated column not named 
> in the GROUP BY are the same for each group.The server is free to choose any value from 
> each group, so unless they are the same, the values chosen are indeterminate. "

Below is an example where the values are the same for the non-aggregated columns in the GROUP BY.  If you leave the MySQL GROUP BY extension enabled (i.e. don't set the sql_mode to ONLY_FULL_GROUP_BY) it will return incorrect results if you add a constant to the SELECT clause after the column references.

$ mysql test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 264
Server version: 5.6.14-log MySQL Community Server (GPL)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE `foo` (
    ->   `a` int(11) DEFAULT NULL,
    ->   `b` int(11) DEFAULT NULL,
    ->   `c` int(11) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.31 sec)

mysql> INSERT INTO `foo` 
    -> VALUES (1,5,1),(1,5,1),(1,5,0),(1,5,0),(2,5,1),(2,5,1),(2,5,0),(2,5,0);
Query OK, 8 rows affected (0.07 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM foo;
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 |    5 |    1 |
|    1 |    5 |    1 |
|    1 |    5 |    0 |
|    1 |    5 |    0 |
|    2 |    5 |    1 |
|    2 |    5 |    1 |
|    2 |    5 |    0 |
|    2 |    5 |    0 |
+------+------+------+
8 rows in set (0.00 sec)

mysql> # This query returns the correct results:
mysql> SELECT DISTINCT a, b
    -> FROM (SELECT * FROM foo ORDER BY a, b) AS f 
    -> GROUP BY a,c;
+------+------+
| a    | b    |
+------+------+
|    1 |    5 |
|    2 |    5 |
+------+------+
2 rows in set (0.00 sec)

mysql> # This query also returns the correct results:
mysql> # Note the non-aggregated columns (b & d) contain the same value for every record
mysql> SELECT DISTINCT 0 as d, a, b
    -> FROM (SELECT * FROM foo ORDER BY a, b) AS f 
    -> GROUP BY a,c;
+---+------+------+
| d | a    | b    |
+---+------+------+
| 0 |    1 |    5 |
| 0 |    2 |    5 |
+---+------+------+
2 rows in set (0.00 sec)

mysql> # But if you put the constant at the end of the SELECT clause instead of the 
mysql> # beginning it returns incorrect results:
mysql> # The non-aggreated columns (b & d) still contain the same value for every record.
mysql> SELECT DISTINCT a, b, 0 as d
    -> FROM (SELECT * FROM foo ORDER BY a, b) AS f 
    -> GROUP BY a,c;
+------+------+---+
| a    | b    | d |
+------+------+---+
|    1 |    5 | 0 |
+------+------+---+
1 row in set (0.00 sec)
[21 Oct 2013 14:08] Jørgen Løland
Thank you for the bug report. Verified as described.
[7 Nov 2013 13:21] Brantley Hobbs
We have the same issues on Windows Server 2012.
[7 Feb 2014 17:50] Paul Dubois
Noted in 5.6.17, 5.7.4 changelogs.

A query that creates a temporary table to find distinct values and
has a constant value in the projected list could produce incorrect
results.
[28 Mar 2014 19:13] Laurynas Biveinis
5.6$ bzr log -r 5761 -n0
------------------------------------------------------------
revno: 5761
committer: Neeraj Bisht <neeraj.x.bisht@oracle.com>
branch nick: 5.6
timestamp: Mon 2014-01-20 11:16:39 +0530
message:
  Bug#17634335 - SELECT DISTINCT...GROUP BY RETURNS WRONG RESULTS IN SOME CASES
  
  Problem:
  Query creating temporary table to find the distinct value and have 
  constant value in projected list doesn't give correct value.
  
  Analysis:
  
  When we have to find the distinct value in our record,
  We will call JOIN_TAB::remove_duplicates().
  To find the position of the item, from which we have to find the 
  distinct value, we will use this 
  -->Number of item in the table - number of non-constant items of the select list.
  
  so in the case like
  SELECT DISTINCT a, b
  FROM (SELECT * FROM foo ORDER BY a, b) AS f
  GROUP BY a,c;
  
  
  The fields of the temporary table will be c, a, b in that order
  (as first it is considering the item that are not in the select 
  list, then select list distinct item).  
  That is, the number of items in the list will be 3.
  The number of items in the select list which are not constant will be 
  2(a,b).
  so 3-2=1, that's why we will considered items a, b to find the distinct values.
  
  In the case of bug we have 
  
  SELECT DISTINCT a, b, 0 as d
  FROM (SELECT * FROM foo ORDER BY a, b) AS f
  GROUP BY a,c;
  
  Resulting in
  4(Number of item in the table that is (c,a,b,0)) - 2(number of non-constant items of the select list that is (a,b))
  which is equal to 2, So we start our distinct from b instead of a, which result 
  in incorrect result.
  
  SOLUTION:
  Instead of doing, Number of item in the table - Number of item in the projected 
  list which are not constant.
  Do Number of item in the table - Number of item in the select list. Then we will get the 
  correct point, from which we have to find the distinct item.
[6 Aug 2014 17:42] Laurynas Biveinis
$ bzr log -n0 -r 5950
------------------------------------------------------------
revno: 5950
committer: Chaithra Gopalareddy <chaithra.gopalareddy@oracle.com>
branch nick: mysql-5.6
timestamp: Wed 2014-05-21 12:46:14 +0530
message:
  Bug#11760197:INCORRECT RESULTS WITH COUNT DISTINCT AND BIG_TABLES
  Problem:
  When big_tables is enabled, COUNT DISTINCT on simple join with
  constant equality condition gives wrong result.
  
  Analysis:-
  In case of count distinct with sql_big_tables enable, optimizer
  saves data in myisam file instead of heap unique tree.
  When a constant equality condition is present, it does not
  detect duplicate field values when inserting the records into the
  temporary table.
  
  Original solution:
  While creating temporary table, allow creations of fields for constant
  items. When we have distinct query, this will make sure the duplicate
  value filtered out when inserting rows in the temporary table.
  
  Side Effects:
  Bug#17555462 - SELECT DISTINCT OF VIEW WITH CONSTANT STRING
                 RETURNS WRONG RESULT
  Problem:
  In a temporary table if a field is created for a const ref_item,
  and if this ref_item is a reference over a constant string and not
  on a field results are not as expected.
  
  Analysis:
  This is a regression from the patch for Bug#11760197. Post
  this bugfix, a field gets created in temporary table
  even for const_item's. If the const_item is of type Item_ref,
  while creating the temporary field, item->result_field
  is made to point to the newly created field.
  While the original item would hold the constant's value,
  the new field created while changing ref's to use temporary
  fields, will not be having the value.
  As a result we loose the value when fetching the results.
  
  In similar lines:
  Bug #17607155: ASSERTION FAILED: ((ITEM_RESULT_FIELD*)ITEM)->RESULT_FIELD
  Bug #17957913: ASSERTION `((ITEM_RESULT_FIELD*)ITEM)->RESULT_FIELD' FAILS
                 IN CREATE_TMP_FIELD
  Problem:
  Query having a GROUP BY CLAUSE with ROLLUP modifier and a
  GROUP_CONCAT/COUNT(DISTINCT) function with constant expression,
  causes an assert.
  
  Analysis:
  The GROUP_CONCAT/COUNT(DISTINCT) uses its own temporary table.
  When ROLLUP is present it creates the second copy of temporary
  table.
  This copy receives the same list of arguments that original
  group_concat/count(distinct) does which will help to copy
  the content of the result_field for the function under
  GROUP_CONCAT/COUNT from  the first temporary table to the second
  temporary table.
  
  In the case, when constant item is present, result_field will carry
  null value. After the fix for Bug#11760197, while creating field
  for second temporary table as result_field for the constant
  expression is not set, it asserts.
  
  Bug#17634335: SELECT DISTINCT...GROUP BY RETURNS WRONG RESULTS
  IN SOME CASES
  
  Query creating temporary table to find the distinct value and has
  constant value in projected list doesn't give correct result.
  
  Analysis:
  After the fix for Bug#11760197 const_items also were created as
  part of temporary tables. In the call to remove_duplicates() an
  assumption against the same was made which resulted in the above bug.
  
  All the above bugs were side effects of the fix made for Bug#11760197.
  
  Current solution:
  Distinct of a constant value will always be the constant value and
  count distinct of the same will always be one. Based on this,
  a new variable const_distinct is introduced. If enabled, temporary
  table is not created and aggregation is also avoided as the result
  will always be one.
  
  Works in similar lines to always_null.
[6 Aug 2014 17:43] Laurynas Biveinis
See bug 52582, bug 71149.