Bug #12887 distinct vs rollup, are applied in inconsistent ways
Submitted: 30 Aug 2005 16:39 Modified: 15 Sep 2005 19:35
Reporter: Martin Friebe (Gold Quality Contributor) (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:4.1.14, 5.0.11, 4.1.15-BK OS:Linux (Linux, freebsd, Windows)
Assigned to: Evgeny Potemkin CPU Architecture:Any

[30 Aug 2005 16:39] Martin Friebe
Description:
I dont know, if in a query like
 select distinct  fields from t1 group by field with rollup
the distinct should be applied on the result of rollup, or rollup should be added to the result.

Below I provide 2 examples. each showing the correct result without distinct, and then the result with distinct.

The 1st  mysql applies distinct on the result of the rollup.

The 2nd rollup is applied, after distinct

How to repeat:
create table t1 ( a varchar(9), b int ); insert into t1 values('a',1), (null,2);

#example 1
select  a, max(b)  from t1 group by b with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| a    |      1 |
| NULL |      2 |
| NULL |      2 |
+------+--------+

select distinct a, max(b)  from t1 group by b with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| a    |      1 |
| NULL |      2 |
+------+--------+
2 rows in set (0.00 sec)

#Example 2
select  a, max(b)  from t1 group by a with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| NULL |      2 |
| a    |      1 |
| NULL |      2 |
+------+--------+
3 rows in set (0.01 sec)

select distinct a, max(b)  from t1 group by a with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| NULL |      2 |
| a    |      1 |
| NULL |      2 |
+------+--------+
3 rows in set (0.00 sec)

drop table t1;

Suggested fix:
-define and document wanted behaviour, please.

If distinct should be applied before rollup, please also check this query:

select distinct a, max(b)  from t1 group by 'X' with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| a    |      2 |
+------+--------+
1 row in set (0.00 sec)
[30 Aug 2005 16:46] Martin Friebe
if you change  the 2nd example (distinct after rollup) to a differnt order "group by b desc", the result set changes.

 select distinct a, max(b)  from t1 group by b desc with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| NULL |      2 |
| a    |      1 |
| a    |      2 |
+------+--------+
3 rows in set (0.00 sec)
[31 Aug 2005 9:30] Valeriy Kravchuk
Thank you for your bug report. Verified both on 4.1.14 and 5.0.11.

Yes, it is not clear form the socumentation when distinct shoud apply - before grouping or after. Looks like it is applied after... 

But, in any case the following:

mysql> select  distinct a, max(b)  from t1 group by b with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| a    |      1 |
| NULL |      2 |
+------+--------+
2 rows in set (0.00 sec)

mysql> select  distinct a, max(b)  from t1 group by b desc with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| NULL |      2 |
| a    |      1 |
| a    |      2 |
+------+--------+
3 rows in set (0.00 sec)

mysql> select version();
+----------------+
| version()      |
+----------------+
| 5.0.11-beta-nt |
+----------------+
1 row in set (0.00 sec)

is a bug.
[31 Aug 2005 9:40] Valeriy Kravchuk
Verified on the latest 4.1.14-BK build on Linux also.
[14 Sep 2005 16:11] Evgeny Potemkin
1) As mentioned in http://dev.mysql.com/doc/mysql/en/select.html "DISTINCT and DISTINCTROW are synonyms and specify that duplicate rows in the result set should be removed.". WITH ROLLUP is modifier of GROUP  BY. Thus DISTINCT should be applied after ROLLUP. This is documented, but may be should be noted explicitly somewhere.

2) Example#1 works ok - duplicate produced by ROLLUP is removed.
Example #2 - DISTINCT isn't applied - is the bug. Have to be fixed.

3) "select distinct a, max(b)  from t1 group by 'X' with rollup;" - field 'a' treated by server as hidden field (http://dev.mysql.com/doc/mysql/en/group-by-hidden-fields.html) and it's result is undefined, because 'X' is constant grouping performed over whole table and with ROLLUP it produces result set ('a',2),('a',2). DISTINCTremoves duplicate rows and thus the final result set is ('a',2). So - everything is ok.

4) Changing "group by b" to "group by b desc" changes result set - here column 'a' is hidden and result of it is undefined in this case (documented in group-by-hidden-fields).
Adding 'desc' changes actual value of undefined 'a' and thus result sets to which DISTINCT is being applied are differs. Because of this the queries produces different final result sets.
[14 Sep 2005 16:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/29855
[15 Sep 2005 18:07] Evgeny Potemkin
For queries with GROUP BY and without hidden GROUP BY fields DISTINCT is
  optimized away becuase such queries produce result set without duplicates.
  But ROLLUP can add rows which may be same to some rows and this fact was
  ignored.

Fixed in 4.1.15, cset 1.2433.2.1
[15 Sep 2005 19:25] Evgeny Potemkin
Fixed in 5.0.14
[15 Sep 2005 19:35] Paul DuBois
Noted in 4.1.15, 5.0.14 changelogs.