Bug #12887 distinct vs rollup, are applied in inconsistent ways
Submitted: 30 Aug 2005 18:39 Modified: 15 Sep 2005 21:35
Reporter: Martin Friebe (Gold Quality Contributor)
Status: Closed
Category:Server: Optimizer Severity:S3 (Non-critical)
Version:4.1.14, 5.0.11, 4.1.15-BK OS:Linux (Linux, freebsd, Windows)
Assigned to: Evgeny Potemkin Target Version:

[30 Aug 2005 18:39] Martin Friebe
Description:
I dont know, if in a query like
 select distinct  fields from t1 group by field with rollup
the distinct should be applied on the result of rollup, or rollup should be added to the
result.

Below I provide 2 examples. each showing the correct result without distinct, and then the
result with distinct.

The 1st  mysql applies distinct on the result of the rollup.

The 2nd rollup is applied, after distinct

How to repeat:
create table t1 ( a varchar(9), b int ); insert into t1 values('a',1), (null,2);

#example 1
select  a, max(b)  from t1 group by b with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| a    |      1 |
| NULL |      2 |
| NULL |      2 |
+------+--------+

select distinct a, max(b)  from t1 group by b with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| a    |      1 |
| NULL |      2 |
+------+--------+
2 rows in set (0.00 sec)

#Example 2
select  a, max(b)  from t1 group by a with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| NULL |      2 |
| a    |      1 |
| NULL |      2 |
+------+--------+
3 rows in set (0.01 sec)

select distinct a, max(b)  from t1 group by a with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| NULL |      2 |
| a    |      1 |
| NULL |      2 |
+------+--------+
3 rows in set (0.00 sec)

drop table t1;

Suggested fix:
-define and document wanted behaviour, please.

If distinct should be applied before rollup, please also check this query:

select distinct a, max(b)  from t1 group by 'X' with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| a    |      2 |
+------+--------+
1 row in set (0.00 sec)
[30 Aug 2005 18:46] Martin Friebe
if you change  the 2nd example (distinct after rollup) to a differnt order "group by b
desc", the result set changes.

 select distinct a, max(b)  from t1 group by b desc with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| NULL |      2 |
| a    |      1 |
| a    |      2 |
+------+--------+
3 rows in set (0.00 sec)
[31 Aug 2005 11:30] Valeriy Kravchuk
Thank you for your bug report. Verified both on 4.1.14 and 5.0.11.

Yes, it is not clear form the socumentation when distinct shoud apply - before grouping or
after. Looks like it is applied after... 

But, in any case the following:

mysql> select  distinct a, max(b)  from t1 group by b with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| a    |      1 |
| NULL |      2 |
+------+--------+
2 rows in set (0.00 sec)

mysql> select  distinct a, max(b)  from t1 group by b desc with rollup;
+------+--------+
| a    | max(b) |
+------+--------+
| NULL |      2 |
| a    |      1 |
| a    |      2 |
+------+--------+
3 rows in set (0.00 sec)

mysql> select version();
+----------------+
| version()      |
+----------------+
| 5.0.11-beta-nt |
+----------------+
1 row in set (0.00 sec)

is a bug.
[31 Aug 2005 11:40] Valeriy Kravchuk
Verified on the latest 4.1.14-BK build on Linux also.
[14 Sep 2005 18:11] Evgeny Potemkin
1) As mentioned in http://dev.mysql.com/doc/mysql/en/select.html "DISTINCT and DISTINCTROW
are synonyms and specify that duplicate rows in the result set should be removed.". WITH
ROLLUP is modifier of GROUP  BY. Thus DISTINCT should be applied after ROLLUP. This is
documented, but may be should be noted explicitly somewhere.

2) Example#1 works ok - duplicate produced by ROLLUP is removed.
Example #2 - DISTINCT isn't applied - is the bug. Have to be fixed.

3) "select distinct a, max(b)  from t1 group by 'X' with rollup;" - field 'a' treated by
server as hidden field (http://dev.mysql.com/doc/mysql/en/group-by-hidden-fields.html) and
it's result is undefined, because 'X' is constant grouping performed over whole table and
with ROLLUP it produces result set ('a',2),('a',2). DISTINCTremoves duplicate rows and
thus the final result set is ('a',2). So - everything is ok.

4) Changing "group by b" to "group by b desc" changes result set - here column 'a' is
hidden and result of it is undefined in this case (documented in group-by-hidden-fields).
Adding 'desc' changes actual value of undefined 'a' and thus result sets to which DISTINCT
is being applied are differs. Because of this the queries produces different final result
sets.
[14 Sep 2005 18:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/29855
[15 Sep 2005 20:07] Evgeny Potemkin
For queries with GROUP BY and without hidden GROUP BY fields DISTINCT is
  optimized away becuase such queries produce result set without duplicates.
  But ROLLUP can add rows which may be same to some rows and this fact was
  ignored.

Fixed in 4.1.15, cset 1.2433.2.1
[15 Sep 2005 21:25] Evgeny Potemkin
Fixed in 5.0.14
[15 Sep 2005 21:35] Paul DuBois
Noted in 4.1.15, 5.0.14 changelogs.