Bug #117242 DISTINCT does not fetch all distinct rows when used with GROUP BY
Submitted: 20 Jan 9:13 Modified: 20 Jan 9:44
Reporter: Emily Ong Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0, 8.0.40, 8.4.3 OS:Ubuntu
Assigned to: CPU Architecture:x86

[20 Jan 9:13] Emily Ong
Description:
Using DISTINCT with GROUP BY does not fetch all distinct rows. This occurred in versions >= 8.0 but not in earlier versions.

How to repeat:
CREATE TABLE IF NOT EXISTS t0(c0 INT);
INSERT INTO t0(c0) VALUES (NULL), (1);

(1) In the following query, 2 distinct rows are returned (expected):

SELECT ALL
  t0.c0 != NULL AS ref0,
  t0.c0 OR (NOT (t0.c0 AND NULL)) AS ref1,
  COUNT(DISTINCT(1)) AS ref2
FROM t0
GROUP BY
  t0.c0 != NULL,
  t0.c0 OR (NOT (t0.c0 AND NULL));

| ref0 | ref1 | ref2 |
| ---- | ---- | ---- |
| null | null | 1    |
| null | 1    | 1    |

(2) When the query in (1) uses the DISTINCT row specifier (instead of ALL), only 1 row is returned (unexpected):

SELECT DISTINCT
  t0.c0 != NULL AS ref0,
  t0.c0 OR (NOT (t0.c0 AND NULL)) AS ref1,
  COUNT(DISTINCT(1)) AS ref2
FROM t0
GROUP BY
  t0.c0 != NULL,
  t0.c0 OR (NOT (t0.c0 AND NULL));

| ref0 | ref1 | ref2 |
| ---- | ---- | ---- |
| null | null | 1    |

According to the documentation (https://dev.mysql.com/doc/refman/8.4/en/select.html), "DISTINCT specifies removal of duplicate rows from the result set".
Thus, the result is unexpected because the two rows (as seen in the result for (1)) differ in their values of `ref1` (i.e., `null` in the 1st row vs `1` in the 2nd row), hence these two rows should be considered distinct. Yet, only one row is seen in the result.

(3) However, when the query uses the aliases in the GROUP BY expressions, 2 rows are correctly returned (expected):

SELECT DISTINCT
  t0.c0 != NULL AS ref0,
  t0.c0 OR (NOT (t0.c0 AND NULL)) AS ref1,
  COUNT(DISTINCT(1)) AS ref2
FROM t0
GROUP BY
  ref0, ref1

| ref0 | ref1 | ref2 |
| ---- | ---- | ---- |
| null | null | 1    |
| null | 1    | 1    |

However, this query is semantically equivalent to the above one (2), with the only difference being the use of aliases in the group by expression. Thus, it is expected that both queries should return the same result.

Furthermore, in MySQL-compatible databases, such as MariaDB (11.5), TiDB (8.5), the same query produces two rows, which is expected.

| ref0 | ref1 | ref2 |
| ---- | ---- | ---- |
| null | null | 1    |
| null | 1    | 1    |
[20 Jan 9:44] MySQL Verification Team
Hello Emily Ong,

Thank you for the report and test case.

regards,
Umesh