MySQL Bugs: #117261: Creation of multicolumn indexes might lead to rows not being fetched during GROUP BY

Bug #117261	Creation of multicolumn indexes might lead to rows not being fetched during GROUP BY
Submitted:	22 Jan 8:37	Modified:	22 Jan 9:06
Reporter:	Emily Ong	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S3 (Non-critical)
Version:	8.0, 8.0.40, 8.4.3	OS:	Ubuntu
Assigned to:		CPU Architecture:	x86

Description:
Using multicolumn indexes might lead to rows not being fetched in a query with GROUP BY. This occurred in versions >= 8.0 but not in earlier versions.

How to repeat:
CREATE TABLE IF NOT EXISTS t0(c0 INT, c1 INT);
INSERT INTO t0(c0, c1) VALUES(NULL, NULL), (NULL, NULL), (1, NULL);
CREATE INDEX i0 ON t0(c1, c0 ASC);

(1) The table is as follows:

|  c0  |  c1  |
| ---- | ---- |
| NULL | NULL |
| NULL | NULL |
|  1   | NULL |

(2) In the following query, 0 rows are returned (unexpected):

SELECT
  t0.c1 AS ref0,
  MIN(t0.c0) AS ref1
FROM t0
WHERE t0.c0 IS NOT TRUE
GROUP BY t0.c1;

| ref0 | ref1 |
| ---- | ---- |

This is unexpected because from the table in (1), both the 1st and 2nd rows with `NULL` values in their c0-columns satisfy `t0.c0 IS NOT TRUE`.
Therefore, the GROUP BY operation should be performed on the first two rows, hence should not return an empty result.

The expected result should be:

| ref0 | ref1 |
| ---- | ---- |
| NULL | NULL |

(3) When the index i0 is removed, 1 row is returned (expected):

DROP INDEX i0 ON t0;
SELECT
  t0.c1 AS ref0,
  MIN(t0.c0) AS ref1
FROM t0
WHERE t0.c0 IS NOT TRUE
GROUP BY t0.c1;

| ref0 | ref1 |
| ---- | ---- |
| NULL | NULL |

According to the documentation (https://dev.mysql.com/doc/refman/8.4/en/mysql-indexes.html), indexes are used during query optimization to "find rows with specific column values quickly".
Thus, both the queries in (2) and (3) are semantically equivalent since the use of indexes should not change the result of the query execution.

Furthermore, in MySQL-compatible databases, such as MariaDB (11.5), TiDB (8.5), the same query produces the expected result.

| ref0 | ref1 |
| ---- | ---- |
| NULL | NULL |

Hello Emily Ong,

Thank you for the report and test case.

regards,
Umesh

The `GROUP_INDEX_SKIP_SCAN` operator(`GroupIndexSkipScanIterator`) skip subsequent rows in the same group with NULL in it, if the min/max argument field is NULL. It would be a much anticipated feature that the optimizer merge the condition of min/max argument field into GroupIndexSkipScanIterator.