Description:
Using multicolumn indexes might lead to rows not being fetched in a query with GROUP BY. This occurred in versions >= 8.0 but not in earlier versions.
How to repeat:
CREATE TABLE IF NOT EXISTS t0(c0 INT, c1 INT);
INSERT INTO t0(c0, c1) VALUES(NULL, NULL), (NULL, NULL), (1, NULL);
CREATE INDEX i0 ON t0(c1, c0 ASC);
(1) The table is as follows:
| c0 | c1 |
| ---- | ---- |
| NULL | NULL |
| NULL | NULL |
| 1 | NULL |
(2) In the following query, 0 rows are returned (unexpected):
SELECT
t0.c1 AS ref0,
MIN(t0.c0) AS ref1
FROM t0
WHERE t0.c0 IS NOT TRUE
GROUP BY t0.c1;
| ref0 | ref1 |
| ---- | ---- |
This is unexpected because from the table in (1), both the 1st and 2nd rows with `NULL` values in their c0-columns satisfy `t0.c0 IS NOT TRUE`.
Therefore, the GROUP BY operation should be performed on the first two rows, hence should not return an empty result.
The expected result should be:
| ref0 | ref1 |
| ---- | ---- |
| NULL | NULL |
(3) When the index i0 is removed, 1 row is returned (expected):
DROP INDEX i0 ON t0;
SELECT
t0.c1 AS ref0,
MIN(t0.c0) AS ref1
FROM t0
WHERE t0.c0 IS NOT TRUE
GROUP BY t0.c1;
| ref0 | ref1 |
| ---- | ---- |
| NULL | NULL |
According to the documentation (https://dev.mysql.com/doc/refman/8.4/en/mysql-indexes.html), indexes are used during query optimization to "find rows with specific column values quickly".
Thus, both the queries in (2) and (3) are semantically equivalent since the use of indexes should not change the result of the query execution.
Furthermore, in MySQL-compatible databases, such as MariaDB (11.5), TiDB (8.5), the same query produces the expected result.
| ref0 | ref1 |
| ---- | ---- |
| NULL | NULL |