Bug #99931 Aggregate group by returns incorrect result when loose index scan is used
Submitted: 18 Jun 2020 19:41 Modified: 19 Jun 2020 6:56
Reporter: Bart Dubelaar Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: DML Severity:S2 (Serious)
Version:8.0.12, 8.0.18 OS:Any
Assigned to: CPU Architecture:Any

[18 Jun 2020 19:41] Bart Dubelaar
Description:
A max wise aggregate, which is also using a where condition, may return incorrect results in case a loose index scan is used for the group by. 

See reproduction below. The result of the second query, with IGNORE INDEX applied is the expected outcome. In the result of the first query the value for group "A" is incorrect, the returned result is also not the value of the overall max, but a seemingly unrelated value.

I could verify that this issue is present in MySQL versions 8.0.12 and 8.0.18. It seems to be resolved in 8.0.19 but the question is if this is an intentional fix or coincidental.

Adding the column used in the WHERE condition (type) to the GROUP BY statement also 'fixes' the issue and produces the expected result, even when a loose index scan is used.

How to repeat:
CREATE TABLE `test_table` (
  `id` int(1) NOT NULL,
  `aggregation` char(1) NOT NULL,
  `type` int(1) NOT NULL,
  `time` datetime NOT NULL
) ENGINE=InnoDB;

--
-- Insert data for table `test_table`
--

INSERT INTO `test_table` (`id`, `aggregation`, `type`, `time`) VALUES
(1, 'A', 1, '2020-05-21 00:00:00'),
(6, 'A', 1, '2020-05-22 00:00:00'),
(2, 'A', 2, '2020-05-20 00:00:00'),
(3, 'B', 2, '2020-05-19 00:00:00'),
(4, 'B', 2, '2020-05-18 00:00:00'),
(5, 'B', 2, '2020-05-17 00:00:00');

--
-- Indexes for table `test_table`
--
ALTER TABLE `test_table`
  ADD PRIMARY KEY (`id`),
  ADD KEY `aggregation_type_time` (`aggregation`,`type`,`time`)

SELECT aggregation, MAX(`time`) AS lastlog FROM test_table 
FORCE INDEX (aggregation_type_time) WHERE type=2 GROUP BY aggregation;

SELECT aggregation, MAX(`time`) AS lastlog FROM test_table 
IGNORE INDEX (aggregation_type_time) WHERE type=2 GROUP BY aggregation;

Results:
Query #1 Execution time: 1ms
aggregation	lastlog
A		2020-05-21 00:00:00
B		2020-05-19 00:00:00

Query #2 Execution time: 1ms
aggregation	lastlog
A		2020-05-20 00:00:00
B		2020-05-19 00:00:00
[19 Jun 2020 2:38] MySQL Verification Team
Thank you for the bug report. Please when reporting bugs use the latest released version. Not repeatable with current source server, so already fixed:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 8.0.22 Source distribution BUILT: 2020-JUN-13

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 8.0 > use www;
Database changed
mysql 8.0 > CREATE TABLE `test_table` (
    ->   `id` int(1) NOT NULL,
    ->   `aggregation` char(1) NOT NULL,
    ->   `type` int(1) NOT NULL,
    ->   `time` datetime NOT NULL
    -> ) ENGINE=InnoDB;
Query OK, 0 rows affected, 2 warnings (0.03 sec)

mysql 8.0 > INSERT INTO `test_table` (`id`, `aggregation`, `type`, `time`) VALUES
    -> (1, 'A', 1, '2020-05-21 00:00:00'),
    -> (6, 'A', 1, '2020-05-22 00:00:00'),
    -> (2, 'A', 2, '2020-05-20 00:00:00'),
    -> (3, 'B', 2, '2020-05-19 00:00:00'),
    -> (4, 'B', 2, '2020-05-18 00:00:00'),
    -> (5, 'B', 2, '2020-05-17 00:00:00');
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql 8.0 > ALTER TABLE `test_table`
    ->   ADD PRIMARY KEY (`id`),
    ->   ADD KEY `aggregation_type_time` (`aggregation`,`type`,`time`)
    -> ;
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql 8.0 > SELECT aggregation, MAX(`time`) AS lastlog FROM test_table
    -> FORCE INDEX (aggregation_type_time) WHERE type=2 GROUP BY aggregation;
+-------------+---------------------+
| aggregation | lastlog             |
+-------------+---------------------+
| A           | 2020-05-20 00:00:00 |
| B           | 2020-05-19 00:00:00 |
+-------------+---------------------+
2 rows in set (0.00 sec)

mysql 8.0 > SELECT aggregation, MAX(`time`) AS lastlog FROM test_table
    -> IGNORE INDEX (aggregation_type_time) WHERE type=2 GROUP BY aggregation;
+-------------+---------------------+
| aggregation | lastlog             |
+-------------+---------------------+
| A           | 2020-05-20 00:00:00 |
| B           | 2020-05-19 00:00:00 |
+-------------+---------------------+
2 rows in set (0.00 sec)

mysql 8.0 >
[19 Jun 2020 6:56] Bart Dubelaar
Thanks for the response. Understandable. I'm a bit worried about regression though, as I could not find a clearly related bug fix in the change log of 8.0.19. So therefore I filed it. At least it may be helpful for future reference in case regression occurs.