Bug #109935 Aggregate function max and min can't get right result .
Submitted: 4 Feb 2023 14:13 Modified: 11 Apr 10:20
Reporter: Lin TSE YU Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[4 Feb 2023 14:13] Lin TSE YU
Description:
When this table has transaction running ,the select query ,like below ,expected to return three rows ,but MySQL only return one row .
	select good_code ,max(sdt)
	from ta
	where good_code in ('1001' ,'1002' ,'1003')
	group by good_code ;

How to repeat:
1. Import ta.sql and ta_bk.sql 
2. Query : 
	select good_code ,max(sdt)
	from ta
	where good_code in ('1001' ,'1002' ,'1003')
	group by good_code ;
3. Start transaction then trigger the insert command as below and don't commit it
	begin;
	insert ignore into ta(id ,good_code ,sdt ,log_dt)
	select id ,good_code ,sdt ,log_dt
	from ta_bk
	where log_dt >= '2023-01-13' and log_dt < '2023-01-14' ;

4. Start another transaction to query the step 2 statement ,then you can see this bug.

5. This bug only show when step 2 statement include the where condition.
[6 Feb 2023 14:38] MySQL Verification Team
Hi Mr. Yu,

Thank you for your bug report.

However, what you report is expected behaviour for all MVCC storage engine. Visibility and consistency with various isolation levels simply must be obeyed, per the standard.

Not a bug.
[7 Feb 2023 1:49] Lin TSE YU
In the step 4 ,if I trigger the select query without where condition statement ,I can get the max value each good_code .

mysql> select good_code ,max(sdt)
    -> from ta
    -> group by good_code ;
+-----------+--------------+
| good_code | max(sdt)     |
+-----------+--------------+
| 1001      | 202302031333 |
| 1002      | 202302031000 |
| 1003      | 20230203666  |
| 1022      | 20230203     |
| 1023      | 20230203     |
| 1101      | 202302031333 |
| 1102      | 202302031000 |
| 1103      | 20230203666  |
| 1201      | 202302031413 |
| 1202      | 202302031060 |
| 1203      | 20230203706  |
| 1301      | 202302031413 |
| 1302      | 202302031060 |
| 1303      | 20230203706  |
| 1304      | 2957493      |
| 1305      | 722494       |
| 1306      | 173431       |
| 1307      | 20230203114  |
| 1401      | 202302031413 |
| 1402      | 202302031060 |
| 1403      | 20230203706  |
| 1404      | 20230203026  |
| 1405      | 20230203027  |
| 1406      | 20230203049  |
| 1407      | 20230203104  |
| 1408      | 20230203212  |
| 1501      | 202302031213 |
| 1502      | 202302030910 |
| 1503      | 20230203606  |
| 1506      | 20230202     |
+-----------+--------------+
30 rows in set (0.01 sec)

If the select query include the where condition statement ,the result is wrong.
Below three queries are in the same transaction .
The first query only got the "1001" max value ,but the return result expect 3 rows
The "1002" and "1003" are can visible ,but the aggreage function can't find them.

mysql> select good_code ,max(sdt) from ta  where good_code in ('1001' ,'1002' ,'1003') group by good_code;
+-----------+--------------+
| good_code | max(sdt)     |
+-----------+--------------+
| 1001      | 202302031333 |
+-----------+--------------+
1 row in set (0.01 sec)

mysql> select * from ta where good_code = '1002' limit 2 ;
+---------+-----------+--------------+-------------------------+
| id      | good_code | sdt          | log_dt                  |
+---------+-----------+--------------+-------------------------+
| 4296563 | 1002      | 202302021380 | 2023-02-03 00:00:08.013 |
| 4296578 | 1002      | 202302021381 | 2023-02-03 00:01:08.008 |
+---------+-----------+--------------+-------------------------+
2 rows in set (0.01 sec)

mysql> select * from ta where good_code = '1003' limit 2 ;
+---------+-----------+-------------+-------------------------+
| id      | good_code | sdt         | log_dt                  |
+---------+-----------+-------------+-------------------------+
| 4296561 | 1003      | 20230202920 | 2023-02-03 00:00:08.013 |
| 4296589 | 1003      | 20230202921 | 2023-02-03 00:01:38.007 |
+---------+-----------+-------------+-------------------------+
2 rows in set (0.00 sec)
[7 Feb 2023 13:26] MySQL Verification Team
Hi Mr. YU,

If you do not use proper aggregate query , but WHERE clause with a LIMIT, then maximum value might not be included.

That is expected behaviour.
[8 Feb 2023 7:38] Lin TSE YU
I can used the COUNT/AVG/SUM aggregate function to get the 1001 ,1002 ,1003 each count/average/total sum results
,but if I used the MAX/MIN aggregate function ,I only got 1001 max/min result
I'm not very understand why the results of MAX/MIN function are different with COUNT/AVG/SUM ?

select good_code ,count(sdt) from ta  where good_code in ('1001' ,'1002' ,'1003') group by good_code; -- return 3 rows
select good_code ,avg(sdt) from ta  where good_code in ('1001' ,'1002' ,'1003') group by good_code; -- return 3 rows
select good_code ,sum(sdt) from ta  where good_code in ('1001' ,'1002' ,'1003') group by good_code; -- return 3 rows
select good_code ,max(sdt) from ta  where good_code in ('1001' ,'1002' ,'1003') group by good_code; -- return only 1 rows
select good_code ,min(sdt) from ta  where good_code in ('1001' ,'1002' ,'1003') group by good_code; -- return only 1 rows

I test the max / min function query on 5.7.41version , the two query can return 3 rows.
[8 Feb 2023 13:00] MySQL Verification Team
Hi Mr. YU,

MySQL 8.0 contains many improvements over 5.7.

MIN/MAX had to conform to the new SQL standards.

Unlike other functions that you are mentioning, those are not aggregating ones.
[9 Feb 2023 9:43] Lin TSE YU
Hi Verification team,
Let me re-descript this issue

1. (isolation-level=RR)
Start a new session T1 and query the max value of 1001,1002,1003
mysql> select good_code ,max(sdt) from ta where good_code in ('1001' ,'1002' ,'1003') group by good_code;
+-----------+--------------+
| good_code | max(sdt)     |
+-----------+--------------+
| 1001      | 202302031333 |
| 1002      | 202302031000 |
| 1003      | 20230203666  |
+-----------+--------------+
3 rows in set (0.00 sec)

2.
Start another new seesion T2 and insert some data .
mysql> insert ignore into ta(id ,good_code ,sdt ,log_dt)
    -> select id ,good_code ,sdt ,log_dt from ta_bk where log_dt >= '2023-01-13' and log_dt < '2023-01-14' ;
Query OK, 28095 rows affected (0.58 sec)
Records: 28095  Duplicates: 0  Warnings: 0

3.check data_locks  
mysql> select * from performance_schema.data_locks where lock_status <> 'GRANTED' ;
Empty set (0.94 sec)

4.
Session T1 execute the same select statement again .
mysql> select good_code ,max(sdt) from ta where good_code in ('1001' ,'1002' ,'1003') group by good_code;
+-----------+--------------+
| good_code | max(sdt)     |
+-----------+--------------+
| 1001      | 202302031333 |
+-----------+--------------+
1 row in set (0.01 sec)

5.
Two sessions are not commit ,but T1 can't get the same result(1002 ,1003 are disappear) in the RR mode.

6.
IF T2 still not commit ,any sessions will get result like step 4 ,either new session or old session .
[9 Feb 2023 13:37] MySQL Verification Team
Hi Mr. Yu,

Thank you for your latest contribution.

We managed to reproduce your results:

elect good_code ,max(sdt) from ta where good_code in ('1001','1002' ,'1003') group by good_code;
+-----------+--------------+
| good_code | max(sdt)     |
+-----------+--------------+
| 1001      | 202302031333 |
| 1002      | 202302031000 |
| 1003      | 20230203666  |
+-----------+--------------+
3 rows in set (0.01 sec)

mysql> select good_code ,max(sdt) from ta where good_code in ('1001','1002' ,'1003') group by good_code;
+-----------+--------------+
| good_code | max(sdt)     |
+-----------+--------------+
| 1001      | 202302031333 |
+-----------+--------------+
1 row in set (0.08 sec)

We thought that it is a known bug, but it is different from that bug, since disabling binary logging does not help.

Thank you for your contribution.

Verified as described.
[10 Apr 17:12] Gabor Buella
Posted by developer:
 
Fixed, see: https://bugs.mysql.com/bug.php?id=112541
[11 Apr 10:20] MySQL Verification Team
Documented fix as follows in the MySQL 8.0.37, 8.4.0, and 9.0.0 changelogs:

    Incorrect results were sometimes obtained from a query that used
    a group by loose index scan.

Closed.