Bug #114589 Incorrect results when using distinct and order by and derived table
Submitted: 9 Apr 2024 3:32 Modified: 7 Aug 2024 15:19
Reporter: karry zhang (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[9 Apr 2024 3:32] karry zhang
Description:
The query returns incorrect results when using distinct and order by and derived.

How to repeat:
 CREATE TABLE `tt` (
  `id` bigint unsigned NOT NULL,
  `name` varchar(20) DEFAULT NULL,
  `time` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

insert into tt values (0, 'kk', '20230312');
insert into tt values (1, 'kk', '20230309');
insert into tt values (2, 'kk', '20230314');
insert into tt values (3, 'kk', '20230320');
insert into tt values (4, 'ss', '20230305');
insert into tt values (5, 'ss', '20230304');
insert into tt values (6, 'ss', '20230320');

select distinct t0.name, str_to_date(t0.time,'%Y%m%d') from (select * from tt)t0 order by t0.name asc, str_to_date(t0.time,'%Y%m%d') asc;
+------+-------------------------------+
| name | str_to_date(t0.time,'%Y%m%d') |
+------+-------------------------------+
| kk   | 2023-03-12                    |
| ss   | 2023-03-05                    |
+------+-------------------------------+

if you use alias, the result is correct:

select distinct t0.name, str_to_date(t0.time,'%Y%m%d') as f1 from (select * from tt)t0 order by t0.name asc, f1 asc;
+------+------------+
| name | f1         |
+------+------------+
| kk   | 2023-03-09 |
| kk   | 2023-03-12 |
| kk   | 2023-03-14 |
| kk   | 2023-03-20 |
| ss   | 2023-03-04 |
| ss   | 2023-03-05 |
| ss   | 2023-03-20 |
+------+------------+

without using derived table, the result is correct too:

select distinct name, str_to_date(time,'%Y%m%d') from tt order by name asc, str_to_date(time,'%Y%m%d') asc;
+------+----------------------------+
| name | str_to_date(time,'%Y%m%d') |
+------+----------------------------+
| kk   | 2023-03-09                 |
| kk   | 2023-03-12                 |
| kk   | 2023-03-14                 |
| kk   | 2023-03-20                 |
| ss   | 2023-03-04                 |
| ss   | 2023-03-05                 |
| ss   | 2023-03-20                 |
+------+----------------------------+

Suggested fix:
I'm not familiar with the relevant code, but I think using distinct and order by and derived table at the same time should return the correct results. If this usage is not allowed, an error should be prompted.
[9 Apr 2024 3:42] karry zhang
without distinct the result is correct:

select t0.name, str_to_date(t0.time,'%Y%m%d') from (select * from tt)t0 order by t0.name asc, str_to_date(t0.time,'%Y%m%d') asc;
+------+-------------------------------+
| name | str_to_date(t0.time,'%Y%m%d') |
+------+-------------------------------+
| kk   | 2023-03-09                    |
| kk   | 2023-03-12                    |
| kk   | 2023-03-14                    |
| kk   | 2023-03-20                    |
| ss   | 2023-03-04                    |
| ss   | 2023-03-05                    |
| ss   | 2023-03-20                    |
+------+-------------------------------+

without order by the result is correct:
select distinct name, str_to_date(time,'%Y%m%d') from tt;
+------+----------------------------+
| name | str_to_date(time,'%Y%m%d') |
+------+----------------------------+
| kk   | 2023-03-12                 |
| kk   | 2023-03-09                 |
| kk   | 2023-03-14                 |
| kk   | 2023-03-20                 |
| ss   | 2023-03-05                 |
| ss   | 2023-03-04                 |
| ss   | 2023-03-20                 |
+------+----------------------------+

without str_to_date in order by the result is correct:

select distinct t0.name, str_to_date(t0.time,'%Y%m%d') from (select * from tt)t0 order by t0.name asc;

+------+-------------------------------+
| name | str_to_date(t0.time,'%Y%m%d') |
+------+-------------------------------+
| kk   | 2023-03-09                    |
| kk   | 2023-03-12                    |
| kk   | 2023-03-14                    |
| kk   | 2023-03-20                    |
| ss   | 2023-03-04                    |
| ss   | 2023-03-05                    |
| ss   | 2023-03-20                    |
+------+-------------------------------+
[9 Apr 2024 11:56] MySQL Verification Team
Hi Mr. zhang,

Thank you for your bug report.

However, this is not a bug.

The results that we get from your queries are correct and as expected:

--------------------------------------------
name	str_to_date(t0.time,'%Y%m%d')
kk	2023-03-12
ss	2023-03-05

with alias is the same ....

name	f1
kk	2023-03-12
ss	2023-03-05

totally changed query leads to different results, which is expected

name	f1
kk	2023-03-09
kk	2023-03-12
kk	2023-03-14
kk	2023-03-20
ss	2023-03-04
ss	2023-03-05
ss	2023-03-20

if you do not use derived table you also get the same results:

name	str_to_date(time,'%Y%m%d')
kk	2023-03-09
kk	2023-03-12
kk	2023-03-14
kk	2023-03-20
ss	2023-03-04
ss	2023-03-05
ss	2023-03-20

without order by the result is as expected, according to SQL standard

name	str_to_date(time,'%Y%m%d')
kk	2023-03-12
kk	2023-03-09
kk	2023-03-14
kk	2023-03-20
ss	2023-03-05
ss	2023-03-04
ss	2023-03-20

without distinct the result is the same as in the first query:

name	str_to_date(t0.time,'%Y%m%d')

kk	2023-03-09
kk	2023-03-12
kk	2023-03-14
kk	2023-03-20
ss	2023-03-04
ss	2023-03-05
ss	2023-03-20
--------------------------------------------

These are all expected results and we can not find any bug in it.

Not a bug.
[9 Apr 2024 13:27] karry zhang
In my opinion:

select distinct t0.name, str_to_date(t0.time,'%Y%m%d') from (select * from tt)t0 order by t0.name asc, str_to_date(t0.time,'%Y%m%d') asc;

the expect result shoud be:

+------+------------+
| name | f1         |
+------+------------+
| kk   | 2023-03-09 |
| kk   | 2023-03-12 |
| kk   | 2023-03-14 |
| kk   | 2023-03-20 |
| ss   | 2023-03-04 |
| ss   | 2023-03-05 |
| ss   | 2023-03-20 |
+------+------------+

but in fact the result is:
+------+-------------------------------+
| name | str_to_date(t0.time,'%Y%m%d') |
+------+-------------------------------+
| kk   | 2023-03-12                    |
| ss   | 2023-03-05                    |
+------+-------------------------------+

We got a wrong result. 

Your test also confirms this.
[9 Apr 2024 13:52] MySQL Verification Team
Hi Mr. zhang,

Thanks for the feedback.

Regarding those two queries, one with distinct and the other without distinct, you are quite correct.

This is, indeed, a bug in 8.0 and in 8.3.

Verified as reported.
[7 Aug 2024 15:19] Jon Stephens
Documented fix as follows in the MySQL 9.1.0 changelog:

    Incorrect results were sometimes obtained when using DISTINCT
    and ORDER BY with a derived table.

Closed.
[7 Aug 2024 15:28] MySQL Verification Team
Thank you, Jon.