Bug #111057 group by or distinct clause result doesn't match collation expectation
Submitted: 17 May 2023 12:23 Modified: 18 May 2023 12:35
Reporter: wei lee Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:8.0.30 OS:Any
Assigned to: CPU Architecture:Any
Tags: COllation bug, space

[17 May 2023 12:23] wei lee
Description:
It is mentioned in the "The binary Collation Compared to _bin Collations"  documentation

"For PAD SPACE collations, trailing spaces are insignificant in comparisons; strings are compared without regard to trailing spaces.

NO PAD collations treat trailing spaces as significant in comparisons, like any other character." 

but in mysql 8.0.XX ,all group by / distinct clause treated strings with trailing space as "NO PAD" mode

e.g.

show collation;

| utf8mb4_bin                 | utf8mb4  |  46 |         | Yes      |       1 | PAD SPACE     |
| utf8mb4_bs_0900_ai_ci       | utf8mb4  | 316 |         | Yes      |       0 | NO PAD        |

set collation_connection=utf8mb4_bin;

select ''=' ';
+--------+
| ''=' ' |
+--------+
|      1 | --return true 
+--------+

select count(*) from (select distinct  a from (select '' a union all select ' ' ) b) c;
+----------+
| count(*) |
+----------+
|        2 | --there is wrong result,sholud be 1
+----------+
1 row in set (0.00 sec)

set collation_connection=utf8mb4_0900_ai_ci;

select ''=' ';
+--------+
| ''=' ' |
+--------+
|      0 | --retrun false
+--------+
1 row in set (0.01 sec)

 select count(*) from (select distinct  a from (select '' a union all select ' ' ) b) c;
+----------+
| count(*) |
+----------+
|        2 | --this is right result
+----------+
1 row in set (0.00 sec)

but in 5.6.xx,collation is work as expectation

e.g.
set collation_connection=utf8mb4_bin;
select ''=' ';
+--------+
| ''=' ' |
+--------+
|      1 |
+--------+
1 row in set (0.00 sec)

select count(*) from (select distinct  a from (select '' a union all select ' ' ) b) c;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

when use binary clause ,result also meet expectation:

select count(*) from (select distinct binary  a from (select '' a union all select ' ' ) b) c;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

How to repeat:
as Description

Suggested fix:
when use "PAD SPACE"  collation, work as documentation description:
[17 May 2023 12:34] MySQL Verification Team
Hi Mr. lee,

Thank you for your bug report.

However, we are not able to repeat your results with 8.0.33.

This is what we get:

+--------+
| ''=' ' |
+--------+
|      1 |
+--------+
+----------+
| count(*) |
+----------+
|        1 |
+----------+
+--------+
| ''=' ' |
+--------+
|      0 |
+--------+
+----------+
| count(*) |
+----------+
|        2 |
+----------+

Can't repeat.
[17 May 2023 12:52] wei lee
hi,@MySQL Verification Team

Just like you said,8.0.33 can't repeat this issue,looks like it's fixed

i found this issue on 8.0.30 or lower version
[17 May 2023 12:55] MySQL Verification Team
Hi,

Versions are 5.7, 8.0, 81 ....

What you are writing about are patch fixing releases.

Hence, we do not test on old releases, because the only relevant release is the last one.
[17 May 2023 12:59] wei lee
OK,i see, thx again
[18 May 2023 5:50] wei lee
confirm this issue fixed by 8.0.27
:

Implicitly grouped queries sometimes calculate aggregates during optimization when their values can be easily retrieved from indexes. When a predicate referenced a column that was declared with a NO PAD collation, that predicate might be evaluated using PAD SPACE semantics, and so return wrong results. This was because an internal function that checked for insignificant trailing spaces made the assumption that all nonbinary collations had PAD SPACE semantics, which was true of MySQL 5.7, but is not the case for MySQL 8.0, which has added many collations having NO PAD semantics, including the default collation (utf8mb4_0900_ai_ci).

case closed
[18 May 2023 12:35] MySQL Verification Team
Hi,

That is true .....