Bug #97915 utf8mb4_unicode_ci not converting letter "i" appropriately
Submitted: 6 Dec 2019 12:55 Modified: 7 Jan 2020 15:04
Reporter: Adam Nielsen
Status: Can't repeat
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.7.27 OS:Ubuntu
Assigned to: CPU Architecture:x86
Tags: collation

[6 Dec 2019 12:55] Adam Nielsen
Description:
Scenario:

Given a table `users` with column`name` that has as collation  set. And given a row where name is equal to "Yılmaz"

The query

    select id from users where name='Yilmaz';

does not find the row with name "Yılmaz" if the collation of the `name` column is set to `utf8mb4_unicode_ci`.

However, it does find the row with name "Yılmaz" if the collation of the `name` column is set to `utf8mb4_general_ci`.

How to repeat:
Create a table `users` with character set `utf8mb4` and a column `name` with collation `utf8mb4_unicode_ci`. Insert a row with name "Yılmaz". Execute the query

    select id from users where name='Yilmaz';

and find nothing.
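
The steps above can be sketched as the following script. Only the table name, the column name, the character set, and the two collations come from this report; the `id` column, the column types, and the `VARCHAR` length are assumptions made for illustration. `SET NAMES utf8mb4` is included so that the string literals are sent to the server as utf8mb4.

```sql
-- Hypothetical minimal schema: only `users`, `name`, utf8mb4 and the
-- collation names are taken from the report; everything else is assumed.
SET NAMES utf8mb4;

CREATE TABLE users (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) CHARACTER SET utf8mb4
                      COLLATE utf8mb4_unicode_ci NOT NULL
) DEFAULT CHARACTER SET utf8mb4;

-- The stored name contains U+0131 (dotless i), not an ASCII "i".
INSERT INTO users (name) VALUES ('Yılmaz');

-- Reported on 5.7.27 to return no rows with utf8mb4_unicode_ci.
SELECT id FROM users WHERE name = 'Yilmaz';

-- Switch the column to utf8mb4_general_ci and repeat the query;
-- the report states that the row is then found.
ALTER TABLE users
    MODIFY name VARCHAR(100) CHARACTER SET utf8mb4
                             COLLATE utf8mb4_general_ci NOT NULL;

SELECT id FROM users WHERE name = 'Yilmaz';
```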

Suggested fix:
I would suggest that the collation utf8mb4_unicode_ci should treat "ı" (dotless i) as equal to "i", just as it treats other letter variants such as "ö" and "o" as equal.

Or in other words,

    select id from users where name='Yilmaz';

should return rows with name "Yılmaz".
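
Until the collation behaves that way, one possible workaround is to override the collation for a single comparison. This is only a sketch based on the observation above that `utf8mb4_general_ci` matches the two names; it assumes the connection character set is utf8mb4 (otherwise the `COLLATE` clause is rejected), and an explicit `COLLATE` may prevent the server from using an index on `name`.

```sql
-- Assumes the connection character set is utf8mb4.
SET NAMES utf8mb4;

-- Force utf8mb4_general_ci for this comparison only, leaving the
-- column definition (utf8mb4_unicode_ci) unchanged.
SELECT id
FROM users
WHERE name = 'Yilmaz' COLLATE utf8mb4_general_ci;
```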
[6 Dec 2019 12:57] Adam Nielsen
Sorry, the first part

"Scenario:

Given a table `users` with column`name` that has as collation  set. And given a row where name is equal to "Yılmaz"" 

should have been removed. Sorry for the noise.
[6 Dec 2019 14:46] MySQL Verification Team
Hi Mr. Nielsen,

Thank you for your bug report.

We should point out that the utf8mb4 character set and its collations were already available before 8.0 (they were introduced in MySQL 5.5.3), but their full implementation was only achieved in MySQL 8.0.

Can you run your test case against 8.0.18 and let us know whether you still have problems? Do note that in 8.0 you have several choices for this collation. Here is the relevant passage from the Reference Manual:

utf8mb4_0900_ai_ci is based on UCA 9.0.0 weight keys (http://www.unicode.org/Public/UCA/9.0.0/allkeys.txt).

utf8mb4_unicode_520_ci is based on UCA 5.2.0 weight keys (http://www.unicode.org/Public/UCA/5.2.0/allkeys.txt).

utf8mb4_unicode_ci (with no version named) is based on UCA 4.0.0 weight keys (http://www.unicode.org/Public/UCA/4.0.0/allkeys-4.0.0.txt).

As you can see, the one that you are testing is based on a very old version of the standard.

Let us know your experiences with 8.0 and the other unicode*ci collations.
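
A quick way to run that comparison without changing any table is to compare the two strings directly under each collation. This is a sketch rather than a verified test: the column aliases are made up for illustration, `utf8mb4_0900_ai_ci` exists only in 8.0 (so that line has to be removed on 5.7), and the results are intentionally not stated here because they are what the test is meant to find out.

```sql
-- Make sure the literals are interpreted as utf8mb4.
SET NAMES utf8mb4;

-- Each column is 1 if the collation treats the two names as equal,
-- 0 otherwise. The aliases are illustrative only.
SELECT
    'Yılmaz' = 'Yilmaz' COLLATE utf8mb4_unicode_ci     AS uca_4_0_0,
    'Yılmaz' = 'Yilmaz' COLLATE utf8mb4_unicode_520_ci AS uca_5_2_0,
    'Yılmaz' = 'Yilmaz' COLLATE utf8mb4_0900_ai_ci     AS uca_9_0_0,  -- 8.0 only
    'Yılmaz' = 'Yilmaz' COLLATE utf8mb4_general_ci     AS general_ci;

-- List the utf8mb4 collations that the server actually provides.
SHOW COLLATION WHERE Charset = 'utf8mb4';
```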
[7 Jan 2020 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".