Bug #101311 | utf8mb4_unicode_ci issue | ||
---|---|---|---|
Submitted: | 26 Oct 2020 6:15 | Modified: | 27 Oct 2020 13:39 |
Reporter: | kim kyoung youn | Email Updates: | |
Status: | Won't fix | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 8.0.22 | OS: | Any |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
Tags: | #collation #characterset |
[26 Oct 2020 6:15]
kim kyoung youn
[26 Oct 2020 12:36]
MySQL Verification Team
Hi Mr. youn, You are using a product that we do not make nor do we support it. Can you please try your test case with our server.
[26 Oct 2020 14:06]
MySQL Verification Team
Hi, Thanks for the report. While you reported the bug using wrong RDBMS, we too have this issue. all best Bogdan
[26 Oct 2020 19:17]
Perlover Perlovovich
I found strange collate problem after upgrade mysql-cluster 8.0.21 -> 8.0.22 (ubuntu 18.04) The priblem was described here: https://github.com/sidorares/node-mysql2/pull/1235 I cannot reproduce through mysql cli but i can reproduce through mysql2 node package through API May be this problem was related with this bug
[26 Oct 2020 19:23]
Perlover Perlovovich
If to be brief i do through mysql2 API a following statements: DROP TABLE IF EXISTS asciitest CREATE TABLE IF NOT EXISTS asciitest (a char (64) ascii not null, b int unsigned not null) character set utf8; INSERT INTO asciitest (a, b) VALUES ('test', 1); INSERT INTO asciitest (a, b) VALUES ('super', 2); SELECT a FROM ${table} WHERE a='test' AND b=2; And i got an error: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8mb4_unicode_ci,COERCIBLE) for operation '=' It should not happen because ascii field should be coercion to unicode characters before a matching through repertoire mechanism i think (https://dev.mysql.com/doc/refman/8.0/en/charset-repertoire.html)
[27 Oct 2020 13:07]
Bernt Marius Johnsen
char(0) is an "ignorable" and should not be taken into consideration when two strings are compared. MySQL> select char(0) = '' collate utf8mb4_general_ci; +-----------------------------------------+ | char(0) = '' collate utf8mb4_general_ci | +-----------------------------------------+ | 0 | +-----------------------------------------+ 1 row in set (0.0009 sec) MySQL> select concat(char(0),'a') = 'a' collate utf8mb4_unicode_ci; +------------------------------------------------------+ | concat(char(0),'a') = 'a' collate utf8mb4_unicode_ci | +------------------------------------------------------+ | 1 | +------------------------------------------------------+ 1 row in set (0.0009 sec) MySQL> select char(0) = '' collate utf8mb4_0900_ai_ci; +-----------------------------------------+ | char(0) = '' collate utf8mb4_0900_ai_ci | +-----------------------------------------+ | 1 | +-----------------------------------------+ 1 row in set (0.0008 sec) So this is a bug in utf8mb4_general_ci. utf8mb4_unicode_ci behaves correctly (and so does utf8mb4_0900_ai_ci). utf8mb4_unicode_ci was implemented as a more correct alternative to utf8mb4_general_ci which has several flaws. But it does not correct all the flaws, so we recommend users to use utf8mb4_0900_ai_ci which is believed to be quite correct.
[27 Oct 2020 13:39]
Norvald Ryeng
Hi, Thanks for the report! As mentioned by others, this is a bug in utf8mb4_general_ci. Unfortunately, we can't fix bugs in collations. The reason is that indexes stored on disk are sorted according to collations, so any change we make to a collation will invalidate all indexes using that collation. If we discover a bug in a collation, the only viable solution is to create a new collation that corrects that bug, and then users have to change the collation of their columns and indexes to the new one, if the bug affects them. In this case, the correct and bug free collation already exists: utf8mb4_0900_ai_ci. Therefore, I'm closing this as "won't fix". The utf8mb4_general_ci collation is known to have bugs, but we can't really do anything about it except ask people not to use it. In general, we recommend all users to use utf8mb4_0900_ai_ci or another utf8mb4_0900_* collation.