Bug #116996 | utf8mb4_0900_ai_ci not distinguish "=" and "≠" | ||
---|---|---|---|
Submitted: | 17 Dec 2024 8:44 | Modified: | 17 Dec 2024 13:44 |
Reporter: | dakun li | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 8.0 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[17 Dec 2024 8:44]
dakun li
[17 Dec 2024 11:04]
MySQL Verification Team
Hi Mr. li,

Thank you for your bug report. We repeated your bug with 8.0.40, 8.4.3 and 9.0.2:

LOAD DATA INFILE '/data/mysql3312/results.csv' INTO TABLE t900 FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
ERROR 1062 (23000): Duplicate entry '15-≠' for key 't900.idx'

This is now a verified bug report.
[17 Dec 2024 13:44]
Bernt Marius Johnsen
According to Unicode UCA, '≠' is to be sorted as '=' followed by U+0338 COMBINING LONG SOLIDUS OVERLAY. This means that the two characters will be equal in an accent insensitive collation. To distinguish them, you will need to use an accent sensitive collation:

mysql> select '≠' = '=' collate utf8mb4_0900_ai_ci;
+----------------------------------------+
| '≠' = '=' collate utf8mb4_0900_ai_ci |
+----------------------------------------+
|                                      1 |
+----------------------------------------+
1 row in set (0.00 sec)

mysql> select '≠' = '=' collate utf8mb4_0900_as_cs;
+----------------------------------------+
| '≠' = '=' collate utf8mb4_0900_as_cs |
+----------------------------------------+
|                                      0 |
+----------------------------------------+
1 row in set (0.00 sec)
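The decomposition described above can be checked outside MySQL. The following is a minimal Python sketch using only the standard `unicodedata` module. It is not how MySQL implements its collations (those compare UCA weights); the helper `strip_marks` is a hypothetical name introduced here purely to imitate the effect of ignoring accents by dropping combining marks after canonical decomposition.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop nonspacing marks (category Mn).

    This only approximates what an accent-insensitive comparison ignores;
    it is not MySQL's UCA weight-based algorithm.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# U+2260 NOT EQUAL TO canonically decomposes to '=' followed by U+0338.
decomposed = unicodedata.normalize("NFD", "\u2260")
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+003D', 'U+0338']

# Raw comparison keeps the two characters distinct (like an accent-sensitive collation).
print("\u2260" == "=")  # False

# Ignoring the combining mark makes them compare equal (like an accent-insensitive one).
print(strip_marks("\u2260") == strip_marks("="))  # True
```

Under these assumptions, the first comparison corresponds roughly to the `utf8mb4_0900_as_cs` result and the second to the `utf8mb4_0900_ai_ci` result shown above.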
[17 Dec 2024 14:55]
MySQL Verification Team
Thank you, Bernt, for the wonderful clarification.