Bug #116996 | utf8mb4_0900_ai_ci not distinguish "=" and "≠" | ||
---|---|---|---|
Submitted: | 17 Dec 8:44 | Modified: | 17 Dec 13:44 |
Reporter: | dakun li | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 8.0 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[17 Dec 8:44]
dakun li
[17 Dec 11:04]
MySQL Verification Team
Hi Mr. li,

Thank you for your bug report.

We repeated your bug with 8.0.40, 8.4.3 and 9.0.2:

    LOAD DATA INFILE '/data/mysql3312/results.csv' INTO TABLE t900 FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
    ERROR 1062 (23000): Duplicate entry '15-≠' for key 't900.idx'

This is now a verified bug report.
[17 Dec 13:44]
Bernt Marius Johnsen
According to Unicode UCA, '≠' is to be sorted as '=' followed by U+0338 COMBINING LONG SOLIDUS OVERLAY. This means that the two characters will be equal in an accent insensitive collation. To distinguish them, you will need to use an accent sensitive collation:

    mysql> select '≠' = '=' collate utf8mb4_0900_ai_ci;
    +----------------------------------------+
    | '≠' = '=' collate utf8mb4_0900_ai_ci |
    +----------------------------------------+
    |                                      1 |
    +----------------------------------------+
    1 row in set (0.00 sec)

    mysql> select '≠' = '=' collate utf8mb4_0900_as_cs;
    +----------------------------------------+
    | '≠' = '=' collate utf8mb4_0900_as_cs |
    +----------------------------------------+
    |                                      0 |
    +----------------------------------------+
    1 row in set (0.00 sec)
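The decomposition described in this comment can be checked outside MySQL. The following is a minimal Python sketch using only the standard `unicodedata` module. The helper `base_characters` is a hypothetical name introduced here for illustration. It only approximates an accent-insensitive comparison by dropping combining marks after canonical decomposition; it is not how MySQL or the UCA weight tables actually implement `utf8mb4_0900_ai_ci`.

```python
import unicodedata

NOT_EQUAL = "\u2260"  # the character '≠' (NOT EQUAL TO)


def base_characters(text: str) -> str:
    """Hypothetical helper: canonically decompose (NFD) and drop combining marks.

    This is a rough stand-in for accent-insensitive comparison, not the
    actual UCA weight-based algorithm used by MySQL collations.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Show the code points that '≠' decomposes into.
for ch in unicodedata.normalize("NFD", NOT_EQUAL):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# With the combining overlay removed, only the base character '=' remains.
print(base_characters(NOT_EQUAL) == "=")
```

Under this approximation, '≠' and '=' share the same base character and differ only in the combining mark, which is the accent-level difference that an accent-sensitive collation such as `utf8mb4_0900_as_cs` takes into account.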
[17 Dec 14:55]
MySQL Verification Team
Thank you, Bernt, for the wonderful clarification.