Bug #116996 utf8mb4_0900_ai_ci not distinguish "=" and "≠"
Submitted: 17 Dec 8:44 Modified: 17 Dec 13:44
Reporter: dakun li Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[17 Dec 8:44] dakun li
Description:
utf8mb4_0900_ai_ci  can not distinguish this two char: "=" and "≠"

How to repeat:
first : create two table as follow,the collation is not the same

CREATE TABLE `t900` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `presets_category_id` int NOT NULL DEFAULT '0',
  `code` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx` (`presets_category_id`,`code`)
) ENGINE=InnoDB AUTO_INCREMENT=56 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

 CREATE TABLE `tgene` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `presets_category_id` int NOT NULL DEFAULT '0',
  `code` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx` (`presets_category_id`,`code`)
) ENGINE=InnoDB AUTO_INCREMENT=7224 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci

second:
we load the data:
[root@VM-24-2-centos mysql3312]# cat results.csv
"52","15","="
"7223","15","≠"

[root@localhost][lockli]> LOAD DATA INFILE '/data/mysql3312/results.csv' INTO TABLE tgene FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Query OK, 2 rows affected (0.01 sec)
Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

[root@localhost][lockli]> LOAD DATA INFILE '/data/mysql3312/results.csv' INTO TABLE t900 FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';  
ERROR 1062 (23000): Duplicate entry '15-≠' for key 't900.idx'
[17 Dec 11:04] MySQL Verification Team
Hi Mr. li,

Thank you for your bug report.

We repeated your bug with 8.0.40, 8.4.3 and 9.0.2:

LOAD DATA INFILE '/data/mysql3312/results.csv' INTO TABLE t900 FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
ERROR 1062 (23000): Duplicate entry '15-≠' for key 't900.idx'

This is now a verified bug report.
[17 Dec 13:44] Bernt Marius Johnsen
According to Unicode UCA, '≠' is to be sorted as '=' followed by U+0338 COMBINING LONG SOLIDUS OVERLAY. This means that the two characters will be equal in an accent insensitive collation. To distinguish them, you will need to use an accent sensitive collation:

mysql> select '≠' = '=' collate utf8mb4_0900_ai_ci;
+----------------------------------------+
| '≠' = '=' collate utf8mb4_0900_ai_ci   |
+----------------------------------------+
|                                      1 |
+----------------------------------------+
1 row in set (0.00 sec)

mysql> select '≠' = '=' collate utf8mb4_0900_as_cs;
+----------------------------------------+
| '≠' = '=' collate utf8mb4_0900_as_cs   |
+----------------------------------------+
|                                      0 |
+----------------------------------------+
1 row in set (0.00 sec)
[17 Dec 14:55] MySQL Verification Team
Thank you, Bernt, for the wonderful clarification.