| Bug #118813 | utf8mb4_0900_ai_ci check different string mark as duplicate | ||
|---|---|---|---|
| Submitted: | 12 Aug 3:20 | Modified: | 21 Aug 12:49 |
| Reporter: | peng gao | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: DML | Severity: | S3 (Non-critical) |
| Version: | 8.0.41, 8.0.43, 8.4.6 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[12 Aug 3:22]
peng gao
mysql> show create table mytest \G
*************************** 1. row ***************************
Table: mytest
Create Table: CREATE TABLE `mytest` (
`a` varchar(20) DEFAULT NULL,
UNIQUE KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
[12 Aug 5:00]
MySQL Verification Team
Hello peng gao, Thank you for the report and feedback. regards, Umesh
[21 Aug 12:49]
Bernt Marius Johnsen
Posted by developer:
The letter 'Ⅰ' is "U+2160 ROMAN NUMERAL ONE" and defined to be equal to 'I' (U+0049 LATIN CAPITAL LETTER I) in the Unicode standard.
Similarly, 'Ⅱ' ("U+2161 ROMAN NUMERAL TOW") is defined to be equal to 'II':
mysql> select 'Ⅰ' = 'I';
+-------------+
| 'Ⅰ' = 'I' |
+-------------+
| 1 |
+-------------+
1 row in set (0.000 sec)
mysql> select 'Ⅱ' = 'II';
+--------------+
| 'Ⅱ' = 'II' |
+--------------+
| 1 |
+--------------+
1 row in set (0.000 sec)
So this is correct behavior according to the Unicode standard, and not a bug.

Description: Hi team: collation when use utf8mb4_0900_ai_ci, two sting "Ⅰ" and "I" check dup. when dup check,"Ⅰ" length is 3 , "I" length is 1,as follows cmp_data (mtype=12, prtype=16711695, is_asc=true, data1=0x7fff5c1fef1a "Ⅰ", len1=3, data2=0x7fffcda3807f "I", len2=1) Is this normal? thanks How to repeat: create table mytest(a varchar(20),unique key(a)); insert into mytest values(0x49); insert into mytest values(0xE285A0); ERROR 1062 (23000): Duplicate entry 'Ⅰ' for key 'testi.a'