MySQL Bugs: #118813: utf8mb4_0900_ai_ci check different string mark as duplicate

Bug #118813	utf8mb4_0900_ai_ci check different string mark as duplicate
Submitted:	12 Aug 2025 3:20	Modified:	21 Aug 2025 12:49
Reporter:	peng gao	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: DML	Severity:	S3 (Non-critical)
Version:	8.0.41, 8.0.43, 8.4.6	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Hi team:
    
    collation when use utf8mb4_0900_ai_ci, two sting "Ⅰ" and "I"  check dup.
    when dup check，"Ⅰ" length is 3 , "I" length is 1,as follows
    
    cmp_data (mtype=12, prtype=16711695, is_asc=true, data1=0x7fff5c1fef1a "Ⅰ", len1=3, data2=0x7fffcda3807f "I", len2=1)
   
    Is this normal?  thanks

How to repeat:
create table mytest(a varchar(20),unique key(a));

insert into mytest values(0x49);
insert into mytest values(0xE285A0);
ERROR 1062 (23000): Duplicate entry 'Ⅰ' for key 'testi.a'

mysql> show create table mytest \G
*************************** 1. row ***************************
       Table: mytest
Create Table: CREATE TABLE `mytest` (
  `a` varchar(20) DEFAULT NULL,
  UNIQUE KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Hello peng gao,

Thank you for the report and feedback.

regards,
Umesh

Posted by developer:
 
The letter 'Ⅰ' is "U+2160 ROMAN NUMERAL ONE" and defined to be equal to 'I' (U+0049 LATIN CAPITAL LETTER I) in the Unicode standard.

Similarly, 'Ⅱ' ("U+2161 ROMAN NUMERAL TOW") is defined to be equal to 'II':

mysql> select 'Ⅰ' = 'I';
+-------------+
| 'Ⅰ' = 'I'   |
+-------------+
|           1 |
+-------------+
1 row in set (0.000 sec)

mysql> select 'Ⅱ' = 'II';
+--------------+
| 'Ⅱ' = 'II'   |
+--------------+
|            1 |
+--------------+
1 row in set (0.000 sec)

So this is correct behavior according to the Unicode standard, and not a bug.