Bug #40900 Unique index on accented characters in utf8
Submitted: 20 Nov 2008 19:16 Modified: 22 Nov 2008 9:57
Reporter: Dilip Godhia
Status: Not a Bug
Category:MySQL Server Severity:S2 (Serious)
Version:5.0.54 Ent OS:Linux
Assigned to: CPU Architecture:Any

[20 Nov 2008 19:16] Dilip Godhia
Description:
The database and its tables use the utf8 character set. A unique index is defined on a varchar column. When two values that differ only in accented versus unaccented characters are inserted into that column, the unique index treats them as the same value.

How to repeat:
set names utf8;
set character set utf8;

drop table if exists utf8_test;
create table utf8_test (
  id int(20) unsigned not null auto_increment,
  name varchar(255) default null,
  primary key  (id),
  unique key idx_name (name) 
) engine=innodb default charset=utf8;

insert into utf8_test (name) values ('après-rasage');
insert into utf8_test (name) values ('apres-rasage');

Results:
The second insert fails with:

Error Code : 1062
Duplicate entry 'apres-rasage' for key 2
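The duplicate-key error comes from how the column's collation compares the two strings, not from how they are stored. As a minimal check, assuming a connection configured with set names utf8 as in the repro (so the connection collation is utf8_general_ci, matching the variables reported below), the comparison can be evaluated directly. Under utf8_general_ci it should return 1, meaning the values are considered equal:

```sql
set names utf8;

-- Compare the two values under the connection's default collation
-- (utf8_general_ci), which gives an accented letter such as e-acute
-- the same weight as its base letter.
select 'après-rasage' = 'apres-rasage' as same_under_general_ci;
```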
[20 Nov 2008 19:25] Valeriy Kravchuk
Thank you for the problem report. Please send the results of:

show variables like 'colla%';

from the same environment.
[21 Nov 2008 17:07] Dilip Godhia
mysql> show variables like 'colla%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
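These variables only report the server, database and connection defaults. The unique index compares values using the collation of the name column itself, which the repro table inherits from default charset=utf8 (whose default collation is utf8_general_ci). One way to confirm the column-level collation, assuming the utf8_test table from the repro exists in the current database:

```sql
-- Column-level collation; the Collation field should read utf8_general_ci.
show full columns from utf8_test like 'name';

-- The same information from information_schema.
select column_name, collation_name
  from information_schema.columns
 where table_schema = database()
   and table_name = 'utf8_test';
```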
[22 Nov 2008 9:57] Sveta Smirnova
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php

Most likely utf8_general_ci is not a suitable collation for your data. Please read http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html about how the different utf8 collations work.
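One possible remedy, offered as a sketch rather than the only option, is to give the column a binary collation. utf8_bin compares by code point, so 'après-rasage' and 'apres-rasage' become distinct keys. Note that utf8_bin is also case-sensitive, so 'Apres-rasage' and 'apres-rasage' would no longer collide either, and ordering on the column follows code-point order. The example assumes a connection set up with set names utf8 and the utf8_test table from the repro, with the first row ('après-rasage') already inserted:

```sql
set names utf8;

-- Under utf8_bin the two values no longer compare equal (expected: 0).
select 'après-rasage' = 'apres-rasage' collate utf8_bin as same_under_bin;

-- Rebuild the column with a binary collation; the unique index idx_name
-- is rebuilt along with it and now distinguishes accented values.
alter table utf8_test
  modify name varchar(255) character set utf8 collate utf8_bin default null;

-- Retrying the insert that failed in the repro should now succeed.
insert into utf8_test (name) values ('apres-rasage');
```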