Bug #41281 Problem with special chars (ß, ü) and unique index
Submitted: 6 Dec 2008 18:35 Modified: 9 Dec 2008 21:49
Reporter: Tobias F Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: General Severity:S2 (Serious)
Version:5.1.30 (32bit) OS:Windows (Vista x64)
Assigned to: CPU Architecture:Any
Tags: INDEX, sonderzeichen, special character, umlaut, unique

[6 Dec 2008 18:35] Tobias F
Description:
Hello,
I have a table with unique index on a varchar column in mysql 5.1.30. The problem is that mysql doesn't accept a value with special characters if there is a value with the 'normal' char in the table. For example, I have the value 'munchen' in the table and try to add 'münchen' (or the other way round) then I get the following error:
#1062 - Duplicate entry 'munchen' for key 'name'

It makes no difference if I use character set utf8 or latin1 on the table or column. Typ is InnoDB.
Insert from a script over the commandline, phpmyadmin or mysql query browser doesn't matter.

If I drop the index I can insert the rows without a problem.

phpmyadmin and query browser show the values:
münchen
munchen
maß
mas

commandline shows:
mysql> select * from testname;
+----------+
| name     |
+----------+
| münchen  |
| munchen  |
| maÃY      |
| mas      |
+----------+

If i delete one row it also deletes the other.
mysql> delete from testname where name='mas';

mysql> select * from testname;
+----------+
| name     |
+----------+
| münchen  |
| munchen  |
+----------+

looks like mysql doesn't know my special char.
mysql> select * from testname where name='münchen';
Empty set (0.00 sec)

mysql> select * from testname where name='münchen';
Empty set (0.00 sec)

There is no conflict with mänchen and münchen, but with mänchen and manchen.

I did not have the problem with version 5.0.67. With this serious problem it is imposible to use mysql 5.1.30 for me because I need the possibility to insert both notations in the production system.

How to repeat:
Just crate a table 

CREATE TABLE `testname` (
  `name` VARCHAR(32)
) ENGINE=InnoDB;

insert into testname values('münchen');
insert into testname values('munchen');
insert into testname values('maß');
insert into testname values('mas');

ALTER TABLE `testname` ADD UNIQUE (
`name`
)

#1062 - Duplicate entry 'munchen' for key 'name'
[7 Dec 2008 17:28] Valeriy Kravchuk
Thank you for a problem report. Please, send the results of:

show variables like 'collation%';
[7 Dec 2008 18:28] Tobias F
With the 5.1.30 Version it shows

mysql> show variables like 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

With the 5.0.67 Version it shows the same, but the selection of 'mas' and the unique index do work.
The selection of 'maß' over the commandline doesn't work with 5.0.67 either. But they do over phpmyadmin or the query browser.

mysql> select * from testname where name='maß';
Empty set (0.00 sec)

mysql> select * from testname;
+----------+
| name     |
+----------+
| maß     |
| mas      |
+----------+
[8 Dec 2008 9:03] Susanne Ebrecht
Many thanks for writing a bug report.

Unfortunately, this is not a bug.

It is given by the Unicode Standard:
ä=a
ö=o
ü=u
ß=ss

This is not familiar in Germany. But our utf8_unicode_ci is implemented like the Unicode Standard wants it. The same with utf8_general_ci. Recommended to use for utf8 is utf8_unicode_ci.

Anyway, for German there are already tons of feature requests i.e.:
Bug #38758

Our development already works on it.

The only workaround here for you is: use latin1 and latin1_german2_ci
[9 Dec 2008 21:49] Tobias F
What is the difference between the MySQL 5.0 and 5.1 implementation of unicode? I don't get why it worked with 5.0. Or was this a 'bug' in 5.0?

Thanks for your help.