MySQL Bugs: #22337: Collation change results in duplicate key ("e" and "é" mixup)

Bug #22337	Collation change results in duplicate key ("e" and "é" mixup)
Submitted:	14 Sep 2006 10:28	Modified:	17 Oct 2006 16:25
Reporter:	Csongor Fagyal	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Charsets	Severity:	S4 (Feature request)
Version:	5.0.x, 4.1.21, 4.1.10	OS:	Linux (Linux, Windows)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
I have a column defined as:

`foobar` varchar(40) collate latin2_bin NOT NULL default ''

As I want to use case-insensitive LIKE searches, I want to use latin2_hungarian_ci collation, so I wanted to change the collation:

alter table tevefarm modify column `foobar` varchar(40) collate latin2_hungarian_ci NOT NULL default '';

Which results in:

ERROR 1062 (23000): Duplicate entry 'Szuszék' for key 2

The same happens if I try to change the collation for the table:

alter table mytable CONVERT TO CHARACTER SET latin2 collate latin2_hungarian_ci;

I have checked the table, and the duplicate entry error message comes from having the following two values in mytable.foobar:

Szuszek
Szuszék

These are not identical values, they should not be considered duplicates, as "e" != "é". I don't think that the collation should have an effect here. I really don't like the idea that, for example, I cannot insert "cipofuzo" and "cipőfűző" into a unique char column.

How to repeat:
See above.

How to repeat:

mysql> create table bar (foo char(10) not null unique) engine="MyISAM" charset=latin2 collate=latin2_hungarian_ci;

Query OK, 0 rows affected (0.09 sec)

mysql> insert into bar values ('zoh'), ('zuh');
Query OK, 2 rows affected (0.00 sec)

mysql> insert into bar values ('zox'), ('ZŐX');
Query OK, 2 rows affected (0.00 sec)

mysql> insert into bar values ('sex'), ('séx');
ERROR 1062 (23000): Duplicate entry 'séx' for key 1

mysql> insert into bar values ('söx'), ('sox');
Query OK, 2 rows affected (0.00 sec)

Very interesting... looks like "o" != "ö", that is correct! But then why is "e" == "é" ???

mispelled collation - fixed; refined synopsis

Thank you for a problem report. Please, try to repeat with a newer version, 4.1.21, and inform about the results.

Also tested on: mysql-standard-4.1.21-pc-linux-gnu-i686-glibc23

Fedora Core 5

Got the exact same error.

Interestingly, these also produce duplicate entry errors:
mysql> insert into bar values ('aabb'), ('aAbB'); ERROR 1062 (23000): Duplicate entry 'aAbB' for key 1

I have:
mysql> show create table bar;
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                   |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| bar   | CREATE TABLE `bar` (
  `foo` char(10) collate latin2_hungarian_ci NOT NULL default '',
  UNIQUE KEY `foo` (`foo`)
) ENGINE=MyISAM DEFAULT CHARSET=latin2 COLLATE=latin2_hungarian_ci |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

These work:
mysql> insert into bar values ('oooo'), ('öööö');
Query OK, 2 rows affected (0.00 sec)

mysql> insert into bar values ('oo'), ('ÖÖ');
Query OK, 2 rows affected (0.00 sec)

mysql> insert into bar values ('auau'), ('aüaü');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into bar values ('Uuuu'), ('Üüüü');
Query OK, 2 rows affected (0.00 sec)

mysql> insert into bar values ('PÖ'), ('Po');
Query OK, 2 rows affected (0.00 sec)

These do not:
mysql> insert into bar values ('ee'), ('ÉÉ');
ERROR 1062 (23000): Duplicate entry 'ÉÉ' for key 1

mysql> insert into bar values ('ababa'), ('ababá');
ERROR 1062 (23000): Duplicate entry 'ababá' for key 1

mysql> insert into bar values ('almos'), ('Álmos');
ERROR 1062 (23000): Duplicate entry 'Álmos' for key 1

Verified just as described, also with 5.1.24a on Windows. "set names latin2" should be executed before INSERTing data.

This:

mysql> insert into bar values ('aabb'), ('aAbB'); 
ERROR 1062 (23000): Duplicate entry 'aAbB' for key 1

is not a bug, though, as latin2_hungarian_ci collation is used, "case-insensitive".

May be also a duplicate of bug #12519 (at least, 'A' <> 'Á' is discussed there). Work in progress already.

This is not a bug. latin2_hungarian_ci is an accent insensitive
collation for Hungarian, to it treats "A WITH ACUTE" and "E WITH ACUTE"
equal to A and E.

Changing status to feature request: accent sensitive collation for Hungarian.

To me, a Hungarian, it feels more appropriate to have a collation that is the same as the current hungarian_ci, but named differently (something like hungarian_nonaccented_ci), and a patch to the current hungarian_ci to behave (IMHO) properly. 

I can submit a patch for the mappings, but don't know if it is enough to change the mappings in
.../sql/share/charsets/hungarian.conf

Also (as I am looking for a workaround), I have this in /usr/share/mysql/charsets/latin2.xml :

<collation name="latin2_hungarian_ci">
...
</collation>

Will the server recognise the changes in this file after a restart?

People need different rules for different applications.
The current version is accent insensitive. It was contributed
by a Hungarian many years ago, in version 3.23 or even earlier.
So we won't change the name of the current collation.
We can add a new collation, say latin2_hungarian2_ci.
We'll really appreciate if you submit a new, accent sensitive, mapping.

To make the things easier on your side, 
instead of addint a new collation,
you can fix the current map in /usr/share/mysql/charsets/latin2.xml:

<collation name="latin2_hungarian_ci">
...
</collation>

Yes. The server will recognise this changes after restart.

As soon as you have a correct accent sensitive map,
please send it to bar@mysql.com, we'll do all other
necessary modifications in .xml files to add it as
a new collation.

Thanks!

The difficulty is with our Hungarian collation, as we acknowledge.
I have added a lengthy comment and a request for clarification on
bug#12519 "Incorrect Hungarian collation".