Bug #22337 Collation change results in duplicate key ("e" and "é" mixup)
Submitted: 14 Sep 2006 10:28 Modified: 17 Oct 2006 16:25
Reporter: Csongor Fagyal Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Charsets Severity:S4 (Feature request)
Version:5.0.x, 4.1.21, 4.1.10 OS:Linux (Linux, Windows)
Assigned to: Assigned Account CPU Architecture:Any
Triage: D5 (Feature request)

[14 Sep 2006 10:28] Csongor Fagyal
Description:
I have a column defined as:

`foobar` varchar(40) collate latin2_bin NOT NULL default ''

As I want to use case-insensitive LIKE searches, I want to use latin2_hungarian_ci collation, so I wanted to change the collation:

alter table tevefarm modify column `foobar` varchar(40) collate latin2_hungarian_ci NOT NULL default '';

Which results in:

ERROR 1062 (23000): Duplicate entry 'Szuszék' for key 2

The same happens if I try to change the collation for the table:

alter table mytable CONVERT TO CHARACTER SET latin2 collate latin2_hungarian_ci;

I have checked the table, and the duplicate entry error message comes from having the following two values in mytable.foobar:

Szuszek
Szuszék

These are not identical values, they should not be considered duplicates, as "e" != "é". I don't think that the collation should have an effect here. I really don't like the idea that, for example, I cannot insert "cipofuzo" and "cipőfűző" into a unique char column.

How to repeat:
See above.
[14 Sep 2006 10:51] Csongor Fagyal
How to repeat:

mysql> create table bar (foo char(10) not null unique) engine="MyISAM" charset=latin2 collate=latin2_hungarian_ci;

Query OK, 0 rows affected (0.09 sec)

mysql> insert into bar values ('zoh'), ('zuh');
Query OK, 2 rows affected (0.00 sec)

mysql> insert into bar values ('zox'), ('ZŐX');
Query OK, 2 rows affected (0.00 sec)

mysql> insert into bar values ('sex'), ('séx');
ERROR 1062 (23000): Duplicate entry 'séx' for key 1

mysql> insert into bar values ('söx'), ('sox');
Query OK, 2 rows affected (0.00 sec)

Very interesting... looks like "o" != "ö", that is correct! But then why is "e" == "é" ???
[14 Sep 2006 11:05] Csongor Fagyal
mispelled collation - fixed; refined synopsis
[14 Sep 2006 12:29] Valeriy Kravchuk
Thank you for a problem report. Please, try to repeat with a newer version, 4.1.21, and inform about the results.
[14 Sep 2006 13:08] Csongor Fagyal
Also tested on: mysql-standard-4.1.21-pc-linux-gnu-i686-glibc23

Fedora Core 5

Got the exact same error.

Interestingly, these also produce duplicate entry errors:
mysql> insert into bar values ('aabb'), ('aAbB'); ERROR 1062 (23000): Duplicate entry 'aAbB' for key 1

I have:
mysql> show create table bar;
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                   |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| bar   | CREATE TABLE `bar` (
  `foo` char(10) collate latin2_hungarian_ci NOT NULL default '',
  UNIQUE KEY `foo` (`foo`)
) ENGINE=MyISAM DEFAULT CHARSET=latin2 COLLATE=latin2_hungarian_ci |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

These work:
mysql> insert into bar values ('oooo'), ('öööö');
Query OK, 2 rows affected (0.00 sec)

mysql> insert into bar values ('oo'), ('ÖÖ');
Query OK, 2 rows affected (0.00 sec)

mysql> insert into bar values ('auau'), ('aüaü');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into bar values ('Uuuu'), ('Üüüü');
Query OK, 2 rows affected (0.00 sec)

mysql> insert into bar values ('PÖ'), ('Po');
Query OK, 2 rows affected (0.00 sec)

These do not:
mysql> insert into bar values ('ee'), ('ÉÉ');
ERROR 1062 (23000): Duplicate entry 'ÉÉ' for key 1

mysql> insert into bar values ('ababa'), ('ababá');
ERROR 1062 (23000): Duplicate entry 'ababá' for key 1

mysql> insert into bar values ('almos'), ('Álmos');
ERROR 1062 (23000): Duplicate entry 'Álmos' for key 1
[14 Sep 2006 14:04] Valeriy Kravchuk
Verified just as described, also with 5.1.24a on Windows. "set names latin2" should be executed before INSERTing data.

This:

mysql> insert into bar values ('aabb'), ('aAbB'); 
ERROR 1062 (23000): Duplicate entry 'aAbB' for key 1

is not a bug, though, as latin2_hungarian_ci collation is used, "case-insensitive".
[15 Sep 2006 7:52] Valeriy Kravchuk
May be also a duplicate of bug #12519 (at least, 'A' <> 'Á' is discussed there). Work in progress already.
[18 Sep 2006 5:02] Alexander Barkov
This is not a bug. latin2_hungarian_ci is an accent insensitive
collation for Hungarian, to it treats "A WITH ACUTE" and "E WITH ACUTE"
equal to A and E.

Changing status to feature request: accent sensitive collation for Hungarian.
[18 Sep 2006 15:01] Csongor Fagyal
To me, a Hungarian, it feels more appropriate to have a collation that is the same as the current hungarian_ci, but named differently (something like hungarian_nonaccented_ci), and a patch to the current hungarian_ci to behave (IMHO) properly. 

I can submit a patch for the mappings, but don't know if it is enough to change the mappings in
.../sql/share/charsets/hungarian.conf

Also (as I am looking for a workaround), I have this in /usr/share/mysql/charsets/latin2.xml :

<collation name="latin2_hungarian_ci">
...
</collation>

Will the server recognise the changes in this file after a restart?
[11 Oct 2006 4:49] Alexander Barkov
People need different rules for different applications.
The current version is accent insensitive. It was contributed
by a Hungarian many years ago, in version 3.23 or even earlier.
So we won't change the name of the current collation.
We can add a new collation, say latin2_hungarian2_ci.
We'll really appreciate if you submit a new, accent sensitive, mapping.

To make the things easier on your side, 
instead of addint a new collation,
you can fix the current map in /usr/share/mysql/charsets/latin2.xml:

<collation name="latin2_hungarian_ci">
...
</collation>

Yes. The server will recognise this changes after restart.

As soon as you have a correct accent sensitive map,
please send it to bar@mysql.com, we'll do all other
necessary modifications in .xml files to add it as
a new collation.

Thanks!
[17 Oct 2006 16:25] Peter Gulutzan
The difficulty is with our Hungarian collation, as we acknowledge.
I have added a lengthy comment and a request for clarification on
bug#12519 "Incorrect Hungarian collation".