Bug #22337 | Collation change results in duplicate key ("e" and "é" mixup) | ||
---|---|---|---|
Submitted: | 14 Sep 2006 10:28 | Modified: | 17 Oct 2006 16:25 |
Reporter: | Csongor Fagyal | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S4 (Feature request) |
Version: | 5.0.x, 4.1.21, 4.1.10 | OS: | Linux (Linux, Windows) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[14 Sep 2006 10:28]
Csongor Fagyal
[14 Sep 2006 10:51]
Csongor Fagyal
How to repeat: mysql> create table bar (foo char(10) not null unique) engine="MyISAM" charset=latin2 collate=latin2_hungarian_ci; Query OK, 0 rows affected (0.09 sec) mysql> insert into bar values ('zoh'), ('zuh'); Query OK, 2 rows affected (0.00 sec) mysql> insert into bar values ('zox'), ('ZŐX'); Query OK, 2 rows affected (0.00 sec) mysql> insert into bar values ('sex'), ('séx'); ERROR 1062 (23000): Duplicate entry 'séx' for key 1 mysql> insert into bar values ('söx'), ('sox'); Query OK, 2 rows affected (0.00 sec) Very interesting... looks like "o" != "ö", that is correct! But then why is "e" == "é" ???
[14 Sep 2006 11:05]
Csongor Fagyal
mispelled collation - fixed; refined synopsis
[14 Sep 2006 12:29]
Valeriy Kravchuk
Thank you for a problem report. Please, try to repeat with a newer version, 4.1.21, and inform about the results.
[14 Sep 2006 13:08]
Csongor Fagyal
Also tested on: mysql-standard-4.1.21-pc-linux-gnu-i686-glibc23 Fedora Core 5 Got the exact same error. Interestingly, these also produce duplicate entry errors: mysql> insert into bar values ('aabb'), ('aAbB'); ERROR 1062 (23000): Duplicate entry 'aAbB' for key 1 I have: mysql> show create table bar; +-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | bar | CREATE TABLE `bar` ( `foo` char(10) collate latin2_hungarian_ci NOT NULL default '', UNIQUE KEY `foo` (`foo`) ) ENGINE=MyISAM DEFAULT CHARSET=latin2 COLLATE=latin2_hungarian_ci | +-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ These work: mysql> insert into bar values ('oooo'), ('öööö'); Query OK, 2 rows affected (0.00 sec) mysql> insert into bar values ('oo'), ('ÖÖ'); Query OK, 2 rows affected (0.00 sec) mysql> insert into bar values ('auau'), ('aüaü'); Query OK, 2 rows affected (0.00 sec) Records: 2 Duplicates: 0 Warnings: 0 mysql> insert into bar values ('Uuuu'), ('Üüüü'); Query OK, 2 rows affected (0.00 sec) mysql> insert into bar values ('PÖ'), ('Po'); Query OK, 2 rows affected (0.00 sec) These do not: mysql> insert into bar values ('ee'), ('ÉÉ'); ERROR 1062 (23000): Duplicate entry 'ÉÉ' for key 1 mysql> insert into bar values ('ababa'), ('ababá'); ERROR 1062 (23000): Duplicate entry 'ababá' for key 1 mysql> insert into bar values ('almos'), ('Álmos'); ERROR 1062 (23000): Duplicate entry 'Álmos' for key 1
[14 Sep 2006 14:04]
Valeriy Kravchuk
Verified just as described, also with 5.1.24a on Windows. "set names latin2" should be executed before INSERTing data. This: mysql> insert into bar values ('aabb'), ('aAbB'); ERROR 1062 (23000): Duplicate entry 'aAbB' for key 1 is not a bug, though, as latin2_hungarian_ci collation is used, "case-insensitive".
[15 Sep 2006 7:52]
Valeriy Kravchuk
May be also a duplicate of bug #12519 (at least, 'A' <> 'Á' is discussed there). Work in progress already.
[18 Sep 2006 5:02]
Alexander Barkov
This is not a bug. latin2_hungarian_ci is an accent insensitive collation for Hungarian, to it treats "A WITH ACUTE" and "E WITH ACUTE" equal to A and E. Changing status to feature request: accent sensitive collation for Hungarian.
[18 Sep 2006 15:01]
Csongor Fagyal
To me, a Hungarian, it feels more appropriate to have a collation that is the same as the current hungarian_ci, but named differently (something like hungarian_nonaccented_ci), and a patch to the current hungarian_ci to behave (IMHO) properly. I can submit a patch for the mappings, but don't know if it is enough to change the mappings in .../sql/share/charsets/hungarian.conf Also (as I am looking for a workaround), I have this in /usr/share/mysql/charsets/latin2.xml : <collation name="latin2_hungarian_ci"> ... </collation> Will the server recognise the changes in this file after a restart?
[11 Oct 2006 4:49]
Alexander Barkov
People need different rules for different applications. The current version is accent insensitive. It was contributed by a Hungarian many years ago, in version 3.23 or even earlier. So we won't change the name of the current collation. We can add a new collation, say latin2_hungarian2_ci. We'll really appreciate if you submit a new, accent sensitive, mapping. To make the things easier on your side, instead of addint a new collation, you can fix the current map in /usr/share/mysql/charsets/latin2.xml: <collation name="latin2_hungarian_ci"> ... </collation> Yes. The server will recognise this changes after restart. As soon as you have a correct accent sensitive map, please send it to bar@mysql.com, we'll do all other necessary modifications in .xml files to add it as a new collation. Thanks!
[17 Oct 2006 16:25]
Peter Gulutzan
The difficulty is with our Hungarian collation, as we acknowledge. I have added a lengthy comment and a request for clarification on bug#12519 "Incorrect Hungarian collation".