| Bug #22337 | Collation change results in duplicate key ("e" and "é" mixup) | ||
|---|---|---|---|
| Submitted: | 14 Sep 2006 10:28 | Modified: | 17 Oct 2006 16:25 |
| Reporter: | Csongor Fagyal | | |
| Status: | Duplicate | | |
| Category: | MySQL Server: Charsets | Severity: | S4 (Feature request) |
| Version: | 5.0.x, 4.1.21, 4.1.10 | OS: | Linux (Linux, Windows) |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
[14 Sep 2006 10:28]
Csongor Fagyal
[14 Sep 2006 10:51]
Csongor Fagyal
How to repeat:
mysql> create table bar (foo char(10) not null unique) engine="MyISAM" charset=latin2 collate=latin2_hungarian_ci;
Query OK, 0 rows affected (0.09 sec)
mysql> insert into bar values ('zoh'), ('zuh');
Query OK, 2 rows affected (0.00 sec)
mysql> insert into bar values ('zox'), ('ZŐX');
Query OK, 2 rows affected (0.00 sec)
mysql> insert into bar values ('sex'), ('séx');
ERROR 1062 (23000): Duplicate entry 'séx' for key 1
mysql> insert into bar values ('söx'), ('sox');
Query OK, 2 rows affected (0.00 sec)
Very interesting... it looks like "o" != "ö", which is correct! But then why is "e" == "é"?
[14 Sep 2006 11:05]
Csongor Fagyal
Misspelled collation name fixed; refined synopsis.
[14 Sep 2006 12:29]
Valeriy Kravchuk
Thank you for the problem report. Please try to repeat it with a newer version, 4.1.21, and report the results.
[14 Sep 2006 13:08]
Csongor Fagyal
Also tested on: mysql-standard-4.1.21-pc-linux-gnu-i686-glibc23
Fedora Core 5
Got the exact same error.
Interestingly, these also produce duplicate entry errors:
mysql> insert into bar values ('aabb'), ('aAbB');
ERROR 1062 (23000): Duplicate entry 'aAbB' for key 1
I have:
mysql> show create table bar;
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| bar | CREATE TABLE `bar` (
`foo` char(10) collate latin2_hungarian_ci NOT NULL default '',
UNIQUE KEY `foo` (`foo`)
) ENGINE=MyISAM DEFAULT CHARSET=latin2 COLLATE=latin2_hungarian_ci |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
These work:
mysql> insert into bar values ('oooo'), ('öööö');
Query OK, 2 rows affected (0.00 sec)
mysql> insert into bar values ('oo'), ('ÖÖ');
Query OK, 2 rows affected (0.00 sec)
mysql> insert into bar values ('auau'), ('aüaü');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> insert into bar values ('Uuuu'), ('Üüüü');
Query OK, 2 rows affected (0.00 sec)
mysql> insert into bar values ('PÖ'), ('Po');
Query OK, 2 rows affected (0.00 sec)
These do not:
mysql> insert into bar values ('ee'), ('ÉÉ');
ERROR 1062 (23000): Duplicate entry 'ÉÉ' for key 1
mysql> insert into bar values ('ababa'), ('ababá');
ERROR 1062 (23000): Duplicate entry 'ababá' for key 1
mysql> insert into bar values ('almos'), ('Álmos');
ERROR 1062 (23000): Duplicate entry 'Álmos' for key 1
[14 Sep 2006 14:04]
Valeriy Kravchuk
Verified just as described, also with 5.0.24a on Windows. Note that "set names latin2" must be executed before inserting the data.
This:
mysql> insert into bar values ('aabb'), ('aAbB');
ERROR 1062 (23000): Duplicate entry 'aAbB' for key 1
is not a bug, though, because latin2_hungarian_ci is a case-insensitive ("_ci") collation.
[15 Sep 2006 7:52]
Valeriy Kravchuk
This may also be a duplicate of bug #12519 (at least 'A' <> 'Á' is discussed there). Work is already in progress.
[18 Sep 2006 5:02]
Alexander Barkov
This is not a bug. latin2_hungarian_ci is an accent-insensitive collation for Hungarian, so it treats "A WITH ACUTE" and "E WITH ACUTE" as equal to A and E. Changing status to feature request: accent-sensitive collation for Hungarian.
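The behaviour described above can be modelled with a per-character weight table: a simple 8-bit collation gives each character a weight and compares strings weight by weight, so two values collide in a UNIQUE index exactly when their weight sequences match. The sketch below is a hypothetical Python model, not MySQL's code and not the real latin2_hungarian_ci table; the names `ACCENT_FOLDS`, `weight`, `sort_key` and `first_duplicate` are invented here, and the fold table covers only the two accent folds observed in this report (É to E, Á to A) plus case folding.

```python
# Hypothetical model of a simple 8-bit collation: each character gets a weight,
# and two strings compare equal when their weight sequences are equal.
# The fold table is illustrative, NOT MySQL's real latin2_hungarian_ci data:
# it covers only the folds observed in this bug report (É -> E, Á -> A).
ACCENT_FOLDS = {"É": "E", "Á": "A"}


def weight(ch: str) -> str:
    """Return the (hypothetical) weight of one character."""
    up = ch.upper()                   # case-insensitive: 'a' and 'A' share a weight
    return ACCENT_FOLDS.get(up, up)   # accent-insensitive only for the folded letters


def sort_key(value: str) -> tuple:
    """Weight sequence used to compare two values."""
    return tuple(weight(ch) for ch in value)


def first_duplicate(values):
    """Mimic a UNIQUE index: return the first value whose key is already present."""
    seen = set()
    for value in values:
        key = sort_key(value)
        if key in seen:
            return value
        seen.add(key)
    return None


if __name__ == "__main__":
    # Pairs taken from the INSERT statements in this report.
    for pair in [("sex", "séx"), ("söx", "sox"), ("aabb", "aAbB"), ("oooo", "öööö")]:
        print(pair, "->", first_duplicate(pair))
```

With these folds the model rejects 'séx' and 'aAbB' as duplicates but accepts 'sox' and 'öööö', matching the transcripts; giving É and Á their own weights (dropping them from `ACCENT_FOLDS`) is what an accent-sensitive variant would amount to in this model.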
[18 Sep 2006 15:01]
Csongor Fagyal
As a Hungarian, I feel it would be more appropriate to have a collation that behaves like the current hungarian_ci but is named differently (something like hungarian_nonaccented_ci), plus a patch that makes the current hungarian_ci behave (IMHO) properly. I can submit a patch for the mappings, but I don't know whether it is enough to change the mappings in .../sql/share/charsets/hungarian.conf.
Also (as I am looking for a workaround), I have this in /usr/share/mysql/charsets/latin2.xml:
<collation name="latin2_hungarian_ci"> ... </collation>
Will the server recognise changes to this file after a restart?
[11 Oct 2006 4:49]
Alexander Barkov
People need different rules for different applications. The current version is accent-insensitive. It was contributed by a Hungarian many years ago, in version 3.23 or even earlier, so we won't change the name of the current collation. We can add a new collation, say latin2_hungarian2_ci. We would really appreciate it if you submitted a new, accent-sensitive mapping.
To make things easier on your side, instead of adding a new collation, you can fix the current map in /usr/share/mysql/charsets/latin2.xml:
<collation name="latin2_hungarian_ci"> ... </collation>
Yes, the server will recognise these changes after a restart. As soon as you have a correct accent-sensitive map, please send it to bar@mysql.com, and we'll make all the other necessary modifications in the .xml files to add it as a new collation. Thanks!
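Before sending in a candidate map, it may help to list which characters end up sharing a weight, since those are exactly the characters the collation will treat as equal. The following is a hypothetical Python helper (the name `equal_weight_groups` is invented here). It assumes each collation's `<map>` element in latin2.xml holds 256 whitespace-separated hex weights, one per latin2 byte value 0x00..0xFF; that layout is an assumption, so check it against the actual file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def equal_weight_groups(xml_text, collation):
    """Group latin2 characters that share a weight in a collation's <map>.

    xml_text may be str or bytes. ASSUMPTION: the <map> holds 256
    whitespace-separated hex weights, one per byte value 0x00..0xFF.
    """
    root = ET.fromstring(xml_text)
    for node in root.iter("collation"):
        if node.get("name") == collation:
            map_node = node.find("map")
            break
    else:
        raise KeyError(f"collation {collation!r} not found")
    if map_node is None or map_node.text is None:
        raise ValueError(f"collation {collation!r} has no <map>")

    weights = [int(tok, 16) for tok in map_node.text.split()]
    if len(weights) != 256:
        raise ValueError(f"expected 256 weights, got {len(weights)}")

    groups = defaultdict(list)
    for byte_value, w in enumerate(weights):
        # Decode the single byte as latin2 (ISO-8859-2) to show the character.
        groups[w].append(bytes([byte_value]).decode("latin2"))
    # Only groups with two or more members are characters compared as equal.
    return {w: chars for w, chars in groups.items() if len(chars) > 1}
```

Called as `equal_weight_groups(open(path, "rb").read(), "latin2_hungarian_ci")`, it would list É and é in the same group as E and e if the map folds them; in an accent-sensitive map those letters would land in groups of their own (or, if case-insensitive, only with each other).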
[17 Oct 2006 16:25]
Peter Gulutzan
The difficulty is with our Hungarian collation, as we acknowledge. I have added a lengthy comment and a request for clarification on bug#12519 "Incorrect Hungarian collation".
