Bug #2137 | server considers 'y' and 'ü' the same | ||
---|---|---|---|
Submitted: | 17 Dec 2003 2:55 | Modified: | 19 Dec 2003 4:36 |
Reporter: | Werner Stuerenburg | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: MyISAM storage engine | Severity: | S2 (Serious) |
Version: | 4.0.15, 4.1.1 | OS: | Windows (Win2K) |
Assigned to: | Alexander Barkov | CPU Architecture: | Any |
[17 Dec 2003 2:55]
Werner Stuerenburg
[18 Dec 2003 8:48]
Dean Ellis
INSERT INTO test_y VALUES ( null, x'6479' ), (null, x'64FC'); I am verifying this in the event that the latin1 character set should handle these values. Otherwise, it is a character set issue (and not a bug), as latin2 or utf8 and such do not have a problem with it.
[18 Dec 2003 11:47]
Alexander Barkov
Werner, Dean, in 4.1 the default collation for latin1 is latin1_swedish_ci, and according to Swedish rules 'y' and 'ü' are equal. If you don't like Swedish rules, you may try another collation, for example latin1_german1_ci: CREATE TABLE test_y ( id smallint(5) unsigned NOT NULL auto_increment, val varchar(5) NOT NULL default '', PRIMARY KEY (id,val), UNIQUE KEY val (val) ) TYPE=MyISAM CHARSET=latin1 COLLATE latin1_german1_ci;
[19 Dec 2003 1:43]
Werner Stuerenburg
as you could see, on my system, I'm using the character set latin1, but the customer told me, he was using a German character set, and if I'm not mistaken, it's called something like latin_de; anyway, I don't know about character sets, but I do know that 'y' is a common character contained in the 7 bit character set, whereas 'ü' is not, so I would really like to know about any character set capable of confusing these two -- should be really amazing, right? As far as I know, all 7 bit characters are identical in all character sets -- ASCII, but then, things might be much more complicated than I can imagine.
[19 Dec 2003 1:53]
Werner Stuerenburg
Alexander, thanks a lot for pointing this out: from a systematic point of view, I'd say it is not acceptable to consider two different characters as equal. I don't know much about the Swedish sorting order, but I guess Swedish people would be confused if those two characters would be mixed up in any odd order. By the way, how do you implement this "consider different things as equal"?
[19 Dec 2003 2:38]
Alexander Barkov
Werner, this is what Swedish and Finnish people do expect. Please take a look here: http://www.genealogia.fi/faq/faq013e.htm In German installation those letters are not the same. There are two German rules implemented in MySQL, according to DIN1 and DIN2. Note, character sets and collations implementation was significantly changed, and names are different in 4.0 and 4.1. They are: - german1 and latin1_de in 4.0 - latin1_german1_ci and latin1_german2_ci in 4.1. Answering to "who do we sort different characters the same": We use arrays that assign sorting weights to characters. For Swedish sorting 'y' and 'ü' just have the same weights.
[19 Dec 2003 2:52]
Alexander Barkov
Sorry for the typo, "who do we sort" -> "how do we sort".
[19 Dec 2003 4:21]
Werner Stuerenburg
In my understanding, this issue can be closed. It's not a bug, it's a feature! ;-)
[19 Dec 2003 4:36]
Alexander Barkov
I have already done it :) The status of this ticket is "Not a Bug".