| Bug #39816 | German collation under utf8_unicode_ci is incorrect | ||
|---|---|---|---|
| Submitted: | 2 Oct 2008 16:06 | Modified: | 6 Oct 2008 11:40 |
| Reporter: | Jay Pipes | Email Updates: | |
| Status: | Duplicate | Impact on me: | |
| Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
| Version: | 5.0.51 | OS: | Linux |
| Assigned to: | | CPU Architecture: | Any |
| Tags: | character sets, collation | ||
[2 Oct 2008 16:06]
Jay Pipes
[6 Oct 2008 11:28]
Susanne Ebrecht
Not a bug in 5.1:
Explanation:
German 1: ä = a, ß = s
German 2: ä = ae, ß = ss
utf8_unicode_ci and latin1_german1_ci are sorting the same way for this example
utf8_general_ci and latin1_german2_ci are sorting the same way for this example
Here is my test:
Terminal: UTF8
SET NAMES UTF8;
create table t_utf8(v varchar(100))default charset=utf8;
create table t_latin1(v varchar(100))default charset=latin1;
insert into t_utf8 VALUES('Arg'),('Ärgerlich'),('Arm'),('Assistant'),('Aßlar'),('Assoziation');
insert into t_latin1 VALUES('Arg'),('Ärgerlich'),('Arm'),('Assistant'),('Aßlar'),('Assoziation');
Expected German1: Arg, Ärgerlich, Arm, Aßlar, Assistant, Assoziation
Expected German2: Ärgerlich, Arg, Arm, Assistant, Aßlar, Assoziation
select * from t_utf8 order by v COLLATE utf8_unicode_ci;
+-------------+
| v           |
+-------------+
| Arg         |
| Ärgerlich   |
| Arm         |
| Assistant   |
| Aßlar       |
| Assoziation |
+-------------+
select * from t_utf8 order by v COLLATE utf8_general_ci;
+-------------+
| v           |
+-------------+
| Arg         |
| Ärgerlich   |
| Arm         |
| Aßlar       |
| Assistant   |
| Assoziation |
+-------------+
select * from t_latin1 order by v COLLATE latin1_german1_ci;
+-------------+
| v           |
+-------------+
| Arg         |
| Ärgerlich   |
| Arm         |
| Aßlar       |
| Assistant   |
| Assoziation |
+-------------+
select * from t_latin1 order by v COLLATE latin1_german2_ci;
+-------------+
| v           |
+-------------+
| Ärgerlich   |
| Arg         |
| Arm         |
| Assistant   |
| Aßlar       |
| Assoziation |
+-------------+
So there is nothing to fix for MySQL 5.1. I still have to test 5.0.
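To compare the four orderings at a glance, each result set can be collapsed into a single line. This is only a minimal sketch: it assumes the t_utf8 and t_latin1 tables from the test above and the default group_concat_max_len, and the column aliases are made up for illustration.

```sql
-- One row per table, one column per collation.
-- The aliases (unicode_ci, general_ci, ...) are illustrative only.
SELECT GROUP_CONCAT(v ORDER BY v COLLATE utf8_unicode_ci SEPARATOR ', ') AS unicode_ci,
       GROUP_CONCAT(v ORDER BY v COLLATE utf8_general_ci SEPARATOR ', ') AS general_ci
FROM t_utf8;

SELECT GROUP_CONCAT(v ORDER BY v COLLATE latin1_german1_ci SEPARATOR ', ') AS german1_ci,
       GROUP_CONCAT(v ORDER BY v COLLATE latin1_german2_ci SEPARATOR ', ') AS german2_ci
FROM t_latin1;
```

Each column can then be compared directly with the "Expected German1" and "Expected German2" lines.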
[6 Oct 2008 11:38]
Susanne Ebrecht
Sorry, I was blind. Collation utf8_unicode_ci sorts ß like German 2 but Ä like German 1. In any case, using utf8_general_ci is not recommended in Germany; utf8_unicode_ci is recommended instead. This behaviour is already described here: http://dev.mysql.com/doc/refman/5.1/en/charset-unicode-sets.html
Curious but true: utf8_unicode_ci uses ß = ss from the German 2 rule (DIN 5007-2) and ä = a, ö = o, ü = u from the German 1 rule (DIN 5007-1).
So the feature request is either to change ä, ö, ü in utf8_unicode_ci to the German 2 rules as well, or to add a separate utf8_german2_ci collation. I think such a feature request already exists.
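The mix of rules can be probed directly with equality comparisons. This is a small sketch, assuming a UTF-8 terminal and connection as in the test above; the column aliases are illustrative only. The expected values in the comments follow from the rules described in the manual page linked above and from the sort results in this report.

```sql
SET NAMES utf8;

-- ß: utf8_unicode_ci expands it to 'ss', utf8_general_ci treats it as 's'.
SELECT 'ß' = 'ss' COLLATE utf8_unicode_ci AS eszett_ss_unicode,  -- 1
       'ß' = 's'  COLLATE utf8_unicode_ci AS eszett_s_unicode,   -- 0
       'ß' = 'ss' COLLATE utf8_general_ci AS eszett_ss_general,  -- 0
       'ß' = 's'  COLLATE utf8_general_ci AS eszett_s_general;   -- 1

-- ä: utf8_unicode_ci compares it as 'a' (German 1), not as 'ae' (German 2).
SELECT 'ä' = 'a'  COLLATE utf8_unicode_ci AS auml_a_unicode,     -- 1
       'ä' = 'ae' COLLATE utf8_unicode_ci AS auml_ae_unicode;    -- 0
```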
[6 Oct 2008 11:40]
Susanne Ebrecht
This is a duplicate of bug #38758
[6 Oct 2008 12:18]
Susanne Ebrecht
This mad mix of German rules is also what the Unicode description specifies for languages that do not have a collation of their own.
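As long as no utf8 collation with full German 2 rules exists, one possible workaround is to convert the column to latin1 at sort time and borrow latin1_german2_ci. This is only a sketch, not a recommendation from this report: it assumes the t_utf8 table from the test above and that every value is representable in latin1 (characters outside latin1 are replaced by '?' during conversion). Because the sort key is an expression, the ORDER BY cannot use an index on v.

```sql
SET NAMES utf8;

-- Sort utf8 data with German 2 (phone book) rules by converting
-- each value to latin1 for the comparison only; the stored data is unchanged.
SELECT v
FROM t_utf8
ORDER BY CONVERT(v USING latin1) COLLATE latin1_german2_ci;
```

For the sample rows, the result should follow the same order as the latin1_german2_ci query on t_latin1.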
