Bug #39816 | German collation under utf8_unicode_ci is incorrect | ||
---|---|---|---|
Submitted: | 2 Oct 2008 16:06 | Modified: | 6 Oct 2008 11:40 |
Reporter: | Jay Pipes | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | 5.0.51 | OS: | Linux |
Assigned to: | | CPU Architecture: | Any |
Tags: | character sets, collation |
[2 Oct 2008 16:06]
Jay Pipes
[6 Oct 2008 11:28]
Susanne Ebrecht
Not a bug in 5.1.

Explanation:

German 1: ä = a, ß = s
German 2: ä = ae, ß = ss

utf8_unicode_ci and latin1_german1_ci are sorting the same way for this example.
utf8_general_ci and latin1_german2_ci are sorting the same way for this example.

Here is my test:

Terminal: UTF8

SET NAMES UTF8;

create table t_utf8(v varchar(100)) default charset=utf8;
create table t_latin1(v varchar(100)) default charset=latin1;

insert into t_utf8 VALUES('Arg'),('Ärgerlich'),('Arm'),('Assistant'),('Aßlar'),('Assoziation');
insert into t_latin1 VALUES('Arg'),('Ärgerlich'),('Arm'),('Assistant'),('Aßlar'),('Assoziation');

Expected German1: Arg, Ärgerlich, Arm, Aßlar, Assistant, Assoziation
Expected German2: Ärgerlich, Arg, Arm, Assistant, Aßlar, Assoziation

select * from t_utf8 order by v COLLATE utf8_unicode_ci;
+-------------+
| v           |
+-------------+
| Arg         |
| Ärgerlich   |
| Arm         |
| Assistant   |
| Aßlar       |
| Assoziation |
+-------------+

select * from t_utf8 order by v COLLATE utf8_general_ci;
+-------------+
| v           |
+-------------+
| Arg         |
| Ärgerlich   |
| Arm         |
| Aßlar       |
| Assistant   |
| Assoziation |
+-------------+

select * from t_latin1 order by v COLLATE latin1_german1_ci;
+-------------+
| v           |
+-------------+
| Arg         |
| Ärgerlich   |
| Arm         |
| Aßlar       |
| Assistant   |
| Assoziation |
+-------------+

select * from t_latin1 order by v COLLATE latin1_german2_ci;
+-------------+
| v           |
+-------------+
| Ärgerlich   |
| Arg         |
| Arm         |
| Assistant   |
| Aßlar       |
| Assoziation |
+-------------+

So there is nothing to fix for MySQL 5.1. I still have to test 5.0.
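For readers who want to check the two "Expected" orderings by hand, here is a minimal Python sketch. It is only a toy model of the German 1 and German 2 rules as summarised in this comment (ä = a, ß = s versus ä = ae, ß = ss), not MySQL's collation code. The names `GERMAN1`, `GERMAN2`, `sort_key` and `collate` are made up for illustration, and the ö/ü entries are added by analogy.

```python
# Toy model of the two German sort rules as summarised in the comment above.
# Assumption: this is NOT how MySQL implements collations; it only expands
# the special letters before a plain, case-insensitive string comparison.

WORDS = ["Arg", "Ärgerlich", "Arm", "Assistant", "Aßlar", "Assoziation"]

# "German 1" as described in the thread: umlauts sort as their base vowel, ß as s.
GERMAN1 = {"ä": "a", "ö": "o", "ü": "u", "ß": "s"}

# "German 2" as described in the thread: umlauts sort as vowel + e, ß as ss.
GERMAN2 = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def sort_key(word, rules):
    """Lower-case the word and expand each special letter per the given rules."""
    return "".join(rules.get(ch, ch) for ch in word.lower())


def collate(words, rules):
    """Return the words sorted by their expanded keys."""
    return sorted(words, key=lambda w: sort_key(w, rules))


german1_order = collate(WORDS, GERMAN1)
german2_order = collate(WORDS, GERMAN2)

print("German1:", ", ".join(german1_order))
print("German2:", ", ".join(german2_order))
```

Under these simplified rules the two lists come out in the same order as the "Expected German1" and "Expected German2" lines of the test.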
[6 Oct 2008 11:38]
Susanne Ebrecht
Sorry, I was blind.

Collation utf8_general_ci is sorting the ß like German2 but Ä like German1. Anyway, it is not recommended in Germany to use utf8_general_ci. It is recommended to use utf8_unicode_ci.

This behaviour is already described here:
http://dev.mysql.com/doc/refman/5.1/en/charset-unicode-sets.html

Curious but true: the collation utf8_unicode_ci uses ß = ss from the German 2 rule (DIN 5007-2) and ä = a, ö = o, ü = u from the German 1 rule (DIN 5007-1).

So the feature request is either to change ä, ö, ü in utf8_unicode_ci to the German 2 rules as well, or to add a separate utf8_german2_ci collation. I think there is already such a feature request.
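To make the mixed rule concrete, here is a small Python sketch that sorts the test words with ß = ss combined with ä = a, ö = o, ü = u. It is an illustrative model of the behaviour described in this comment, not MySQL's implementation, and `MIXED` and `mixed_key` are hypothetical names.

```python
# Toy model of the mixed rule described above: ß expands as in "German 2",
# while the umlauts collapse to their base vowel as in "German 1".
# Assumption: a simple letter expansion stands in for the real collation.

WORDS = ["Arg", "Ärgerlich", "Arm", "Assistant", "Aßlar", "Assoziation"]

MIXED = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}


def mixed_key(word):
    """Lower-case the word and expand special letters per the mixed rule."""
    return "".join(MIXED.get(ch, ch) for ch in word.lower())


mixed_order = sorted(WORDS, key=mixed_key)
print(", ".join(mixed_order))
```

With this key, Ärgerlich sorts between Arg and Arm (the German 1 position) and Aßlar sorts between Assistant and Assoziation (the German 2 position). That is the same order the utf8_unicode_ci query returned in the test output earlier in this thread.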
[6 Oct 2008 11:40]
Susanne Ebrecht
This is a duplicate of bug #38758
[6 Oct 2008 12:18]
Susanne Ebrecht
This mad mix of German rules is also what the Unicode description gives when a language does not have its own collation.