Bug #37555 | Collation utf8mb3_danish_ci shows wrong order in table if 'order by' clause used | ||
---|---|---|---|
Submitted: | 20 Jun 2008 19:32 | Modified: | 22 Jun 2008 15:46 |
Reporter: | Hema Sridharan | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Charsets | Severity: | S3 (Non-critical) |
Version: | mysql-6.0-backup | OS: | Linux |
Assigned to: | | CPU Architecture: | Any |
[20 Jun 2008 19:32]
Hema Sridharan
[20 Jun 2008 21:27]
Peter Laursen
@hema .. this is *not a bug*. In the Danish and Norwegian collations the two-character sequence 'aa' is treated as identical to the special Nordic character 'å'. The 3 special Danish characters æ, ø and å are alphabetized after z like "a,b,c ... x,y,x,æ,ø,å".
That said, MySQL should support both 'traditional' and 'modern' Danish collations. The use of 'aa' for 'å' was officially abandoned in the Danish language with the language reform of 1953 (I do not know about Norwegian), but you still find the old form frequently in both geographical names (example: 'Aalborg') and surnames (example: 'Østergaard'). And it is *very rare* in Danish to have an 'aa' sequence NOT meaning 'å' (actually I can only think of words of Frisian/Dutch/Flemish origin).
Peter (not a MySQL person - and Danish!)
[20 Jun 2008 21:28]
Peter Laursen
sorry .. typo! The 3 special Danish characters æ, ø and å are alphabetized after z like "a,b,c ... x,y,z,æ,ø,å".
[20 Jun 2008 21:33]
Peter Laursen
also compare with languages/collations where ä = ae, ö = oe, ß = ss etc ...
[20 Jun 2008 21:54]
Peter Laursen
one last comment: this is not specific to the utf8mb3 character set. You get the same with every character set supporting Danish: latin1 and any Unicode character set (also utf8, utf16, utf32 and ucs2). Your result is identical to: bb cc dd ee å .. and I think that explains it!
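To make the rule described above concrete, here is a minimal Python sketch of a sort key that places æ, ø, å after z and optionally treats 'aa' as 'å'. It is only an illustration, not MySQL's collation implementation: the names `DANISH_ALPHABET`, `RANK`, `danish_key` and the `aa_is_aring` flag are made up for this example, and the word list is assumed from the "bb cc dd ee å" result mentioned in the comment, since the original report's data is not shown here.

```python
# Illustrative sketch only: a hand-written sort key, not MySQL's collation code.
# All names here (DANISH_ALPHABET, RANK, danish_key, aa_is_aring) are hypothetical.

DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}


def danish_key(word, aa_is_aring=True):
    """Return a list of letter ranks usable as a sort key.

    With aa_is_aring=True the sequence 'aa' is folded to 'å' first
    (the 'traditional' behaviour described in the thread); with False
    it is compared letter by letter (the 'modern' behaviour).
    """
    s = word.lower()
    if aa_is_aring:
        s = s.replace("aa", "å")
    # Characters outside the alphabet get a rank after 'å'.
    return [RANK.get(ch, len(DANISH_ALPHABET)) for ch in s]


# Assumed sample data, inferred from the result quoted in the comment above.
words = ["ee", "aa", "cc", "bb", "dd"]

traditional = sorted(words, key=danish_key)
modern = sorted(words, key=lambda w: danish_key(w, aa_is_aring=False))

print(traditional)  # 'aa' sorts last, like 'å'
print(modern)       # 'aa' sorts first, like two plain 'a' letters
```

With the folding enabled, 'aa' gets the same key as 'å' and lands after 'ee', which matches the ordering reported for the Danish collation. A real collation also handles case, accents and multi-level weights, which this sketch ignores.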
[22 Jun 2008 15:46]
Sveta Smirnova
Thank you for the report. Closed as "Not a Bug" for the reasons explained by Peter. See also http://www.collation-charts.org/mysql60/mysql604.utf8_danish_ci.html
But there is one exception to Peter's explanation: latin1_danish_ci doesn't consider 'aa' > 'bb'.
[22 Jun 2008 17:06]
Peter Laursen
ok .. Sveta is correct in her last comment:
set names utf8;
select 'aa' > 'bb' collate utf8_general_ci; -- returns 0
select 'aa' > 'bb' collate utf8_danish_ci; -- returns 1
set names latin1;
select 'aa' > 'bb' collate latin1_swedish_ci; -- returns 0
select 'aa' > 'bb' collate latin1_danish_ci; -- returns 0
.. but that is *really silly* - Danish is Danish no matter the charset! I will file another bug report about this!
[22 Jun 2008 17:27]
Peter Laursen
we continue here: http://bugs.mysql.com/bug.php?id=37571 :-)