Bug #37555 Collation utf8mb3_danish_ci shows wrong order in table if 'order by' clause used
Submitted: 20 Jun 2008 19:32 Modified: 22 Jun 2008 15:46
Reporter: Hema Sridharan Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:mysql-6.0-backup OS:Linux
Assigned to: CPU Architecture:Any

[20 Jun 2008 19:32] Hema Sridharan
Description:
1) I create database with utf8mb3 character set and collate utf8mb3_danish_ci.
2) I create tables with the same character set and collation.
3) I insert some values in to the table.
4) I perform select * from <table> order by <colname>;

The order by clause doesn't show data contents in proper order for char/varchar columns only when the collation used is utf8mb3_danish_ci.

create database x character set utf8mb3 collate utf8mb3_danish_ci;
set names utf8mb3;
use x;
create table t1 (name char(20))character set utf8mb3 collate utf8mb3_danish_ci;
insert into t1 values('aa'),('bb'),('cc'),('dd'),('ee');
select * from t1;
select * from t1 order by name;

How to repeat:
mysql> create database x character set utf8mb3 collate utf8mb3_danish_ci;
Query OK, 1 row affected (0.01 sec)

mysql> set names utf8mb3;
Query OK, 0 rows affected (0.00 sec)

mysql> use x;
Database changed
mysql> create table t1 (name char(20))character set utf8mb3 collate utf8mb3_danish_ci;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t1 values('aa'),('bb'),('cc'),('dd'),('ee');
Query OK, 5 rows affected (0.01 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> select * from t1;
+------+
| name |
+------+
| aa   |
| bb   |
| cc   |
| dd   |
| ee   |
+------+

mysql> select * from t1 order by name;
+------+
| name |
+------+
| bb   |
| cc   |
| dd   |
| ee   |
| aa   |
+------+
5 rows in set (0.00 sec)
[20 Jun 2008 21:27] Peter Laursen
@hema

.. this is *not a bug*.

In Danish and Norwegian collations the double character sequence 'aa' is identical to the special Nordic character 'å'. The 3 special Danish characters æ,ø and å are alphabetized after z like "a,b.c ... x,y,x,æ,ø,å".

That said, MySQL should support both 'traditional' and 'modern' Danish collations.  The meaning of 'aa' = 'å' was officially abandoned in Danish language with the language reform of 1953 (I do not know about Norwegian), but you find the old form frequently in both geographical names (example: 'Aalborg') and person surnames (example: 'Østergaard') still.  And it is *very rare* in Danish to have a 'aa' sequence NOT meaning 'å' (actually I can only think about words of Frisian/Dutch/Flemish origin)

Peter
not a MySQL person - and Danish!)
[20 Jun 2008 21:28] Peter Laursen
sorry .. typo!

The 3 special Danish characters æ,ø and å are
alphabetized after z like "a,b.c ... x,y,z,æ,ø,å".
[20 Jun 2008 21:33] Peter Laursen
also compare with languages/collation where

ä = ae
ö = oe
ß = ss

etc ...
[20 Jun 2008 21:54] Peter Laursen
one last comment: this is not specific for utf8mb3 character set.  You get the same with every character set supporting Danish: latin1 and any unicode character set (also utf8, utf16, utf32 and ucs2).

Your result is identical to:

bb
cc
dd
ee
å

.. and I think that explains!
[22 Jun 2008 15:46] Sveta Smirnova
Thank you for the report.

Closed as "Not a Bug" for reasons explained by Peter. See also http://www.collation-charts.org/mysql60/mysql604.utf8_danish_ci.html

But there is exception in Peter's explanation: latin1_danish_ci doesn't consider 'aa' > 'bb'
[22 Jun 2008 17:06] Peter Laursen
ok .. Sveta is correct in her last comment

set names utf8;
select 'aa' > 'bb' collate utf8_general_ci; -- returns 0
select 'aa' > 'bb' collate utf8_danish_ci; -- returns 1

set names latin1;
select 'aa' > 'bb' collate latin1_swedish_ci; -- returns 0
select 'aa' > 'bb' collate latin1_danish_ci; -- returns 0

.. but that is *really silly* - Danish is Danish no matter the charset!

I file another bug report about this!
[22 Jun 2008 17:27] Peter Laursen
we continue here: http://bugs.mysql.com/bug.php?id=37571 :-)