Bug #11699 Danish collations sort strange with [ and ]
Submitted: 2 Jul 2005 2:34 Modified: 25 Jul 2005 11:12
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.0.7 OS:Any (*)
Assigned to: Alexander Barkov CPU Architecture:Any

[2 Jul 2005 2:34] Peter Laursen
Description:
the '[' and ']' -character appear among letters when sorted with danish collations (latin as well as utf)

How to repeat:
create database if not exists `test`;

USE `test`;

/*Table structure for table `tablename1` */

drop table if exists `tablename1`;

CREATE TABLE `tablename1` (
  `id` bigint(20) NOT NULL auto_increment,
  `mytext` varchar(20) collate latin1_danish_ci default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_danish_ci;

/*Data for the table `tablename1` */

insert into `tablename1` values (1,'øl');
insert into `tablename1` values (2,'ål');
insert into `tablename1` values (3,'ævbæv');
insert into `tablename1` values (4,'[korrekt]');
insert into `tablename1` values (5,']forkert[');
insert into `tablename1` values (6,'övelse');
insert into `tablename1` values (8,'aalborg');
insert into `tablename1` values (9,'blabla');
insert into `tablename1` values (10,'!!!');
insert into `tablename1` values (11,'ärkesvensk');

select * from tablename1 order by mytext;

result:

    id  mytext    
------  ----------
    10  !!!       
     8  aalborg   
     9  blabla    
     4  [korrekt] 
    11  ärkesvensk
     3  ævbæv     
     1  øl        
     6  övelse        
     5  ]forkert[ 
     2  ål        
     

Right would be I believe

    id  mytext    
------  ----------
    10  !!!       
     4  [korrekt]
     5  ]forkert[ 
     8  aalborg   
     9  blabla    
    11  ärkesvensk
     3  ævbæv     
     1  øl        
     6  övelse        
     2  ål   

Suggested fix:
Yes! :-)
[2 Jul 2005 2:57] Peter Laursen
and please consider the sort order of the '{' and '}' -characters as well
dataset extended with two rows (appear at the bottom):
 
    id  mytext    
------  ----------
    10  !!!       
     8  aalborg   
     9  blabla    
     4  [korrekt] 
    11  ärkesvensk
     3  ævbæv     
     1  øl        
     6  öv        
     5  ]forkert[ 
     2  ål        
    12  {godt nok}
    13  }skidt{
[2 Jul 2005 5:27] Aleksey Kishkin
Tested on linux also. Got:

 select * from tablename1 order by mytext;
+----+-------------+
| id | mytext      |
+----+-------------+
| 10 | !!!         |
|  8 | aalborg     |
|  9 | blabla      |
|  4 | [korrekt]   |
| 11 | ärkesvensk |
|  3 | ævbæv     |
|  1 | øl         |
|  6 | övelse     |
|  5 | ]forkert[   |
|  2 | ål         |
+----+-------------+
[2 Jul 2005 6:17] Peter Laursen
I also want to add the the wellknown Spanish characters "reversed ?" and "reversed !" (don't know how to input them from my keyboard!) appear after the last letter in Danish alphabet (that is 'Å').

But that maybe is OK, since these are not defined with the Danish character set at all.
[2 Jul 2005 6:29] Peter Laursen
and also remember '|' -character  ... It appears beween { and } .

In Good Old DOS-days (mid-80'es) Danish letters ÆØÅæøå were mapped til ASCII [ ] | { } (and one more that I don't remember). It probably is the ghost of it we see here.  

Other language collations might be affected too since mapping of national characters to [ ] | { } was quite common at that time.
[2 Jul 2005 7:15] Peter Laursen
now I remember.  It was the backslash-character.

    id  mytext         
------  ---------------
    10  !!!            
     8  aalborg        
     9  blabla         
     4  [korrekt]      
    11  ärkesvensk     
     3  ævbæv          
    15  \backslash     
     1  øl             
     6  öv             
     5  ]forkert[      
     2  ål             
    12  {godt nok}     
    14  |op på fars hat
    13  }skidt{
[2 Jul 2005 19:42] Peter Laursen
alos surprises me that $ occours before alphanumericals - other currency symbols after.

    id  mytext         
------  ---------------
    19  $$$$           
    23  0 og nix       
     8  aalborg        
    20  €€€€€          
    24  ££££
[11 Jul 2005 8:27] Alexander Barkov
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Additional info:

This is not a bug, this is expected behaviour.
latin1_danish_ci collation was designed to
sort data this way. We'll not change it.

If you need another sorting order, consider
switching to utf8_danish_ci.

Or, alternatively, you can create another Danish collation for latin1.
[8 Jan 2010 21:11] Peter Gulutzan
WL#5213 Danish collation
http://forge.mysql.com/worklog/task.php?id=5213