Bug #34130 incorrect french order in utf8_unicode_ci
Submitted: 29 Jan 2008 12:54
Modified: 6 Oct 2016 8:40
Reporter: Denis Lienhardt
Status: Closed
Category: MySQL Server: Charsets
Severity: S4 (Feature request)
Version: 5.0, 5.1, 6.0
OS: Any
Assigned to: Assigned Account
CPU Architecture: Any
Tags: character sets, french, utf8

[29 Jan 2008 12:54] Denis Lienhardt
Description:
The manual (http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html) says:
"MySQL implements language-specific collations for the utf8 character set only if the ordering with utf8_unicode_ci does not work well for a language. For example, utf8_unicode_ci works fine for German and French, so there is no need to create special utf8 collations for these two languages."

However, utf8_unicode_ci does not fully respect the French ordering:

a < à < ... < æ < b ... < e < é < è < ê ... < o < œ < p

How to repeat:
mysql> CREATE DATABASE essai CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Query OK, 1 row affected (0.00 sec)

mysql> USE essai
Database changed

mysql> CREATE TABLE mots (Mot varchar(20));
Query OK, 0 rows affected (0.00 sec)

mysql> show create TABLE mots \G
*************************** 1. row ***************************
       Table: mots
Create Table: CREATE TABLE `mots` (
  `Mot` varchar(20) collate utf8_unicode_ci default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)

mysql>  INSERT INTO mots VALUES ('A'),('a'),('b'),('o'),('p'),('P'),('œ'),('æ'),('é'),('è'),('ê'),('e'),('à'),('E');
Query OK, 14 rows affected (0.00 sec)
Records: 14  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM mots ORDER BY Mot;
+------+
| Mot  |
+------+
| A    |
| a    |
| à   |
| è   |
| œ   |
| é   |
| æ   |
| ê   |
| b    |
| e    |
| E    |
| o    |
| P    |
| p    |
+------+
14 rows in set (0.00 sec)
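
One thing that may be worth ruling out before reading too much into this output: the column borders of the accented rows are misaligned, which would be consistent with the client and connection character sets disagreeing, so the stored bytes may not be the intended characters. The report does not say how the session was configured, so the following is only a hedged diagnostic sketch against the `mots` table above, not something the report itself establishes:

```sql
-- Show which character sets the client, connection and result set use.
SHOW VARIABLES LIKE 'character_set_%';

-- Inspect what was actually stored. A correctly stored 'é' is a single
-- character whose utf8 encoding is the two bytes C3A9.
SELECT Mot,
       HEX(Mot)         AS stored_bytes,
       CHAR_LENGTH(Mot) AS char_count,
       LENGTH(Mot)      AS byte_count
FROM mots;
```

If `char_count` is greater than 1 for a row that should hold a single accented letter, the data was probably converted twice on the way in, and the sort order would then reflect the damaged values rather than the collation.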

Suggested fix:
For me, utf8_unicode_ci does not work well for French. The same is true of utf8_general_ci.

Either add a French-specific collation or fix utf8_general_ci.
[30 Jan 2008 16:00] Susanne Ebrecht
Many thanks for writing a bug report.

utf8_general_ci is very old and not good for French or German at all.

If you are using UTF8 for French or German, utf8_unicode_ci is the only collation we provide at the moment.

Because this collation is for lots of languages, it is not perfect for French and German.

Creating dedicated UTF8 collations for some of the Western European languages is already on our "todo" list.

Of course, we would also be happy if the community contributed this.

If you want to create your own collation, you can look here:
http://forge.mysql.com/wiki/How_to_Add_a_Collation
[7 Feb 2008 13:52] Alexander Barkov
utf8_unicode_ci is accent insensitive by design.
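
To illustrate what accent insensitivity means in practice: letters that differ only by accent or case compare as equal, so the collation does not define their relative order in an ORDER BY. A minimal sketch, assuming the connection character set is utf8 (hence the `SET NAMES` line) and reusing the `mots` table from the report. The utf8_bin tie-breaker is only an illustrative workaround for getting a repeatable order; it sorts equal rows by code point and is not a French ordering:

```sql
SET NAMES utf8;

-- Under utf8_unicode_ci these letters compare as equal,
-- so both expressions evaluate to 1.
SELECT 'e' = 'é' COLLATE utf8_unicode_ci AS e_equals_e_acute,
       'e' = 'E' COLLATE utf8_unicode_ci AS e_equals_upper_e;

-- Rows that compare as equal may come back in any order.
-- A binary second sort key makes the result deterministic.
SELECT Mot
FROM mots
ORDER BY Mot COLLATE utf8_unicode_ci, Mot COLLATE utf8_bin;
```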

Adding support for collations with strict accent order is on the TODO list:
 http://forge.mysql.com/worklog/task.php?id=896

Changing severity to "Feature request"
[6 Oct 2016 8:40] Bernt Marius Johnsen
Fixed from MySQL 5.5
[6 Oct 2016 8:41] Bernt Marius Johnsen
Posted by developer:
 
Fixed from MySQL 5.5