Bug #27877 incorrect german order in utf8_general_ci
Submitted: 17 Apr 2007 10:43
Modified: 26 Mar 2008 18:58
Reporter: Domas Mituzas
Status: Closed
Category: Server: Charsets
Severity: S3 (Non-critical)
Version: 5.1-bk, 5.0-bk
OS: Any
Assigned to: Alexander Barkov
Target Version: 5.0+
Triage: D2 (Serious)

[17 Apr 2007 10:43] Domas Mituzas
Description:
The manual (http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html) states:

"
A difference between the collations is that this is true for utf8_general_ci:

ß = s
Whereas this is true for utf8_unicode_ci:

ß = ss
"
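A simplified Python model of the difference the manual describes (an assumption for illustration, not MySQL's actual weight tables): utf8_general_ci assigns each character exactly one weight, so ß can only equal a single letter, while utf8_unicode_ci follows the Unicode Collation Algorithm, where ß expands to the same weights as "ss". The function names are hypothetical.

```python
# Simplified model: sort keys are built by folding characters to uppercase.
# "ß" is handled explicitly because Python's "ß".upper() already returns "SS".

def general_ci_key(s: str) -> str:
    # One weight per character: "ß" folds to the single letter "S".
    return "".join("S" if ch == "ß" else ch.upper() for ch in s)

def unicode_ci_key(s: str) -> str:
    # UCA-style expansion: "ß" contributes two weights, like "SS".
    return "".join("SS" if ch == "ß" else ch.upper() for ch in s)

print(general_ci_key("ß") == general_ci_key("s"))    # True
print(unicode_ci_key("ß") == unicode_ci_key("ss"))   # True
print(general_ci_key("ß") == general_ci_key("ss"))   # False
```

The one-weight-per-character design is what makes ß = s the only possible match in utf8_general_ci; an expansion to two letters needs the multi-weight machinery of utf8_unicode_ci.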

How to repeat:
mysql> set names utf8;
Query OK, 0 rows affected (0.03 sec)

mysql> insert into ge values ('a'),('b'),('s'),('u'),('ß');
Query OK, 5 rows affected (0.05 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> select * from ge order by a collate utf8_general_ci;
+------+
| a    |
+------+
| a    |
| b    |
| s    |
| u    |
| ß    |
+------+
5 rows in set (0.01 sec)

mysql> select * from ge order by a collate utf8_unicode_ci;
+------+
| a    |
+------+
| a    |
| b    |
| s    |
| ß    |
| u    |
+------+
5 rows in set (0.02 sec)
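Both result sets can be reproduced with a small model, assuming that before the fix utf8_general_ci had no folding rule for U+00DF, so ß kept a weight equal to its own code point (0xDF), which is greater than the weight of any ASCII letter. The key functions are hypothetical stand-ins for the server's collation code.

```python
rows = ["a", "b", "s", "u", "ß"]  # the values inserted into table ge

def buggy_general_ci_key(s):
    # Assumed pre-fix behavior: "ß" is unmapped and falls back to its code
    # point (0xDF); other letters are folded to uppercase code points.
    return [ord(ch) if ch == "ß" else ord(ch.upper()) for ch in s]

def unicode_ci_key(s):
    # UCA-style expansion: "ß" sorts like "SS".
    return "".join("SS" if ch == "ß" else ch.upper() for ch in s)

print(sorted(rows, key=buggy_general_ci_key))  # ['a', 'b', 's', 'u', 'ß']
print(sorted(rows, key=unicode_ci_key))        # ['a', 'b', 's', 'ß', 'u']
```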

Suggested fix:
Correct the manual (state that utf8_general_ci does not implement German
ordering), or fix the collation.
[4 Sep 2007 18:19] Alexander Barkov
Domas, 
please add "SHOW CREATE TABLE" output,
and also results of these queries:

mysql> select a, hex(a) from ge order by a collate utf8_general_ci;
mysql> select a, hex(a) from ge order by a collate utf8_unicode_ci;
[4 Sep 2007 18:25] Alexander Barkov
Domas,

Sorry, there's no need for additional info.
I managed to repeat this problem. This is really a bug.
[11 Feb 2008 13:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/42026

ChangeSet@1.2546, 2008-02-11 16:28:33+04:00, bar@mysql.com +10 -0
  Bug#27877 incorrect german order in utf8_general_ci
  Problem: incorrect sort order for "U+00DF SHARP S".
  Fix: changing sort order for U+00DF to be equal to 's',
  like the manual says.
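In the same simplified model, the fix amounts to a one-entry change in a per-character weight table: U+00DF gets the weight of 's'. The dictionaries and the order() helper below are hypothetical and cover only the characters from the report.

```python
# Hypothetical weight tables (the real server tables cover far more characters).
OLD_WEIGHTS = {"a": 0x41, "b": 0x42, "s": 0x53, "u": 0x55, "ß": 0xDF}
NEW_WEIGHTS = {**OLD_WEIGHTS, "ß": 0x53}  # U+00DF now weighs the same as 's'

rows = ["a", "b", "s", "u", "ß"]

def order(weights):
    # Sort the rows by their collation weight.
    return sorted(rows, key=lambda c: weights[c])

print(order(OLD_WEIGHTS))  # ['a', 'b', 's', 'u', 'ß']
print(order(NEW_WEIGHTS))  # ['a', 'b', 's', 'ß', 'u']
```

Because 's' and 'ß' now tie, their relative order here comes from Python's stable sort; SQL makes no guarantee about the order of rows with equal keys.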
[15 Feb 2008 12:08] Alexander Barkov
Pushed into 5.1.24-rpl
[19 Mar 2008 21:38] Håkan Askengren
I guess this fix applies not only to sort order, but also to comparisons with "=" and "like"?

set names utf8 COLLATE utf8_general_ci;
select 'a' = 'Ä'; # ok
select 'ß' = 's'; # not ok
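A collation defines equality as well as ordering, so a weight change affects "=" too. A minimal sketch with a hypothetical, heavily truncated weight table, assuming (as the manual states) that utf8_general_ci folds case and accents such as Ä to a single weight and, after the fix, maps ß to the weight of 's':

```python
# Hypothetical post-fix weights for a handful of characters.
WEIGHTS = {"a": 0x41, "A": 0x41, "ä": 0x41, "Ä": 0x41,
           "s": 0x53, "S": 0x53, "ß": 0x53}

def weights(text):
    # Characters missing from the toy table fall back to their code point.
    return [WEIGHTS.get(ch, ord(ch)) for ch in text]

def collation_equal(x, y):
    # '=' under a collation compares weight sequences, not code points.
    return weights(x) == weights(y)

print(collation_equal("a", "Ä"))   # True
print(collation_equal("ß", "s"))   # True once ß maps to the weight of 's'
print(collation_equal("ß", "ss"))  # False: no expansions in general_ci
```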
[25 Mar 2008 12:23] Bugs System
Pushed into 5.1.24-rc
[26 Mar 2008 18:58] Paul DuBois
Noted in 5.1.24 changelog.

The utf8_general_ci collation incorrectly did not sort "U+00DF SHARP S" as equal to 's'.
[26 Mar 2008 20:00] Bugs System
Pushed into 6.0.5-alpha
[30 Mar 2008 10:39] Jon Stephens
Fix also noted in the changelogs for 5.1.23-ndb-6.3.11 and 6.0.5.
[17 Jul 2008 2:07] Paul DuBois
Addition to changelog entry:

As a result of this fix, any indexes on columns that use the
utf8_general_ci or ucs2_general_ci collation (especially columns that
use German SHARP S) must be rebuilt when upgrading to 5.1.24/6.0.5 or
higher. To do this, use ALTER TABLE to drop and re-add the indexes,
or mysqldump to dump the affected tables and mysql to reload the dump
file.
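The reason for the rebuild can be sketched in the same simplified model, using hypothetical weight tables and a sorted Python list standing in for a B-tree index: an index built before the upgrade stays ordered by the old weights, but lookups after the upgrade compare with the new ones, so a binary search can stop before reaching rows that now compare equal.

```python
import bisect

# Hypothetical weights before and after the fix.
OLD = {"a": 0x41, "b": 0x42, "s": 0x53, "u": 0x55, "ß": 0xDF}
NEW = {**OLD, "ß": 0x53}

rows = ["a", "b", "s", "u", "ß"]

# Index built before the upgrade: ordered by the OLD weights.
stale_index = sorted(rows, key=lambda v: OLD[v])
# Index rebuilt after the upgrade: ordered by the NEW weights.
rebuilt_index = sorted(rows, key=lambda v: NEW[v])

def index_lookup(index, target):
    # After the upgrade every comparison uses the NEW weights.
    keys = [NEW[v] for v in index]
    want = NEW[target]
    i = bisect.bisect_left(keys, want)  # binary search assumes keys are sorted
    found = []
    while i < len(keys) and keys[i] == want:
        found.append(index[i])
        i += 1
    return found

print(index_lookup(stale_index, "s"))    # ['s']: 'ß' is missed
print(index_lookup(rebuilt_index, "s"))  # ['s', 'ß']
```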