Bug #7072 Wrong sort order for Russian characters in utf8 charset
Submitted: 7 Dec 2004 12:47 Modified: 7 Dec 2004 13:28
Reporter: [ name withheld ]
Status: Not a Bug
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.1.7 OS:Linux (Linux)
Assigned to: CPU Architecture:Any

[7 Dec 2004 12:47] [ name withheld ]
Description:
The Russian letters "YO" and "YE" (ё, е) do not sort correctly when stored as utf8.
"YE" should come first, followed by "YO". Currently they come back in arbitrary order.

How to repeat:
create temporary table tmptable(field1 varchar(8) character set utf8 collate utf8_general_ci) default character set utf8;

insert into tmptable values (0xd191/*small yo*/),(0xd081/*capital yo*/),(0xd0b5/*small ye*/),(0xd095/*capital ye*/),/*and once more*/(0xd191),(0xd081),(0xd0b5),(0xd095);

select field1, hex(field1) from tmptable order by field1;

+--------+-------------+
| field1 | hex(field1) |
+--------+-------------+
| ё      | D191        |
| Ё      | D081        |
| е      | D0B5        |
| Е      | D095        |
| ё      | D191        |
| Ё      | D081        |
| е      | D0B5        |
| Е      | D095        |
+--------+-------------+
8 rows in set (0.00 sec)

select field1, hex(field1) from tmptable order by convert(field1 using cp1251);

+--------+-------------+
| field1 | hex(field1) |
+--------+-------------+
| е      | D0B5        |
| Е      | D095        |
| е      | D0B5        |
| Е      | D095        |
| ё      | D191        |
| Ё      | D081        |
| ё      | D191        |
| Ё      | D081        |
+--------+-------------+
8 rows in set (0.00 sec)

Suggested fix:
Possibly in strings/ctype-utf8.c?
[7 Dec 2004 13:28] Alexander Keremidarski
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.mysql.com/documentation/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Additional info:

The utf8_general_ci collation is accent-insensitive, so under it "yo" = "ye".
Letters with the same sorting weight are returned in no guaranteed order.

The default cp1251 collation is accent-sensitive, which is why it sorts "yo" after "ye".

mysql> SET NAMES utf8; SELECT 'е' = 'ё';
Query OK, 0 rows affected (0.00 sec)

+-------------+
| 'е' = 'ё'   |
+-------------+
|           1 |
+-------------+
1 row in set (0.00 sec)

mysql> SET NAMES cp1251; SELECT 'е' = 'ё';
Query OK, 0 rows affected (0.00 sec)

+-------------+
| 'е' = 'ё'   |
+-------------+
|           0 |
+-------------+
1 row in set (0.00 sec)
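The behaviour explained above comes down to sort keys: rows whose keys compare equal have no defined order among themselves. The Python sketch below is a toy model, not MySQL's implementation. The key functions `general_ci_key` and `cp1251_like_key` are hypothetical stand-ins that assume only what the reply states: utf8_general_ci gives е and ё (and their capitals) the same weight, while cp1251 ranks ё after е. The input rows are the same eight values inserted in the test case.

```python
# Toy model of collation-based sorting. NOT MySQL's real weight tables.
# Assumption (from the reply): an accent- and case-insensitive collation such as
# utf8_general_ci gives ё/Ё the same weight as е/Е.

# Same values as the INSERT in the test case: ё, Ё, е, Е, twice (Cyrillic letters).
ROWS = ["ё", "Ё", "е", "Е", "ё", "Ё", "е", "Е"]


def general_ci_key(ch: str) -> str:
    """Hypothetical utf8_general_ci-like weight: fold case, then fold ё onto е."""
    return ch.lower().replace("ё", "е")


def cp1251_like_key(ch: str):
    """Hypothetical cp1251-like weight: fold case, but rank ё after е."""
    folded = ch.lower()
    return (general_ci_key(folded), 1 if folded == "ё" else 0)


if __name__ == "__main__":
    # Equal weights: mirrors SELECT 'е' = 'ё' returning 1 under utf8.
    print(general_ci_key("е") == general_ci_key("ё"))  # True

    # Every row gets the same key, so the collation imposes no order at all.
    # Python's sort is stable and keeps input order; MySQL makes no such promise.
    print(sorted(ROWS, key=general_ci_key) == ROWS)  # True

    # A key that separates е from ё groups all е/Е rows before all ё/Ё rows,
    # matching the ORDER BY convert(field1 using cp1251) output.
    print(sorted(ROWS, key=cp1251_like_key))
```

Because Python's `sorted()` is stable, the tied rows in the model simply keep their input order; on the server, the order of equal-weight rows is up to the execution plan, which is why it looks random. Only a key that actually distinguishes е from ё, as `cp1251_like_key` does here, makes the order deterministic.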