Bug #8608 The ucs2_general_ci collation fails with several Cyrillic alphabets
Submitted: 18 Feb 2005 20:59
Modified: 26 Feb 2005 18:57
Reporter: Peter Gulutzan
Status: Closed
Category: MySQL Server
Severity: S3 (Non-critical)
Version: 5.0.3-alpha-debug
OS: Linux (SUSE 9.2)
Assigned to: Paul DuBois
CPU Architecture: Any

[18 Feb 2005 20:59] Peter Gulutzan
Description:
According to the MySQL Reference Manual,
"utf8_general_ci does not support expansions ...  But in other respects, [utf8_general_ci]
tries to reproduce utf8_unicode_ci as much as possible."
But ucs2_unicode_ci (which works the same as utf8_unicode_ci) sorts Bulgarian,
Serbian, and (most) Ukrainian characters correctly, while ucs2_general_ci does not.
 

How to repeat:
mysql> create table cy (s1 char character set ucs2);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into cy values (0x0452) /* Serbian small letter dje */;
Query OK, 1 row affected (0.00 sec)

mysql> insert into cy values (0x0406) /* Ukrainian capital letter I */;
Query OK, 1 row affected (0.00 sec)

mysql> insert into cy values (0x0430) /* Cyrillic small letter a */;
Query OK, 1 row affected (0.00 sec)

mysql> insert into cy values (0x044f) /* Cyrillic small letter ia */;
Query OK, 1 row affected (0.00 sec)

mysql> select hex(s1) from cy order by s1 collate ucs2_unicode_ci;
+---------+
| hex(s1) |
+---------+
| 0430    |
| 0452    |
| 0406    |
| 044F    |
+---------+
4 rows in set (0.00 sec)

mysql> select hex(s1) from cy order by s1 collate ucs2_general_ci;
+---------+
| hex(s1) |
+---------+
| 0452    |
| 0406    |
| 0430    |
| 044F    |
+---------+
4 rows in set (0.01 sec)
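
One possible reading of the second result, offered as a sketch and not as a description of the server code: if a "general"-style collation gives each character a single weight equal to the code point of its uppercase form, the four rows come out in exactly the order shown for ucs2_general_ci. The weight function below (general_like_key) is hypothetical and exists only for this illustration; MySQL's real weight tables are not consulted.

```python
# Illustrative model only. Assumption: each character gets one weight,
# the code point of its uppercase form. This is a simplification made
# for this report, not MySQL's actual ucs2_general_ci implementation.

# The four values inserted into table cy above.
rows = ["\u0452", "\u0406", "\u0430", "\u044f"]


def general_like_key(ch: str) -> int:
    """Hypothetical per-character weight: uppercase code point."""
    return ord(ch.upper())


def hex_ucs2(ch: str) -> str:
    """Format a BMP character the way hex(s1) prints a ucs2 value."""
    return format(ord(ch), "04X")


# Sort by the assumed weight and show the stored values in hex.
general_like_order = [hex_ucs2(c) for c in sorted(rows, key=general_like_key)]
print(general_like_order)  # ['0452', '0406', '0430', '044F']
```

Under this assumption 0452 and 0406 sort ahead of 0430 only because their uppercase code points (0402 and 0406) are numerically below 0410, not because of any alphabet's ordering rules.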
[26 Feb 2005 18:57] Paul DuBois
Thank you for your bug report. This issue has been addressed in the
documentation. The updated documentation will appear on our website
shortly, and will be included in the next release of the relevant
product(s).