Bug #34371 Czech collation
Submitted: 7 Feb 2008 6:19 Modified: 7 Feb 2008 10:28
Reporter: Michal Čihař
Status: Not a Bug
Category:Server: Charsets Severity:S3 (Non-critical)
Version:5.1.22 OS:Linux
Assigned to: Target Version:

[7 Feb 2008 6:19] Michal Čihař
Description:
Collations utf8_czech_ci and ucs2_czech_ci distinguish some accented letters from
their unaccented counterparts, while I think they should be treated as the same.
This affects at least the following characters: š, č, ř, ž, while others I tried do not
seem to be affected: ť, ň, ĺ, ď. The same applies to their upper case variants.

How to repeat:
Example MySQL session:

mysql> SET NAMES "utf8";
Query OK, 0 rows affected (0.00 sec)

mysql> set collation_connection="utf8_general_ci";
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT 'č' LIKE 'c'  ;
+---------------+
| 'č' LIKE 'c' |
+---------------+
|             1 | 
+---------------+
1 row in set (0.00 sec)

mysql> set collation_connection="utf8_czech_ci";
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT 'č' LIKE 'c'  ;
+---------------+
| 'č' LIKE 'c' |
+---------------+
|             0 | 
+---------------+
1 row in set (0.00 sec)

mysql> SELECT 'ň' LIKE 'n'  ;
+---------------+
| 'ň' LIKE 'n' |
+---------------+
|             1 | 
+---------------+
1 row in set (0.00 sec)
[7 Feb 2008 6:25] Michal Čihař
I just verified that exactly the same thing happens with 5.1.22.
[7 Feb 2008 10:10] Sveta Smirnova
Thank you for taking the time to write to us, but this is not a bug. Please double-check
the documentation available at http://dev.mysql.com/doc/ and the instructions on
how to report a bug at http://bugs.mysql.com/how-to-report.php

Please read about unicode character sets and collations at
http://dev.mysql.com/doc/refman/5.1/en/charset-unicode-sets.html
[7 Feb 2008 10:28] Michal Čihař
What I did not find is why 'ň' and 'n' ARE treated as the same while 'č' and 'c' ARE NOT.
I think either both pairs should be or neither of them.
[7 Feb 2008 12:00] Vlasta Neubauer
Although I initiated this bug report, I found that this problem originates directly from
the Czech standard for collation, which is too strict (and slightly out of date, I think).
So I must agree: this is neither a bug in MySQL nor a bug in Unicode.
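
The asymmetry discussed above can be illustrated with a small Python sketch. It assumes that Czech alphabetical ordering treats č, ř, š and ž as letters of their own, while the remaining diacritics only act as tie-breakers and are ignored for equality. The names CZECH_DISTINCT, primary_key and czech_equal are hypothetical, the digraph "ch" is left out, and this is not MySQL's actual implementation, only a rough model of the behaviour seen in the session.

```python
import unicodedata

# Assumption: letters that Czech ordering treats as separate base letters.
# Every other diacritic is ignored when comparing for equality.
CZECH_DISTINCT = set("čřšž")


def primary_key(text: str) -> str:
    """Return a simplified, case-insensitive comparison key for Czech text."""
    out = []
    for ch in unicodedata.normalize("NFC", text.lower()):
        if ch in CZECH_DISTINCT:
            # Keep the letter as-is: it is distinct from its unaccented form.
            out.append(ch)
        else:
            # Decompose and drop combining marks, so 'ň' becomes 'n'.
            decomposed = unicodedata.normalize("NFD", ch)
            out.append(
                "".join(c for c in decomposed if not unicodedata.combining(c))
            )
    return "".join(out)


def czech_equal(a: str, b: str) -> bool:
    """Compare two strings under the simplified rule above."""
    return primary_key(a) == primary_key(b)


print(czech_equal("č", "c"))  # False: 'č' is a separate letter
print(czech_equal("ň", "n"))  # True: the caron on 'n' is ignored
```

Under this model, 'č' LIKE 'c' is false and 'ň' LIKE 'n' is true, matching the utf8_czech_ci results in the report. If accent-insensitive matching is wanted for a single comparison, choosing a different collation for that comparison (as the utf8_general_ci part of the session does) is the usual route.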