Bug #34371 Czech collation
Submitted: 7 Feb 2008 5:19 Modified: 7 Feb 2008 9:28
Reporter: Michal Čihař
Status: Not a Bug
Category: MySQL Server: Charsets Severity: S3 (Non-critical)
Version: 5.1.22 OS: Linux
Assigned to: CPU Architecture: Any

[7 Feb 2008 5:19] Michal Čihař
Description:
Collations utf8_general_ci and ucs2_czech_ci distinguish some accented letters from their unaccented counterparts, while I think they should be treated as the same. This affects at least the following characters: š, č, ř, ž, while others I tried do not seem to be affected: ť, ň, ĺ, ď. The same applies to their upper case variants.

How to repeat:
Example MySQL session:

mysql> SET NAMES "utf8";
Query OK, 0 rows affected (0.00 sec)

mysql> set collation_connection="utf8_general_ci";
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT 'č' LIKE 'c'  ;
+---------------+
| 'č' LIKE 'c' |
+---------------+
|             1 | 
+---------------+
1 row in set (0.00 sec)

mysql> set collation_connection="utf8_czech_ci";
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT 'č' LIKE 'c'  ;
+---------------+
| 'č' LIKE 'c' |
+---------------+
|             0 | 
+---------------+
1 row in set (0.00 sec)

mysql> SELECT 'ň' LIKE 'n'  ;
+---------------+
| 'ň' LIKE 'n' |
+---------------+
|             1 | 
+---------------+
1 row in set (0.00 sec)
[7 Feb 2008 5:25] Michal Čihař
I just verified that exactly the same thing happens with 5.1.22.
[7 Feb 2008 9:10] Sveta Smirnova
Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://dev.mysql.com/doc/ and the instructions on how to report a bug at http://bugs.mysql.com/how-to-report.php

Please read about Unicode character sets and collations at http://dev.mysql.com/doc/refman/5.1/en/charset-unicode-sets.html
[7 Feb 2008 9:28] Michal Čihař
What I did not find is why 'ň' and 'n' ARE treated as the same while 'č' and 'c' ARE NOT. I think either both of them should be, or neither of them.
[7 Feb 2008 11:00] Vlasta Neubauer
Although I initiated this bug report, I found that this problem originates directly from the Czech norm for collation, which is too strict (and slightly out of date, I think). So I must agree: this is neither a bug in MySQL nor a bug in Unicode.
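
The behaviour discussed in this thread can be modelled with a short sketch. It assumes, based on the results reported above and on the Czech alphabet, that č, ř, š and ž count as separate letters while other accented letters (ď, ň, ť, ĺ, ...) compare equal to their base letters. The names `CZECH_DISTINCT`, `primary_key` and `czech_equal` are hypothetical and exist only for this illustration. This is not MySQL's implementation, and it ignores the digraph "ch" and ordering altogether.

```python
import unicodedata

# Assumption for this sketch: these letters are separate letters of the
# Czech alphabet, so they never compare equal to c, r, s, z.
CZECH_DISTINCT = set("čřšž")


def primary_key(text):
    """Return a comparison key that keeps č, ř, š, ž and strips other accents."""
    out = []
    # Compose first so that 'c' + combining caron is seen as the single letter 'č'.
    for ch in unicodedata.normalize("NFC", text).lower():
        if ch in CZECH_DISTINCT:
            out.append(ch)
        else:
            # Decompose and drop combining marks: 'ň' becomes 'n', 'ť' becomes 't'.
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def czech_equal(a, b):
    """Case-insensitive equality under the simplified rule above."""
    return primary_key(a) == primary_key(b)


if __name__ == "__main__":
    print(czech_equal("č", "c"))  # False, matches the utf8_czech_ci result above
    print(czech_equal("ň", "n"))  # True, matches the utf8_czech_ci result above
```

Under this model the asymmetry between 'č' and 'ň' is expected rather than accidental: only the first is treated as a letter in its own right.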