Bug #3444 Case sensitivity in Czech comparisons
Submitted: 12 Apr 2004 4:13
Modified: 18 Jan 2018 13:01
Reporter: Tomas Tikovsky
Status: Closed
Category: MySQL Server: Charsets
Severity: S4 (Feature request)
Version: 4.1.1
OS: Windows (winxp)
Assigned to: Assigned Account
CPU Architecture: Any

[12 Apr 2004 4:13] Tomas Tikovsky
Description:
Hi,

I'm quite confused about how to deal with this issue. Maybe it's my fault, but I don't understand it.

The manual says that Czech is always case sensitive, but Mr. Golubov said it should work in this version of MySQL. I wanted to use fulltext indexes, but this makes them useless. When searching for česky I also want results for Česky, and that is not working. It is the same in "where like" clauses. If there is any way to make MySQL do CASE INSENSITIVE comparisons of accented characters, I would be very pleased if you could advise me. If it is not MySQL's fault, please accept my apologies; I just didn't get it from the manual. Thanks for any help.
Regards
Tomas Tikovsky

How to repeat:
Default server setup:
mysql> show variables like "c%";
+--------------------------+--------------------------+
| Variable_name            | Value                    |
+--------------------------+--------------------------+
| character_set_server     | latin1                   |
| character_set_system     | utf8                     |
| character_set_database   | latin1                   |
| character_set_client     | latin1                   |
| character_set_connection | latin1                   |
| character-sets-dir       | D:\mysql\share\charsets/ |
| character_set_results    | latin1                   |
| collation_connection     | latin1_swedish_ci        |
| collation_database       | latin1_swedish_ci        |
| collation_server         | latin1_swedish_ci        |
| concurrent_insert        | ON                       |
| connect_timeout          | 5                        |
+--------------------------+--------------------------+

mysql> select "s"="S";
+---------+
| "s"="S" |
+---------+
|       1 |
+---------+
1 row in set (0.00 sec)

That's the expected behaviour, but I need the same to happen with accented characters in Czech, as in š=Š. That is the lowercase and uppercase "s" with an inverted circumflex (wedge).
So I set up the server as follows. I'm on Windows with the cp1250 charset, so I used that.

mysql> show variables like "c%";
+--------------------------+--------------------------+
| Variable_name            | Value                    |
+--------------------------+--------------------------+
| character_set_server     | cp1250                   |
| character_set_system     | utf8                     |
| character_set_database   | cp1250                   |
| character_set_client     | cp1250                   |
| character_set_connection | cp1250                   |
| character-sets-dir       | D:\mysql\share\charsets/ |
| character_set_results    | cp1250                   |
| collation_connection     | cp1250_czech_ci          |
| collation_database       | cp1250_czech_ci          |
| collation_server         | cp1250_czech_ci          |
| concurrent_insert        | ON                       |
| connect_timeout          | 5                        |
+--------------------------+--------------------------+

--------------------------------------------------------------------------

mysql> select "š"="Š";
+---------+
| "š"="Š" |
+---------+
|       0 |
+---------+
1 row in set (0.00 sec)

mysql> select "s"="S";
+---------+
| "s"="S" |
+---------+
|       0 |
+---------+
1 row in set (0.00 sec)

This suggests that the comparison is case sensitive.
[12 Apr 2004 4:18] Tomas Tikovsky
Trying to change category to server. I mistyped it.
[14 Apr 2004 19:43] Sergei Golubchik
OK, you are right. Confirmed.
According to the comments in ctype-win1250ch.c (you can find the file with
"grep cp1250_czech_ci"), the comparison is indeed case-sensitive, so that is how the original contributor implemented it.

The very least we have to do is rename the collation to cp1250_czech_cs ("cs" stands for "case sensitive"). Obviously, that will not help you, though :)
Another solution would, of course, be to make ctype-win1250ch.c do case-insensitive comparison.
Whether we can do it (and when we can do it), you'll hear from Alexander Barkov, who is the developer behind our character set code. He will also reply to your email to internals@.

Also, you may try a workaround: use latin2_czech_ci as the database/server collation and cp1250 as the client charset only. Then all data will be stored and compared in latin2_czech_ci, which should work case-insensitively, and will be converted to cp1250 before being sent to the client.
[14 Apr 2004 23:17] Tomas Tikovsky
Thanks for the reply, but I think that the latin2_czech_ci collation is unfortunately case-sensitive as well.

mysql> show variables like "c%";
+--------------------------+--------------------------+
| Variable_name            | Value                    |
+--------------------------+--------------------------+
| character_set_server     | latin2                   |
| character_set_system     | utf8                     |
| character_set_database   | latin2                   |
| character_set_client     | latin2                   |
| character_set_connection | latin2                   |
| character-sets-dir       | C:\Mysql\share\charsets/ |
| character_set_results    | latin2                   |
| collation_connection     | latin2_czech_ci          |
| collation_database       | latin2_czech_ci          |
| collation_server         | latin2_czech_ci          |
| concurrent_insert        | ON                       |
+--------------------------+--------------------------+

mysql> select "s"="S";
+---------+
| "s"="S" |
+---------+
|       0 |
+---------+
1 row in set (0.00 sec)

Well, as far as I know, Czech must be case sensitive when sorting results. But this behaviour in a fulltext index is a bit unpleasant. I'm available for any help you might need, because it would help a lot of people who despair over this. Thanks in advance.

Regards
Tomas Tikovsky
[5 May 2004 12:25] Alexander Barkov
We should definitely add case- and accent-insensitive counterparts
to both the latin2 and cp1250 Czech collations. I will find out as soon
as possible whether the case-sensitive code can be reused.
[16 Jun 2004 8:30] Alexander Barkov
See also worklog item WL#1875
[4 Jan 2007 15:05] Domas Mituzas
This has been idle for two years. Needs revisiting. :)
[18 Jan 2018 13:01] Erlend Dahl
[30 Nov 2017 23:53] Xing Z Zhang 

This has been fixed in 4.1.3 by adding new collations: utf8_czech_ci, ucs2_czech_ci, utf16_czech_ci and utf32_czech_ci.
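
For readers landing on this report, here is a minimal sketch of how one of the collations named above might be applied to the original problem. It assumes a server that ships utf8_czech_ci (recent servers may list it as utf8mb3_czech_ci) and a client that really sends UTF-8. The table name "articles" and its "title" column are hypothetical and only serve the illustration. The results are not verified here; the comments state what the _ci suffix implies.

```sql
-- Assumption: the client sends UTF-8 and the server knows utf8_czech_ci.
SET NAMES utf8;

-- Literal comparison from the report, with the collation forced explicitly.
-- A case-insensitive collation is expected to treat the two letters as equal.
SELECT 'š' = 'Š' COLLATE utf8_czech_ci AS same_letter;

-- Hypothetical table: the collation is declared on the column, so
-- comparisons and LIKE on it use utf8_czech_ci without extra clauses.
CREATE TABLE articles (
  title VARCHAR(200) CHARACTER SET utf8 COLLATE utf8_czech_ci
);

INSERT INTO articles (title) VALUES ('Česky'), ('česky');

-- With a case-insensitive column collation this is expected to
-- match both rows, whichever case the search term uses.
SELECT title FROM articles WHERE title LIKE 'česky';
```

Declaring the collation on the column rather than on the connection keeps the behaviour independent of each client's settings, which seems closer to what the original report asked for.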