Bug #3444 Case sensitivity in Czech comparisons
Submitted: 12 Apr 2004 6:13 Modified: 16 Jun 2004 10:30
Reporter: Tomas Tikovsky
Status: Verified
Category:Server: Charsets Severity:S4 (Feature request)
Version:4.1.1 OS:Microsoft Windows (winxp)
Assigned to: Alexander Barkov Target Version:
Triage: Triaged: D5 (Feature request)

[12 Apr 2004 6:13] Tomas Tikovsky
Description:
Hi,

I'm quite confused about how to deal with this issue. Maybe it's my fault, but I don't
understand it.

The manual says that Czech is always case sensitive, but Mr. Golubchik said it should work
in this version of MySQL. I wanted to use fulltext indexes, but this makes them useless:
when searching for česky, I also want results for Česky, and that is not working. It's the
same in WHERE ... LIKE clauses. If there is any way to make MySQL perform CASE-INSENSITIVE
comparisons of accented characters, I would be very pleased if you could advise me. If it's
not MySQL's fault, please accept my apologies; I just didn't get it from the manual.
Thanks for any help.
Regards
Tomas Tikovsky

How to repeat:
Default server setup:
mysql> show variables like "c%";
+--------------------------+--------------------------+
| Variable_name            | Value                    |
+--------------------------+--------------------------+
| character_set_server     | latin1                   |
| character_set_system     | utf8                     |
| character_set_database   | latin1                   |
| character_set_client     | latin1                   |
| character_set_connection | latin1                   |
| character-sets-dir       | D:\mysql\share\charsets/ |
| character_set_results    | latin1                   |
| collation_connection     | latin1_swedish_ci        |
| collation_database       | latin1_swedish_ci        |
| collation_server         | latin1_swedish_ci        |
| concurrent_insert        | ON                       |
| connect_timeout          | 5                        |
+--------------------------+--------------------------+

mysql> select "s"="S";
+---------+
| "s"="S" |
+---------+
|       1 |
+---------+
1 row in set (0.00 sec)

That's the expected behaviour, but I need this to happen with accented Czech characters as
well, as in š=Š. That is a lowercase and an uppercase "s" with an inverted circumflex
(caron).
So I've set up the server as follows. I'm on Windows with the cp1250 charset, so I used that.

mysql> show variables like "c%";
+--------------------------+--------------------------+
| Variable_name            | Value                    |
+--------------------------+--------------------------+
| character_set_server     | cp1250                   |
| character_set_system     | utf8                     |
| character_set_database   | cp1250                   |
| character_set_client     | cp1250                   |
| character_set_connection | cp1250                   |
| character-sets-dir       | D:\mysql\share\charsets/ |
| character_set_results    | cp1250                   |
| collation_connection     | cp1250_czech_ci          |
| collation_database       | cp1250_czech_ci          |
| collation_server         | cp1250_czech_ci          |
| concurrent_insert        | ON                       |
| connect_timeout          | 5                        |
+--------------------------+--------------------------+

--------------------------------------------------------------------------

mysql> select "š"="Š";
+---------+
| "š"="Š" |
+---------+
|       0 |
+---------+
1 row in set (0.00 sec)

mysql> select "s"="S";
+---------+
| "s"="S" |
+---------+
|       0 |
+---------+
1 row in set (0.00 sec)

This indicates that the comparison is case sensitive.
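
For reference, the behaviour the reporter expects can be modeled outside MySQL. The sketch below is not MySQL code and not how its collations work; it assumes the strings arrive as cp1250 bytes (as in the setup above), decodes them, and compares them with Unicode case folding, so š and Š match while š and s stay distinct. The helper name `ci_equal` is hypothetical.

```python
def ci_equal(a: bytes, b: bytes, encoding: str = "cp1250") -> bool:
    """Compare two byte strings case-insensitively but accent-sensitively.

    Hypothetical helper: a model of the expected semantics, not MySQL's
    collation code. Bytes are decoded with `encoding`, then case-folded.
    """
    return a.decode(encoding).casefold() == b.decode(encoding).casefold()


if __name__ == "__main__":
    small_s = "š".encode("cp1250")    # b'\x9a' in cp1250
    capital_s = "Š".encode("cp1250")  # b'\x8a' in cp1250
    print(ci_equal(small_s, capital_s))  # True: only case differs
    print(ci_equal(small_s, b"s"))       # False: the caron still counts
    print(ci_equal("česky".encode("cp1250"), "Česky".encode("cp1250")))  # True
```

Case folding alone keeps accents significant, which matches the request here (š=Š) rather than a fully accent-insensitive comparison.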
[12 Apr 2004 6:18] Tomas Tikovsky
Trying to change the category to Server; I mistyped it.
[14 Apr 2004 21:43] Sergei Golubchik
OK, you are right. Confirmed.
According to the comments in ctype-win1250ch.c (you can find the file with
"grep cp1250_czech_ci"), the comparison is indeed case sensitive, so that is how the
original contributor implemented it.

The very least we have to do is rename the collation to cp1250_czech_cs ("cs" stands for
"case sensitive"). That obviously will not help you, though :)
Another solution would, of course, be to make ctype-win1250ch.c do case-insensitive
comparison.
Whether we can do it (and when), you'll hear from Alexander Barkov, who is the developer
behind our character set code. He will also reply to your email to internals@.

Also, you may try a workaround: use latin2_czech_ci as the database/server collation and
cp1250 as the client charset only. Then all data will be stored and compared in
latin2_czech_ci, which should be case insensitive, and will be converted to cp1250 before
being sent to the client.
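
The workaround relies on the server converting between the stored character set and the client character set. As a rough model (not MySQL's conversion code), assuming latin2 (ISO 8859-2) storage and a cp1250 client as in the reporter's setup, the same Czech text has different byte values in the two charsets, and conversion is a decode followed by a re-encode. The helper name `convert` is hypothetical.

```python
def convert(data: bytes, from_charset: str, to_charset: str) -> bytes:
    """Re-encode bytes between character sets (hypothetical helper).

    Models the server-to-client conversion step only; MySQL's own
    conversion code is not shown here.
    """
    return data.decode(from_charset).encode(to_charset)


if __name__ == "__main__":
    text = "Šťastný"
    stored = text.encode("iso8859_2")              # latin2 bytes, as kept on the server
    sent = convert(stored, "iso8859_2", "cp1250")  # bytes the cp1250 client receives
    print(stored.hex(), sent.hex())        # the byte values differ...
    print(sent.decode("cp1250") == text)   # ...but the decoded text is the same
```

Because the comparison happens on the server side, only the stored charset's collation decides whether š and Š compare equal; the client charset just affects the bytes on the wire.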
[15 Apr 2004 1:17] Tomas Tikovsky
Thanks for the reply, but I think the latin2_czech_ci collation is unfortunately case
sensitive as well.

mysql> show variables like "c%";
+--------------------------+--------------------------+
| Variable_name            | Value                    |
+--------------------------+--------------------------+
| character_set_server     | latin2                   |
| character_set_system     | utf8                     |
| character_set_database   | latin2                   |
| character_set_client     | latin2                   |
| character_set_connection | latin2                   |
| character-sets-dir       | C:\Mysql\share\charsets/ |
| character_set_results    | latin2                   |
| collation_connection     | latin2_czech_ci          |
| collation_database       | latin2_czech_ci          |
| collation_server         | latin2_czech_ci          |
| concurrent_insert        | ON                       |
+--------------------------+--------------------------+

mysql> select "s"="S";
+---------+
| "s"="S" |
+---------+
|       0 |
+---------+
1 row in set (0.00 sec)

Well, as far as I know, Czech must be case sensitive when sorting results, but this
behaviour in the fulltext index is a bit unpleasant. I'm available for any help you need,
because it would help a lot of people who despair of it. Thanks in advance.

Regards
Tomas Tikovsky
[5 May 2004 14:25] Alexander Barkov
We should definitely add case- and accent-insensitive counterparts
to both the latin2 and cp1250 Czech collations. I will find out asap
whether it is possible to reuse the case-sensitive code.
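
What "case- and accent-insensitive" means can be illustrated with Unicode decomposition: decompose each string (NFD), drop the combining marks, then case-fold. This is an illustrative approximation, not the algorithm a MySQL collation uses; in particular it ignores Czech-specific ordering rules such as treating "ch" as a single letter. The names `fold` and `ai_ci_equal` are hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Remove diacritics and case distinctions, e.g. 'Česky' -> 'cesky'.

    Approximation only: decompose to NFD, drop combining marks, case-fold.
    It ignores Czech-specific rules such as treating "ch" as one letter.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base_only.casefold()


def ai_ci_equal(a: str, b: str) -> bool:
    """Hypothetical case- and accent-insensitive equality."""
    return fold(a) == fold(b)


if __name__ == "__main__":
    print(ai_ci_equal("š", "S"))          # True
    print(ai_ci_equal("česky", "Česky"))  # True
    print(ai_ci_equal("š", "t"))          # False
```

Unlike plain case folding, this variant also makes š equal to s, which is the accent-insensitive half of the proposal.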
[16 Jun 2004 10:30] Alexander Barkov
See also worklog item WL#1875
[4 Jan 2007 16:05] Domas Mituzas
This has been idle for two years. Needs revisiting. :)