Bug #17527 searching latin2 coded strings with "like"
Submitted: 17 Feb 2006 14:11 Modified: 23 Apr 2006 12:58
Reporter: János Szűcs Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.0.18 OS:Windows (win32)
Assigned to: CPU Architecture:Any

[17 Feb 2006 14:11] János Szűcs
Description:
I use latin2 character set and Hungarian collation in my tables. When I search in a table with "select .... like 'os%'", the result contains not only the fields beginning with "os", but also contains fields beginning with "ös", "ős". That is, it treats the accented Hungarian characters identical to their unaccented pairs. Fortunately, the operator "rlike" and "regexp" works well, so I can use them, but I think "like" should behave exactly for accented characters as "rlike" or "regexp" does.

How to repeat:
1. create a table with latin2 character set, with one string type column
2. add three records with values: "os", "ös","ős"
3. select * from <table> where <field> like 'os%'
4. it reports 3 records
5. select * from <table> where <field> regexp '^os'
6. it reports the first record, as it should be

Suggested fix:
Maybe the case insensitive conversion table for "like" is wrong, and puts accented Hungarian characters to the wrong place.

(but it is only a guess, I wrote it only because I did not want to leave this place empty :-))
[19 Feb 2006 12:38] Valeriy Kravchuk
Thank you for a problem report. Please, send the SHOW CREATE TABLE results for the problematic table and the results of SHOW VARIABLES LIKE 'char%' and SHOW VARIABLES LIKE 'collat%' from the mysql client where you get the problem you described.
[20 Feb 2006 9:14] János Szűcs
The output of 'show variables like 'char%':

Variable_name	Value
character_set_client	latin2
character_set_connection	latin2
character_set_database	latin2
character_set_results	latin2
character_set_server	latin2
character_set_system	utf8
character_sets_dir	C:\\Program Files\\MySQL\\MySQL Server 5.0\\share\\charsets\\

The output of 'show variables like 'collat%':

Variable_name	Value
collation_connection	latin2_general_ci
collation_database	latin2_general_ci
collation_server	latin2_general_ci

The output of 'show create table':

Table	Create Table
iktatokonyv2006	CREATE TABLE `iktatokonyv2006` (\n  `info` char(1) character set latin2 NOT NULL default '',\n  `iktatoszam` varchar(20) character set latin2 NOT NULL,\n  `irany` varchar(60) character set latin2 NOT NULL,\n  `irattipus` varchar(60) character set latin2 NOT NULL,\n  `iktatva` date NOT NULL default '0000-00-00',\n  `hatarido` date default NULL,\n  `lezarva` date default NULL,\n  `partner` varchar(60) character set latin2 NOT NULL,\n  `pcim` varchar(60) character set latin2 default NULL,\n  `phivszam` varchar(60) character set latin2 default NULL,\n  `pugyintezo` varchar(60) character set latin2 default NULL,\n  `ugyintezo` varchar(60) character set latin2 default NULL,\n  `targy` text character set latin2 NOT NULL,\n  `megjegyzes` text character set latin2,\n  `helyszin` varchar(60) character set latin2 NOT NULL,\n  `dosszie` varchar(80) character set latin2 NOT NULL,\n  `aldosszie` varchar(80) character set latin2 NOT NULL,\n  `alternativdosszie` varchar(80) character set latin2 default NULL,\n  `filenev` varchar(60) character set latin2 default NULL,\n  `tetel` varchar(9) character set latin2 NOT NULL,\n  `id` int(10) unsigned NOT NULL auto_increment,\n  `alszam` char(3) collate latin2_hungarian_ci default '001',\n  `kepekszama` int(11) default NULL,\n  `iktatta` varchar(60) collate latin2_hungarian_ci NOT NULL default '',\n  PRIMARY KEY  (`id`),\n  UNIQUE KEY `datum` (`iktatva`,`iktatoszam`,`alszam`),\n  KEY `iktatoszam` (`iktatoszam`,`alszam`)\n) ENGINE=MyISAM DEFAULT CHARSET=latin2 COLLATE=latin2_hungarian_ci CHECKSUM=1 COMMENT='Iktatókönyv adatbázis'
[6 Mar 2006 14:03] Valeriy Kravchuk
Sorry, but I was not able to repeat the behaviour you described with 5.0.20-BK on Linux:

mysql> show create table tl2\G
*************************** 1. row ***************************
       Table: tl2
Create Table: CREATE TABLE `tl2` (
  `c1` varchar(80) character set latin2 default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin2 COLLATE=latin2_hungarian_ci
1 row in set (0.00 sec)

mysql> insert into tl2 values ('os'), ('ös'),('ős');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from tl2 where c1 like 'os%';
+------+
| c1   |
+------+
| os   |
+------+
1 row in set (0.00 sec)

mysql> select * from tl2 where c1 regexp '^os';
+------+
| c1   |
+------+
| os   |
+------+
1 row in set (0.00 sec)

mysql> show variables like 'collat%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin2_general_ci |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.01 sec)

mysql> show variables like 'char%';
+--------------------------+--------------------------------------------+
| Variable_name            | Value                                      |
+--------------------------+--------------------------------------------+
| character_set_client     | latin2                                     |
| character_set_connection | latin2                                     |
| character_set_database   | latin1                                     |
| character_set_filesystem | binary                                     |
| character_set_results    | latin2                                     |
| character_set_server     | latin1                                     |
| character_set_system     | utf8                                       |
| character_sets_dir       | /home/openxs/dbs/5.0/share/mysql/charsets/ |
+--------------------------+--------------------------------------------+
8 rows in set (0.00 sec)

Please, check again with this simplified test case and inform about the results.
[20 Mar 2006 10:57] János Szűcs
I tried the simplified test case You suggested, but I got the same results with 'like': I tried with "like 'os%'", "like 'ös%'" and "like 'ős%'", and in all of the three cases I got three records instead of one. Only 'regexp' worked properly and retrieved only one record. I worked with 5.0.18-nt on a WindowsXP, and used EMS MySQL manager to enter the SQL statements.
[21 Mar 2006 13:15] Valeriy Kravchuk
Please, try to repeat with 5.0.19 now available and, most important, mysql command line client! Inform about the results.
[23 Mar 2006 12:45] János Szűcs
I tried it with 5.0.19 and mysql command line utility, and got the same results as before, here is the output:

Your MySQL connection id is 18 to server version: 5.0.19-nt

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> use test;
Database changed
mysql> select * from t12 where c1 like 'os%';
+------+
| c1   |
+------+
| os   |
| ös   |
| ős   |
+------+
3 rows in set (0.00 sec)

mysql> select * from t12 where c1 rlike 'os';
+------+
| c1   |
+------+
| os   |
+------+
1 row in set (0.00 sec)

mysql>
[23 Mar 2006 12:58] Valeriy Kravchuk
If you get that rtesults with exactly the same settings and steps as in my test case, then you should simply wait for 5.0.20 to be released officially.
[23 Apr 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".