Bug #21945 cannot search thai words with more than 26 character long
Submitted: 31 Aug 2006 11:02 Modified: 1 Oct 2006 12:16
Reporter: Anan T Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.0.24, 5.0.18 OS:Windows (WindowsXP)
Assigned to: CPU Architecture:Any

[31 Aug 2006 11:02] Anan T
Description:
Any thai text that is longer than 26 character cannot be searched correctly with certain table definition.
The condition that create the bug is to insert an empty string as an entry. The table must have certain column definition.

How to repeat:
1.set up server and client to use tis-620 char-set.

2. create a table:
create table test (NAME_ID MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,t_name varchar(80) not null,alias int default 0);
alter table test add UNIQUE THAI (T_NAME,ALIAS);

3. insert entry:
insert into test (t_name) values ('วัดเขาแดงอรุณราษฎร์ศรัทธาธรรม');

4. do query (and get a correct result):
select * from test where t_name like 'วัดเขาแดงอรุณราษฎร์ศรัทธาธรรม';

5. insert an empty string entry:
insert into test (T_NAME) values ('');

5. do the same query again and get nothing:
select * from test where t_name like 'วัดเขาแดงอรุณราษฎร์ศรัทธาธรรม';

I found the name_id column is the condition that causes this error.
[31 Aug 2006 11:16] Valeriy Kravchuk
Thank you for a problem report. Please, try to repeat with a newer version, 5.0.24, and inform about the results. Send the results of

show variables like 'char%';

also.
[1 Sep 2006 5:36] Anan T
Repeat the test to MySQL5.0.24 and get the same result.

Here is what I get from "show variables like 'char%';":
+--------------------------+--------------------------------+
| Variable_name            | Value                          |
+--------------------------+--------------------------------+
| character_set_client     | tis620                         |
| character_set_connection | tis620                         |
| character_set_database   | utf8                           |
| character_set_filesystem | binary                         |
| character_set_results    | tis620                         |
| character_set_server     | tis620                         |
| character_set_system     | utf8                           |
| character_sets_dir       | C:\mysql5.0.24\share\charsets\ |
+--------------------------+--------------------------------+

I would like to add one more thing. I can put at most 26 characters in a query with '%' suffix to get a correct answer. If I put more than 26 characters with '%' suffix, there will be no result. For example,

mysql> insert into test (t_name) values ('วัดเขาแดงอรุณราษฎร์ศรัทธาธรรม');
Query OK, 1 row affected (0.00 sec)

mysql> select * from test where t_name like 'วัดเขาแดงอรุณราษฎร์ศรัทธาธรรม';
+---------+-------------------------------+-------+
| NAME_ID | t_name                        | alias |
+---------+-------------------------------+-------+
|       5 | วัดเขาแดงอรุณราษฎร์ศรัทธาธรรม |     0 |
+---------+-------------------------------+-------+
1 row in set (0.00 sec)

mysql> insert into test (T_NAME) values ('');
Query OK, 1 row affected (0.00 sec)

mysql> select * from test where t_name like 'วัดเขาแดงอรุณราษฎร์ศรัทธาธรรม';
Empty set (0.00 sec)

mysql> select * from test where t_name like 'วัดเขาแดงอรุณราษฎร์ศรัทธาธ%';
+---------+-------------------------------+-------+
| NAME_ID | t_name                        | alias |
+---------+-------------------------------+-------+
|       5 | วัดเขาแดงอรุณราษฎร์ศรัทธาธรรม |     0 |
+---------+-------------------------------+-------+
1 row in set (0.00 sec)

mysql> select * from test where t_name like 'วัดเขาแดงอรุณราษฎร์ศรัทธาธร%';
Empty set (0.00 sec)
[1 Sep 2006 5:49] Anan T
I find it is strange that what I post in the web form shows incorrect Thai characters between different queries even though they appear all the same over here. I would like confirm that this is not a typo issue. I even reuse the queries by using up and down arrows. So there is no chance of typo mistake.
[2 Sep 2006 7:45] Valeriy Kravchuk
Please, send exact SHOW CREATE TABLE results for yout test table. As your database  has utf8 as character set, and 26*3 = 78 (and 27*3 = 81 > 80), I have some idea to check...
[4 Sep 2006 7:04] Anan T
Result of "show create table test":

------------------------------------+
| test  | CREATE TABLE `test` (
  `NAME_ID` mediumint(8) unsigned NOT NULL auto_increment,
  `t_name` varchar(80) NOT NULL,
  `alias` int(11) default '0',
  PRIMARY KEY  (`NAME_ID`),
  UNIQUE KEY `THAI` (`t_name`,`alias`)
) ENGINE=MyISAM AUTO_INCREMENT=7 DEFAULT CHARSET=utf8 |
+-------+-----------------------------------------------------------------------
[4 Sep 2006 7:39] Anan T
I think you are right. I tried modify the column to have longer VARCHAR and the error was gone. So my query was cut short somewhere to match the length in column definition. Thanks so much for the answer.
I still think though that the differences in "character footprint" between UTF-8 and other character sets shouldn't be handled by users:>
Have a great day!
[1 Oct 2006 12:16] Valeriy Kravchuk
The idea was that on Windows "new-line" you used in your LIKE clause is presented with 2 characters, and, thus, truncated when it is 26+ characters long. According to your last message, size of the column really matters. So, I think, it is not a bug.