Bug #12075 FULLTEXT non-functional for big5 strings
Submitted: 21 Jul 2005 0:12
Modified: 7 Aug 2005 1:08
Reporter: Kolbe Kegel
Status: Closed
Category: MySQL Server
Severity: S3 (Non-critical)
Version: 4.1, 5.0
OS: Linux
Assigned to: Sergey Vojtovich
CPU Architecture: Any

[21 Jul 2005 0:12] Kolbe Kegel
Fulltext searching is not functional on columns stored using the big5 character set.

The fulltext index remains empty even when rows have been inserted into the table.

No errors or warnings are issued.

How to repeat:
create table u (c char(50) character set big5 not null, fulltext (c));

insert into u (c) values (0xA741ADCCA66EB6DC20A7DAADCCABDCA66E);

select * from u where match(c) against (0xA741ADCCA66EB6DC in
boolean mode);

select * from u where match(c) against (0xA7DAADCCABDCA66E in
boolean mode);

(Note that the hex string inserted into the table contains a space (0x20), which splits it into the two words used as search terms in the queries above.)
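To make the relationship between the inserted value and the two search terms concrete, here is a small Python sketch (not part of the original report) that splits the inserted bytes on 0x20 and decodes each half with Python's standard `big5` codec. It assumes that codec's mapping agrees with the server's big5 table for these characters. Because Big5 trail bytes never fall below 0x40, a 0x20 byte can only be a real space, so splitting at the byte level cannot cut a character in half.

```python
# Bytes copied from the INSERT statement in the repro
inserted = bytes.fromhex("A741ADCCA66EB6DC20A7DAADCCABDCA66E")

# Search terms from the two MATCH ... AGAINST queries
term_1 = bytes.fromhex("A741ADCCA66EB6DC")
term_2 = bytes.fromhex("A7DAADCCABDCA66E")

# Big5 trail bytes are never 0x20, so this split cannot break a character
words = inserted.split(b" ")
print(words == [term_1, term_2])  # True

for word in words:
    # Each 8-byte word decodes to four double-byte Big5 characters
    print(word.hex().upper(), len(word.decode("big5")))
```

Both words are four characters long, so neither should be excluded from the index for being too short.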

kolbe@lith:/var/mysql/data/test$ ../../bin/myisam_ftdump -d -v u 0
[this command outputs nothing]

insert into u values ('paragraphs and sentences written in latin or roman');

kolbe@lith:/var/mysql/data/test$ ../../bin/myisam_ftdump -d -v u 0
       65            0.9456265 latin
       65            0.9456265 paragraphs
       65            0.9456265 roman
       65            0.9456265 sentences
       65            0.9456265 written

truncate table u;

insert into u values (concat('paragraphs and sentences', 0xA741ADCCA66EB6DC));

kolbe@lith:/var/mysql/data/test$ ../../bin/myisam_ftdump -d -v u 0
      194            0.9775171 paragraphs
      194            0.9775171 sentences

This indicates that words encoded in big5 are not being stored in the fulltext index. When a string mixes Latin characters (which are valid in big5) with Chinese characters, the Chinese words are omitted from the index, while the Latin words are indexed as usual.
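One way to picture this symptom is a tokenizer that decides, byte by byte, whether it is looking at a word character. The Python sketch below is a hypothetical, simplified model: it is not MySQL's fulltext parser and not the committed fix. The names `tokenize`, `is_big5_pair`, and `MIN_WORD_LEN` are illustrative, and the minimum word length of 4 is an assumption chosen to mirror the default `ft_min_word_len`. With `big5_aware=False`, high bytes act as delimiters, so the Chinese words vanish while the Latin words survive, matching the myisam_ftdump output above.

```python
MIN_WORD_LEN = 4  # assumption: mirrors MySQL's default ft_min_word_len


def is_big5_pair(lead: int, trail: int) -> bool:
    """True if (lead, trail) falls in the Big5 double-byte code ranges."""
    return 0xA1 <= lead <= 0xF9 and (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE)


def tokenize(data: bytes, big5_aware: bool) -> list[str]:
    """Split raw column bytes into indexable words (hypothetical model)."""
    words: list[str] = []
    current = bytearray()
    nchars = 0  # word length in characters, not bytes

    def flush() -> None:
        nonlocal nchars
        # Words shorter than MIN_WORD_LEN are dropped (e.g. "and", "or")
        if nchars >= MIN_WORD_LEN:
            words.append(current.decode("big5"))
        current.clear()
        nchars = 0

    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80 and chr(b).isalnum():
            # ASCII letters and digits are always word characters
            current.append(b)
            nchars += 1
            i += 1
        elif big5_aware and i + 1 < len(data) and is_big5_pair(b, data[i + 1]):
            # A Big5 double-byte character counts as one word character
            current += data[i:i + 2]
            nchars += 1
            i += 2
        else:
            # A space or an unrecognized high byte ends the current word
            flush()
            i += 1
    flush()
    return words


if __name__ == "__main__":
    mixed = b"paragraphs and sentences" + bytes.fromhex("A741ADCCA66EB6DC")
    repro = bytes.fromhex("A741ADCCA66EB6DC20A7DAADCCABDCA66E")
    print(tokenize(mixed, big5_aware=False))  # ['paragraphs', 'sentences']
    print(tokenize(repro, big5_aware=False))  # []
    print(len(tokenize(repro, big5_aware=True)))  # 2
```

In the non-aware mode, stray trail bytes such as 0x41 ('A') and 0x6E ('n') are misread as one-letter ASCII words, but the length threshold discards them, which is consistent with the index simply appearing empty.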

Suggested fix:
[2 Aug 2005 7:38] Alexander Barkov
bk commit - 4.1 tree (svoj:1.2364) BUG#12075
[2 Aug 2005 7:39] Alexander Barkov
Sergey, your patch looks fine.
Please move the test from "fulltext" to "ctype_big5",
so that this test block is skipped when big5 support is not compiled in.
[2 Aug 2005 9:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

[3 Aug 2005 6:23] Sergey Vojtovich
Fixed in 4.1.14, 5.0.11.
[7 Aug 2005 1:08] Mike Hillyer
Documented in 5.0.11 and 4.1.14 changelogs:

big5 strings were not being stored in FULLTEXT index. (Bug #12075)