Bug #5179 Chinese Word Searching Problem
Submitted: 24 Aug 2004 11:58 Modified: 24 Aug 2004 13:39
Reporter: Thomas Chan
Status: Won't fix
Category:MySQL Server Severity:S1 (Critical)
Version:4.0.17 & 4.0.20 OS:Linux (Linux & Windows)
Assigned to: CPU Architecture:Any

[24 Aug 2004 11:58] Thomas Chan
Description:
We tried to search for a single Chinese character with the following SELECT statement:

    $query = "select * from library where chinesename like '%" . addcslashes(trim($bookname), "\\_%") . "%'";

where $bookname is entered as '偉', which is stored as the two bytes 0xB0B6.
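For reference, here is a minimal Python sketch of the pattern that the PHP statement builds, working on raw bytes. The name `build_like_pattern` is hypothetical; the function mirrors PHP's `trim()` (default character list) and `addcslashes(..., "\\_%")`, which prefixes backslash, underscore and percent with a backslash. It shows that the search term for '偉' reaches the server as the plain bytes 0xB0B6 wrapped in `%` wildcards, with nothing marking those two bytes as one character.

```python
# Characters stripped by PHP's trim() when no character list is given
PHP_TRIM_CHARS = b" \t\n\r\0\x0b"


def build_like_pattern(bookname: bytes) -> bytes:
    """Hypothetical byte-level mirror of:
    "%" . addcslashes(trim($bookname), "\\_%") . "%"
    """
    escaped = bytearray()
    for b in bookname.strip(PHP_TRIM_CHARS):
        # Escape the LIKE metacharacters and the escape character itself
        if b in b"\\_%":
            escaped.append(0x5C)  # backslash
        escaped.append(b)
    return b"%" + bytes(escaped) + b"%"


# '偉' as entered, with surrounding whitespace that trim() removes
print(build_like_pattern(b" \xb0\xb6 "))  # b'%\xb0\xb6%'
```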

When we run this search from our PHP program, it returns unexpected results.

We found that the pattern match does not treat each two-byte sequence (four hex digits) as a single character; it matches the bytes at any position in the value. Because each Chinese character is stored as two bytes, the pattern can match across the boundary between two adjacent characters, so the search returns the wrong rows.

For example:
  1) 0xabb0b654a6b3
     - wrong: the bytes b0 and b6 belong to two different Chinese characters
       (b0 ends one character and b6 starts the next)
  2) 0xb0b6b56f7ebb
     - correct: the two consecutive bytes b0b6 form one Chinese character
  3) 0xbbc8abb0b654
     - wrong: the bytes b0 and b6 belong to two different Chinese characters
       (b0 ends one character and b6 starts the next)

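The behaviour in these examples can be reproduced outside MySQL. The Python sketch below assumes a Big5-style double-byte encoding (suggested by the byte values, but not stated in the report) in which any byte of 0x81 or higher at a character boundary is a lead byte followed by exactly one trail byte; it does not validate trail bytes. `naive_like` stands in for a byte-oriented `LIKE '%...%'`, and `big5_like` is a character-aware alternative that only accepts matches starting on a character boundary. Both names are hypothetical and illustrative only.

```python
def big5_units(data: bytes) -> list[bytes]:
    """Split bytes into characters, ASSUMING a Big5-style encoding:
    a byte >= 0x81 is a lead byte that takes the next byte as its trail byte."""
    units = []
    i = 0
    while i < len(data):
        # A lone lead byte at the very end is kept as a one-byte unit
        width = 2 if data[i] >= 0x81 and i + 1 < len(data) else 1
        units.append(data[i:i + width])
        i += width
    return units


def naive_like(haystack: bytes, needle: bytes) -> bool:
    """Byte-oriented LIKE '%needle%': a match may start at any byte offset."""
    return needle in haystack


def big5_like(haystack: bytes, needle: bytes) -> bool:
    """Character-aware LIKE '%needle%': a match must start on a character boundary."""
    hay = big5_units(haystack)
    pat = big5_units(needle)
    n = len(pat)
    return any(hay[i:i + n] == pat for i in range(len(hay) - n + 1))


# The three stored values from the report, searched for 0xB0B6 ('偉')
needle = bytes.fromhex("b0b6")
for value in ("abb0b654a6b3", "b0b6b56f7ebb", "bbc8abb0b654"):
    data = bytes.fromhex(value)
    print(value, naive_like(data, needle), big5_like(data, needle))
# abb0b654a6b3 True False
# b0b6b56f7ebb True True
# bbc8abb0b654 True False
```

Only the second value actually contains 偉. The byte-oriented match also accepts the first and third values because, under the assumed encoding, b0 is the trail byte of one character and b6 the lead byte of the next.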
Is this a limitation of MySQL when searching for Chinese characters?

How to repeat:
1) Create a table with a BLOB column to store the Chinese text above.
2) Search for '偉'.
3) The wrong results described above are returned.
[24 Aug 2004 13:39] MySQL Verification Team
Thank you for writing to us. The problem you describe does exist in 4.0.

It is fixed in 4.1 and will not be fixed in 4.0.

Please try the latest 4.1.4 release and tell us whether the search works.

First, though, you will have to define the character sets very carefully.