Description:
MySQL does not correctly handle wildcard characters in tables with ucs2 data, when SELECT-ing data using a LIKE query. It appears, that the "%" and "_" characters are treated literally instead of working as wildcards.
For example: When I have a ucs2 table (say it's called "foo") that contains the word "cat" in a field called "word", I would expect the statement:
SELECT * FROM foo WHERE word LIKE "c%";
to return a result. It does not.
The software is compiled with latin1 as the default character set and this is not being changed or overridden at any time.
How to repeat:
Using the mysql cli client, create a table using the following command:
mysql> CREATE TABLE foo ( word varchar(64) NOT NULL, PRIMARY KEY (word))
-> TYPE=MyISAM CHARACTER SET ucs2 COLLATE ucs2_general_ci;
Insert some data:
mysql> INSERT INTO foo (word) VALUES ("cat");
Query OK, 1 row affected (0.00 sec)
Execute the following:
mysql> SELECT * FROM foo WHERE word LIKE "c%" ;
Empty set (0.00 sec)
mysql> SELECT * FROM foo WHERE word LIKE "ca_" ;
Empty set (0.00 sec)
This behaviour appears to me to be incorrect. For reference, the following works:
mysql> SELECT * FROM foo WHERE word LIKE "cat" ;
+------+
| word |
+------+
| cat |
+------+
1 row in set (0.00 sec)
This last statement shows the data is there, and can be retrieved.
Suggested fix:
The suggested fix is to correctly implement handling of wildcard characters in ucs2. (Make character 0x0025 work the same as character 0x25)
A workaround is to specify the search string in hex, and append a single-byte wildcard character to the end. (AFAIK this shouldn't be allowed, as you're not supposed to be able to have single-byte ucs2 characters)
For example:
the letter "c" in ucs2 is 0x0063
the letter "a" in ucs2 is 0x0061
the "%" character in latin1 is 0x25
the "_" character in latin1 is 0x5F
mysql> SELECT * FROM foo WHERE word LIKE _ucs2 x'006325';
+------+
| word |
+------+
| cat |
+------+
1 row in set (0.00 sec)
or:
mysql> SELECT * FROM foo WHERE word LIKE _ucs2 x'006300615F';
+------+
| word |
+------+
| cat |
+------+
1 row in set (0.00 sec)