Bug #1181 Wildcards ("%" and "_") not handled correctly in UCS2 tables
Submitted: 2 Sep 2003 11:24 Modified: 22 Sep 2003 5:45
Reporter: Simon Detheridge Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.1.1alpha (source distribution) OS:Linux (linux (gentoo))
Assigned to: Alexander Barkov CPU Architecture:Any

[2 Sep 2003 11:24] Simon Detheridge
Description:
MySQL does not correctly handle wildcard characters in tables with ucs2 data, when SELECT-ing data using a LIKE query. It appears, that the "%" and "_" characters are treated literally instead of working as wildcards.

For example: When I have a ucs2 table (say it's called "foo") that contains the word "cat" in a field called "word", I would expect the statement: 
SELECT * FROM foo WHERE word LIKE "c%";
to return a result. It does not.

The software is compiled with latin1 as the default character set and this is not being changed or overridden at any time.

How to repeat:
Using the mysql cli client, create a table using the following command:
mysql> CREATE TABLE foo ( word varchar(64) NOT NULL, PRIMARY KEY (word))
    -> TYPE=MyISAM CHARACTER SET ucs2 COLLATE ucs2_general_ci;

Insert some data:
mysql> INSERT INTO foo (word) VALUES ("cat");
Query OK, 1 row affected (0.00 sec)

Execute the following:
mysql> SELECT * FROM foo WHERE word LIKE "c%" ;
Empty set (0.00 sec)

mysql> SELECT * FROM foo WHERE word LIKE "ca_" ;
Empty set (0.00 sec)

This behaviour appears to me to be incorrect. For reference, the following works:
mysql> SELECT * FROM foo WHERE word LIKE "cat" ;
+------+
| word |
+------+
| cat  |
+------+
1 row in set (0.00 sec)

This last statement shows the data is there, and can be retrieved.

Suggested fix:
The suggested fix is to correctly implement handling of wildcard characters in ucs2. (Make character 0x0025 work the same as character 0x25)

A workaround is to specify the search string in hex, and append a single-byte wildcard character to the end. (AFAIK this shouldn't be allowed, as you're not supposed to be able to have single-byte ucs2 characters)

For example:
the letter "c" in ucs2 is 0x0063
the letter "a" in ucs2 is 0x0061
the "%" character in latin1 is 0x25
the "_" character in latin1 is 0x5F

mysql> SELECT * FROM foo WHERE word LIKE _ucs2 x'006325';
+------+
| word |
+------+
| cat  |
+------+
1 row in set (0.00 sec)

or:

mysql> SELECT * FROM foo WHERE word LIKE _ucs2 x'006300615F';
+------+
| word |
+------+
| cat  |
+------+
1 row in set (0.00 sec)
[22 Sep 2003 5:45] Alexander Barkov
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Simon, thanks for good bug report!