Bug #63829 Search containig backslash doesn't work if field collation is utf8_unicode_ci
Submitted: 21 Dec 2011 19:34 Modified: 22 Dec 2011 13:59
Reporter: Kevin Qiu Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DML Severity:S3 (Non-critical)
Version:5.1.58, 5.5.20 OS:Any
Assigned to: CPU Architecture:Any
Tags: collation, like, SQL, Unicode, utf8

[21 Dec 2011 19:34] Kevin Qiu
Description:
A SQL statement with LIKE %<pattern>%, will return rows containing <pattern>, however, when the field to search has collation utf8_unicode_ci and the pattern starts with the escaping character (properly escaped), the search won't match anything.

Please see the steps to reproduce.

How to repeat:
1. In any database, create a table:
mysql> CREATE TABLE FOO(name VARCHAR(255) COLLATE utf8_unicode_ci) CHARACTER SET utf8;
Query OK, 0 rows affected (0.00 sec)

2. Insert a few rows:
mysql> INSERT INTO FOO(name) VALUES('\\slash'), ('sla\\sh'), ('slash\\'), ('|bar'), ('ba|r'), ('bar|');
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM FOO;
+--------+
| name   |
+--------+
| \slash |
| sla\sh |
| slash\ |
| |bar   |
| ba|r   |
| bar|   |
+--------+
6 rows in set (0.00 sec)

3. Search for rows containing a slash:
EXPECTED: rows with slashes are returned
ACTUAL: No rows are returned:

mysql> SELECT * FROM FOO WHERE name LIKE '%\\\\%';
Empty set (0.00 sec)

4. Similarly, if I specify the escape character to be '|', it won't work if I search for literal '|':
mysql> SELECT * FROM FOO WHERE name LIKE '%||%' ESCAPE '|';
Empty set (0.00 sec)
EXPECTED: rows with vertical bar returned
ACTUAL: No rows are returned.

The same setup works if I change the field collation to its default (utf8_general_ci):

mysql> ALTER TABLE FOO MODIFY COLUMN name VARCHAR(255) COLLATE utf8_general_ci;
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM FOO WHERE name LIKE '%||%' ESCAPE '|';
+------+
| name |
+------+
| |bar |
| ba|r |
| bar| |
+------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM FOO WHERE name LIKE '%\\\\%';
+--------+
| name   |
+--------+
| \slash |
| sla\sh |
| slash\ |
+--------+
3 rows in set (0.00 sec)
[21 Dec 2011 19:45] Valeriy Kravchuk
Thank you for the problem report. Verified with 5.5.20 on Mac OS X also:

macbook-pro:5.5 openxs$ bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.5.20-debug Source distribution

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE TABLE FOO(name VARCHAR(255) COLLATE utf8_unicode_ci)
    -> CHARACTER SET utf8;
Query OK, 0 rows affected (0.26 sec)

mysql> INSERT INTO FOO(name) VALUES('\\slash'), ('sla\\sh'),
    -> ('slash\\'), ('|bar'), ('ba|r'), ('bar|');
Query OK, 6 rows affected (0.19 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM FOO;
+--------+
| name   |
+--------+
| \slash |
| sla\sh |
| slash\ |
| |bar   |
| ba|r   |
| bar|   |
+--------+
6 rows in set (0.00 sec)

mysql> SELECT * FROM FOO WHERE name LIKE '%\\\\%';
Empty set (0.04 sec)

mysql> SELECT * FROM FOO WHERE name LIKE '%||%' ESCAPE '|';
Empty set (0.00 sec)

mysql> ALTER TABLE FOO MODIFY COLUMN name VARCHAR(255) COLLATE
    -> utf8_general_ci;
Query OK, 6 rows affected (0.27 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM FOO WHERE name LIKE '%\\\\%';
+--------+
| name   |
+--------+
| \slash |
| sla\sh |
| slash\ |
+--------+
3 rows in set (0.00 sec)

mysql> SELECT * FROM FOO WHERE name LIKE '%||%' ESCAPE '|';
+------+
| name |
+------+
| |bar |
| ba|r |
| bar| |
+------+
3 rows in set (0.00 sec)
[22 Dec 2011 8:28] Peter Laursen
I think that worked earlier (in 4.1/5.0 ??).

5 years ago I had an application that stored Windows filepaths in a database and I had no problem matching those. But I am not perfectly sure that I used "LIKE* operator in the application.  Maybe only "=" operator was used.
[22 Dec 2011 9:05] Peter Laursen
SELECT 'backs\ash' LIKE 'backs\\ash'; -- 1
SELECT 'backs\ash' LIKE '%\\%'; -- 0

SELECT 'backs\ash' LIKE 'backs\\\\ash'; -- 0
SELECT 'backs\ash' LIKE '%\\\\%'; -- 0

(same in 5.0.90, 5.1.60 and 5.5.19)

In my understanding the upper two should return "0" and lower two should return "1"
[22 Dec 2011 13:59] Kevin Qiu
> SELECT 'backs\ash' LIKE 'backs\\\\ash'; -- 0
> SELECT 'backs\ash' LIKE '%\\\\%'; -- 0

> (same in 5.0.90, 5.1.60 and 5.5.19)

> In my understanding the upper two should return "0" and lower two should return > "1"

Shouldn't it be SELECT 'backs\\lash' LIKE '%\\\\%'; ? Anyways, this should still work but it's a problem when the field collation is utf8_unicode_ci.
[27 Apr 2020 17:55] Robert Simpson
We encountered this issue trying to find non-domain Active Directory accounts (accounts without "DOMAIN\").

The table is created with "ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci".

This statement does _not_ exclude values with a backslash:

[code]
select * from data WHERE value not like _utf8 '%\\\\%' collate utf8_unicode_ci escape _utf8 '\\' collate utf8_unicode_ci;
[/code]

This statement does exclude values with backslashes:

[code]
select * from data WHERE value collate utf8_general_ci not like _utf8 '%\\\\%' collate utf8_general_ci escape _utf8 '\\' collate utf8_general_ci;
[/code]
[19 Apr 21:56] Michal Vorisek
probably duplicate of bug #84118