Bug #46659 Using Percent and Backslashes in LIKE Queries is buggy
Submitted: 11 Aug 2009 16:23 Modified: 12 Aug 2009 12:07
Reporter: Christian Hammers (Silver Quality Contributor) (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.0, 5.1 OS:Linux (Debian 5.0 lenny, Ubuntu 9.04 amd64)
Assigned to: Assigned Account CPU Architecture:Any
Tags: qc

[11 Aug 2009 16:23] Christian Hammers
Description:
I cannot figure out how to select all rows that contain a backslash in a varchar field. The behavious seems buggy to me. Or at least not to follow the described behaviour on http://dev.mysql.com/doc/refman/5.1/en/string-comparison-functions.html which says that a backslash has to be escaped twice when in a "%..%" LIKE statement.

Why does the     SELECT * FROM t WHERE info like '%\\\\%';    query find a row with an Umlaut?

How to repeat:
mysql> CREATE TABLE t (info varchar(255) DEFAULT NULL) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO t VALUES ('März'), ('C:\\Windows'), ('foo');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM t WHERE info like '%\\\\%';
+------------+
| info       |
+------------+
| März       | 
| C:\Windows | 
+------------+
2 rows in set (0.00 sec)

mysql> SHOW VARIABLES LIKE "%char%";
+--------------------------+------------------------------------------------------------+
| Variable_name            | Value                                                      |
+--------------------------+------------------------------------------------------------+
| character_set_client     | latin1                                                     | 
| character_set_connection | latin1                                                     | 
| character_set_database   | latin1                                                     | 
| character_set_filesystem | binary                                                     | 
| character_set_results    | latin1                                                     | 
| character_set_server     | latin1                                                     | 
| character_set_system     | utf8                                                       | 
| character_sets_dir       | /usr/local/mysql-5.1.36-linux-i686-glibc23/share/charsets/ | 
+--------------------------+------------------------------------------------------------+
8 rows in set (0.00 sec)

Suggested fix:
Let LIKE '%\\\\%'  only match fields with a backslash character in them.
[11 Aug 2009 16:55] MySQL Verification Team
I couldn't repeat on Windows with source tree server:

c:\dbs>c:\dbs\5.1\bin\mysql -uroot --port=3510 --prompt="mysql 5.1 >"
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.1.38-Win X64 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql 5.1 >use test
Database changed
mysql 5.1 >CREATE TABLE t (info varchar(255) DEFAULT NULL) ENGINE=MyISAM DEFAULT
    -> CHARSET=latin1;
Query OK, 0 rows affected (0.17 sec)

mysql 5.1 >INSERT INTO t VALUES ('März'), ('C:\\Windows'), ('foo');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql 5.1 >SELECT * FROM t WHERE info like '%\\\\%';
+------------+
| info       |
+------------+
| C:\Windows |
+------------+
1 row in set (0.00 sec)

mysql 5.1 >SHOW VARIABLES LIKE "%char%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | c:\dbs\5.1\share\charsets\ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

mysql 5.1 >
[12 Aug 2009 5:17] Sveta Smirnova
Thank you for the report.

I can not repeat described behavior too. Please provide your configuration file and indicate accurate name of package you are using (file name or configure options if you built MySQL yourself).
[12 Aug 2009 8:58] Christian Hammers
Hello

I can reproduce the bug by cut&paste the commands that Miguel tried, too!

My result ist still:

mysql> SELECT * FROM t WHERE info like '%\\\\%';
+------------+
| info       |
+------------+
| März       | 
| C:\Windows | 
+------------+
2 rows in set (0.00 sec)

My locale settings:

$ locale
LANG=de_DE@euro
LC_CTYPE="de_DE.ISO-8859-15@euro"
LC_NUMERIC="de_DE.ISO-8859-15@euro"
LC_TIME="de_DE.ISO-8859-15@euro"
LC_COLLATE="de_DE.ISO-8859-15@euro"
LC_MONETARY="de_DE.ISO-8859-15@euro"
LC_MESSAGES="de_DE.ISO-8859-15@euro"
LC_PAPER="de_DE.ISO-8859-15@euro"
LC_NAME="de_DE.ISO-8859-15@euro"
LC_ADDRESS="de_DE.ISO-8859-15@euro"
LC_TELEPHONE="de_DE.ISO-8859-15@euro"
LC_MEASUREMENT="de_DE.ISO-8859-15@euro"
LC_IDENTIFICATION="de_DE.ISO-8859-15@euro"
LC_ALL=de_DE.ISO-8859-15@euro

I'm using mysql-5.1.36-linux-i686-glibc2 on a Debian 5.0 "lenny" system with the following my.cnf:

[client]
port            = 3306
socket          = /tmp/mysql.sock

[mysqld]
port            = 3306
socket          = /tmp/mysql.sock
skip-locking
key_buffer = 256M
max_allowed_packet = 1M
table_cache = 256
sort_buffer_size = 1M
read_buffer_size = 1M
myisam_sort_buffer_size = 64M
thread_cache = 8
query_cache_size = 256M
thread_concurrency = 4
tmp_table_size          = 128M
max_heap_table_size     = 64M

datadir = /usr/local/mysql3306/data

log-bin             = /usr/local/mysql3306/log/dbpriv2-bin
relay-log           = /usr/local/mysql3306/log/dbpriv2-relay-bin
slow_query_log_file = /usr/local/mysql3306/log/dbpriv2-slow
slow_query_log
long_query_time     = 1
max_connections     = 1024
expire_logs_days    = 31

server-id       = 2
replicate-ignore-db = mysql
log-slave-updates

skip-innodb

[mysqldump]
quick
max_allowed_packet = 32M

[mysql]

[isamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M

[myisamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout
[12 Aug 2009 12:07] Susanne Ebrecht
I am able to repeat it.

Test:

Use an utf8 KDE/Gnome Terminal

Open MySQL CLI

SET NAMES UTF8;

CREATE DATABASE bug46659 CHARACTER SET latin1;

USE bug46659;

CREATE TABLE t(info VARCHAR(255) DEFAULT NULL) ENGINE=myisam CHARSET latin1;

INSERT INTO t VALUES('März'),('C:\\Windows'),('foo'),('\\\\');

SELECT * FROM t WHERE info LIKE '%\\\\%';
+------------+
| info       |
+------------+
| März      | 
| C:\Windows | 
| \\         | 
+------------+
3 rows in set (0.00 sec)

Why is 'März' and 'C:\Windows' listed here.

LIKE '%\\\\%' should only match when I have two backslashes in a row. And not for for single backslash.

Hex code is 5c5c

select HEX(info) FROM t;
+----------------------+
| hex(info)            |
+----------------------+
| 4DE4727A             | 
| 433A5C57696E646F7773 | 
| 666F6F               | 
| 5C5C                 | 
+----------------------+

433A5C57 == C:\W and not C:\\W ... so LIKE '%\\\\%' not should find this.

Same behaviour with ISO-8859-15 and LATIN1 environment.

****

Same test by using default database and table charset utf8:

SELECT * FROM t WHERE info LIKE '%\\\\%';
+------------+
| info       |
+------------+
| C:\Windows | 
| \\         | 
+------------+

März is not found anymore but still single backslash is found.

I used utf8 database/table for further tests:

INSERT INTO t VALUES('\\');

SELECT * FROM t WHERE info LIKE '%\\\\%';
+------------+
| info       |
+------------+
| C:\Windows | 
| \\         | 
| \          | 
+------------+

SELECT * FROM t WHERE info LIKE '\\\\%';
+------+
| info |
+------+
| \\   | 
| \    | 
+------+

SELECT * FROM t WHERE info LIKE '\\\\';
+------+
| info |
+------+
| \    | 
+------+

SELECT * FROM t WHERE info = '\\\\';
+------+
| info |
+------+
| \\   | 
+------+

It seems LIKE is doing double escaping here.

When I use latin1 database/table again ....

With content März, C:\Windows and \\

(INSERT INTO t VALUES('März'),('C:\\Windows'),('foo'),('\\\\');)

then:

select * from t where info like '%\\\\\\\\%';
+------+
| info |
+------+
| \\   | 
+------+

Why do I need 8 backslashes to find double backslash?
[12 Aug 2009 12:13] Susanne Ebrecht
Same bug in 5.0
[12 Aug 2009 12:14] Peter Laursen
This statement is wrong "LIKE '%\\\\%' should only match when I have two backslashes in a row. And not for for single backslash."

So result here should be

+------------+
| info       |
+------------+
| C:\Windows |
| \\         |
+------------+

refer to:
http://dev.mysql.com/doc/refman/5.1/en/string-comparison-functions.html#operator_like

"Note

Because MySQL uses C escape syntax in strings (for example, “\n” to represent a newline character), you must double any “\” that you use in LIKE strings. For example, to search for “\n”, specify it as “\\n”. To search for “\”, specify it as “\\\\”; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against. (Exception: At the end of the pattern string, backslash can be specified as “\\”. At the end of the string, backslash stands for itself because there is nothing following to escape.)"

(this was added to manual around 2 years ago on my request actually. I confused this, my colleagues did and our users did too!)
[15 Aug 2009 10:09] Sergei Golubchik
I suppose it's intentional.

According to
http://www.collation-charts.org/mysql60/mysql604.latin1_swedish_ci.html
in latin1_swedish_ci '\' == 'ä'
[15 Aug 2009 11:33] Peter Laursen
"in latin1_swedish_ci '\' == 'ä'"

well .. sometimes collations have surprises included! How can '\' == 'ä' in Swedish? If this is specific for Swedish collation, I think the collation is buugy.  And pretty serious then as Swedish collation is latin1 default.
[19 Oct 2009 8:04] Bernt Marius Johnsen
In Latin1 swedish 'ä' and '\' have different values. In ISO-646-SE (which is a 7-bit charset) 'ä' has the same value as in 7-bit ASCII '{'. It is 'ö' that has the same value as '\'. But then again, don't confuse latin1 (aka ISO-8859-1) with ISO-646-SE.
[24 Dec 2009 7:35] Alexander Barkov
http://forge.mysql.com/worklog/task.php?id=5170