Bug #80043 No double escapes required inside square brackets
Submitted: 19 Jan 2016 6:19 Modified: 25 Oct 2018 8:46
Reporter: Pavel Taruts Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:14.14 Distrib 5.6.27, for Linux (x86_64) OS:Ubuntu (Linux Mint 17 Qiana (Ubuntu 14.04 LTS))
Assigned to: CPU Architecture:Any
Tags: REGEXP

[19 Jan 2016 6:19] Pavel Taruts
Description:
Docs say we should always escape \ in regexp, but inside [] we should not. 
What's more, if we do escape \ inside [], the results may be not as they should be, according to the docs.

Either there's a bug in the docs, or in the semantics

How to repeat:
-- Returns 1, like it should, according to the docs
select '\\' regexp '\\-'
-- Returns 1, like it should, according to the docs
select '-'  regexp '\\-'
-- Returns 0, NOT OK
select '\\' regexp '[\\-]'
-- Returns 1, ok
select '-'  regexp '[\\-]'

Suggested fix:
Either there's a bug in the docs, or in the semantics
[29 Sep 2018 4:20] MySQL Verification Team
mysql [localhost] {msandbox} (test) > -- Returns 1, like it should, according to the docs
mysql [localhost] {msandbox} (test) > select '\\' regexp '\\-' ;
+-------------------+
| '\\' regexp '\\-' |
+-------------------+
|                 0 |
+-------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > -- Returns 1, like it should, according to the docs
mysql [localhost] {msandbox} (test) > select '-'  regexp '\\-' ;
+-------------------+
| '-'  regexp '\\-' |
+-------------------+
|                 1 |
+-------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > -- Returns 0, NOT OK
mysql [localhost] {msandbox} (test) > select '\\' regexp '[\\-]' ;
+---------------------+
| '\\' regexp '[\\-]' |
+---------------------+
|                   1 |
+---------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > -- Returns 1, ok
mysql [localhost] {msandbox} (test) > select '-'  regexp '[\\-]' ;
+---------------------+
| '-'  regexp '[\\-]' |
+---------------------+
|                   1 |
+---------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select @@version
    -> ;
+-----------+
| @@version |
+-----------+
| 5.7.15    |
+-----------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) >
[29 Sep 2018 4:21] MySQL Verification Team
mysql [localhost] {msandbox} (test) > -- Returns 1, like it should, according to the docs
mysql [localhost] {msandbox} (test) > select '\\' regexp '\\-' ;
+-------------------+
| '\\' regexp '\\-' |
+-------------------+
|                 0 |
+-------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > -- Returns 1, like it should, according to the docs
mysql [localhost] {msandbox} (test) > select '-'  regexp '\\-' ;
+-------------------+
| '-'  regexp '\\-' |
+-------------------+
|                 1 |
+-------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > -- Returns 0, NOT OK
mysql [localhost] {msandbox} (test) > select '\\' regexp '[\\-]' ;
+---------------------+
| '\\' regexp '[\\-]' |
+---------------------+
|                   1 |
+---------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > -- Returns 1, ok
mysql [localhost] {msandbox} (test) > select '-'  regexp '[\\-]' ;
+---------------------+
| '-'  regexp '[\\-]' |
+---------------------+
|                   1 |
+---------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select @@version
    -> ;
+-----------+
| @@version |
+-----------+
| 5.6.27    |
+-----------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) >
[29 Sep 2018 4:33] MySQL Verification Team
Hi,

I'm getting different results from you? Are you sure you are using Oracle built binaries?

regards
Bogdan
[29 Sep 2018 4:44] MySQL Verification Team
mysql [localhost] {msandbox} (test) > -- Returns 1, like it should, according to the docs
mysql [localhost] {msandbox} (test) > select '\\' regexp '\\-' ;
+-------------------+
| '\\' regexp '\\-' |
+-------------------+
|                 0 |
+-------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > -- Returns 1, like it should, according to the docs
mysql [localhost] {msandbox} (test) > select '-'  regexp '\\-' ;
+-------------------+
| '-'  regexp '\\-' |
+-------------------+
|                 1 |
+-------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > -- Returns 0, NOT OK
mysql [localhost] {msandbox} (test) > select '\\' regexp '[\\-]' ;
+---------------------+
| '\\' regexp '[\\-]' |
+---------------------+
|                   0 |
+---------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > -- Returns 1, ok
mysql [localhost] {msandbox} (test) > select '-'  regexp '[\\-]' ;
+---------------------+
| '-'  regexp '[\\-]' |
+---------------------+
|                   1 |
+---------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.11    |
+-----------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) >
[29 Sep 2018 4:45] MySQL Verification Team
Hi,

V8 is behaving differently, I'm verifying this. Thanks for the report.

all best
Bogdan
[25 Oct 2018 8:46] Erlend Dahl
Posted by developer - Martin Hansson

The rules regarding escaping in regular expression searches are not trivial.
The biggest source of confusion is that there are two sites where escapes are
interpreted. The first one is MySQL itself, which has some ideas of its own
how to interpret them. After that, the regular expression string is
interpreted by the ICU library. In the intersection of these rules, we see
some funny effects.

Let us first be clear that the query

SELECT '\\' REGEXP '\\-'

returns 0, which is expected. The search string is interpreted by MySQL as \,
while the regular expression is interpreted as \-, which is then sent to ICU
to be compiled. Now, for some reason the \- is interpreted as -, probably
because - has special meaning within [] brackets, and the same behavior is
probably reused when it sits on its own. Besides, \- has no special meaning,
as opposed to \a or \b which match the bell character and a word boundary,
respectively. That's probably why we see these effects:

SELECT 'a' REGEXP '\\a';
0
SELECT 'b' REGEXP '\\b';
0
SELECT 'b' REGEXP '\\b';
1

And then we have MySQL's escape rules. Any character can be escaped that
doesn't have special meaning.
https://dev.mysql.com/doc/refman/8.0/en/string-literals.html. And any number
of backslashes can be stripped off. Hence:

SELECT '\0' = '0'; -- \0 is the null terminator character
0
SELECT '\a' = 'a';
1
SELECT '\\a' = 'a';
1
SELECT '\\\a' = 'a';
1

...and so on. \a has no special meaning in MySQL (currently). Note how this
contrasts with ICU's escape rules:

SELECT '\x' regexp '\x'; -- No worries, MySQL strips off the backslashes,
comparing x to x.
1
SELECT '\\x' regexp '\x'; -- MySQL strips off all the backslashes, comparing
x to x.
1
mysql> SELECT '\\x' regexp '\\x'; -- ICU does not.
ERROR 3689 (HY000): Unrecognized escape sequence in regular expression.

Now we're ready to tackle the original issue in the bug report:

SELECT '\\' regexp '[\\-]'; -- This means compare backslash to the literal
character - ...
0
SELECT '-' regexp '[\\-]'; -- ...as proved by this being successful
1
SELECT '\\' regexp '[\\\-]'; -- Fails, we compare a backslash to \-, which
ICU interprets as -.
0
SELECT '\\' regexp '[\\\\-]'; -- Now we have enough backslashes to tell ICU
to mach either \ or -.
1