Bug #80043 | No double escapes required inside square brackets | ||
---|---|---|---|
Submitted: | 19 Jan 2016 6:19 | Modified: | 25 Oct 2018 8:46 |
Reporter: | Pavel Taruts | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | 14.14 Distrib 5.6.27, for Linux (x86_64) | OS: | Ubuntu (Linux Mint 17 Qiana (Ubuntu 14.04 LTS)) |
Assigned to: | CPU Architecture: | Any | |
Tags: | REGEXP |
[19 Jan 2016 6:19]
Pavel Taruts
[29 Sep 2018 4:20]
MySQL Verification Team
mysql [localhost] {msandbox} (test) > -- Returns 1, like it should, according to the docs mysql [localhost] {msandbox} (test) > select '\\' regexp '\\-' ; +-------------------+ | '\\' regexp '\\-' | +-------------------+ | 0 | +-------------------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > -- Returns 1, like it should, according to the docs mysql [localhost] {msandbox} (test) > select '-' regexp '\\-' ; +-------------------+ | '-' regexp '\\-' | +-------------------+ | 1 | +-------------------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > -- Returns 0, NOT OK mysql [localhost] {msandbox} (test) > select '\\' regexp '[\\-]' ; +---------------------+ | '\\' regexp '[\\-]' | +---------------------+ | 1 | +---------------------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > -- Returns 1, ok mysql [localhost] {msandbox} (test) > select '-' regexp '[\\-]' ; +---------------------+ | '-' regexp '[\\-]' | +---------------------+ | 1 | +---------------------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > select @@version -> ; +-----------+ | @@version | +-----------+ | 5.7.15 | +-----------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) >
[29 Sep 2018 4:21]
MySQL Verification Team
mysql [localhost] {msandbox} (test) > -- Returns 1, like it should, according to the docs mysql [localhost] {msandbox} (test) > select '\\' regexp '\\-' ; +-------------------+ | '\\' regexp '\\-' | +-------------------+ | 0 | +-------------------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > -- Returns 1, like it should, according to the docs mysql [localhost] {msandbox} (test) > select '-' regexp '\\-' ; +-------------------+ | '-' regexp '\\-' | +-------------------+ | 1 | +-------------------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > -- Returns 0, NOT OK mysql [localhost] {msandbox} (test) > select '\\' regexp '[\\-]' ; +---------------------+ | '\\' regexp '[\\-]' | +---------------------+ | 1 | +---------------------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > -- Returns 1, ok mysql [localhost] {msandbox} (test) > select '-' regexp '[\\-]' ; +---------------------+ | '-' regexp '[\\-]' | +---------------------+ | 1 | +---------------------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > select @@version -> ; +-----------+ | @@version | +-----------+ | 5.6.27 | +-----------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) >
[29 Sep 2018 4:33]
MySQL Verification Team
Hi, I'm getting different results from you? Are you sure you are using Oracle built binaries? regards Bogdan
[29 Sep 2018 4:44]
MySQL Verification Team
mysql [localhost] {msandbox} (test) > -- Returns 1, like it should, according to the docs mysql [localhost] {msandbox} (test) > select '\\' regexp '\\-' ; +-------------------+ | '\\' regexp '\\-' | +-------------------+ | 0 | +-------------------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > -- Returns 1, like it should, according to the docs mysql [localhost] {msandbox} (test) > select '-' regexp '\\-' ; +-------------------+ | '-' regexp '\\-' | +-------------------+ | 1 | +-------------------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > -- Returns 0, NOT OK mysql [localhost] {msandbox} (test) > select '\\' regexp '[\\-]' ; +---------------------+ | '\\' regexp '[\\-]' | +---------------------+ | 0 | +---------------------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > -- Returns 1, ok mysql [localhost] {msandbox} (test) > select '-' regexp '[\\-]' ; +---------------------+ | '-' regexp '[\\-]' | +---------------------+ | 1 | +---------------------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) > select @@version; +-----------+ | @@version | +-----------+ | 8.0.11 | +-----------+ 1 row in set (0.00 sec) mysql [localhost] {msandbox} (test) >
[29 Sep 2018 4:45]
MySQL Verification Team
Hi, V8 is behaving differently, I'm verifying this. Thanks for the report. all best Bogdan
[25 Oct 2018 8:46]
Erlend Dahl
Posted by developer - Martin Hansson The rules regarding escaping in regular expression searches are not trivial. The biggest source of confusion is that there are two sites where escapes are interpreted. The first one is MySQL itself, which has some ideas of its own how to interpret them. After that, the regular expression string is interpreted by the ICU library. In the intersection of these rules, we see some funny effects. Let us first be clear that the query SELECT '\\' REGEXP '\\-' returns 0, which is expected. The search string is interpreted by MySQL as \, while the regular expression is interpreted as \-, which is then sent to ICU to be compiled. Now, for some reason the \- is interpreted as -, probably because - has special meaning within [] brackets, and the same behavior is probably reused when it sits on its own. Besides, \- has no special meaning, as opposed to \a or \b which match the bell character and a word boundary, respectively. That's probably why we see these effects: SELECT 'a' REGEXP '\\a'; 0 SELECT 'b' REGEXP '\\b'; 0 SELECT 'b' REGEXP '\\b'; 1 And then we have MySQL's escape rules. Any character can be escaped that doesn't have special meaning. https://dev.mysql.com/doc/refman/8.0/en/string-literals.html. And any number of backslashes can be stripped off. Hence: SELECT '\0' = '0'; -- \0 is the null terminator character 0 SELECT '\a' = 'a'; 1 SELECT '\\a' = 'a'; 1 SELECT '\\\a' = 'a'; 1 ...and so on. \a has no special meaning in MySQL (currently). Note how this contrasts with ICU's escape rules: SELECT '\x' regexp '\x'; -- No worries, MySQL strips off the backslashes, comparing x to x. 1 SELECT '\\x' regexp '\x'; -- MySQL strips off all the backslashes, comparing x to x. 1 mysql> SELECT '\\x' regexp '\\x'; -- ICU does not. ERROR 3689 (HY000): Unrecognized escape sequence in regular expression. Now we're ready to tackle the original issue in the bug report: SELECT '\\' regexp '[\\-]'; -- This means compare backslash to the literal character - ... 0 SELECT '-' regexp '[\\-]'; -- ...as proved by this being successful 1 SELECT '\\' regexp '[\\\-]'; -- Fails, we compare a backslash to \-, which ICU interprets as -. 0 SELECT '\\' regexp '[\\\\-]'; -- Now we have enough backslashes to tell ICU to mach either \ or -. 1