Bug #95347 REGEXP negative match on ], like [^][], works in 5.7, not in 8.0
Submitted: 13 May 2019 9:00 Modified: 23 Jul 2020 5:14
Reporter: Daniel Black Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: DML Severity:S2 (Serious)
Version:8.0.16, 8.0.11 OS:Any
Assigned to: CPU Architecture:Any
Tags: regression

[13 May 2019 9:00] Daniel Black
Description:
From https://stackoverflow.com/questions/56105462/how-to-negate-square-brackets-in-mysql-regex/...:

An regexp [^][] validates correctly in MySQL-5.7 but not MySQL-8.0.

While technically the docs https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_regexp, say that ] must exist as the first character after the initial [, its a reasonable to assume that [^]] is also valid (and indeed it is in 5.7).

How to repeat:

CREATE TABLE vars (varname VARCHAR(20));
INSERT INTO vars VALUES
('hello world'),
('hello [world'),
('hello world]'),
('hello [world]');

SELECT varname, varname RLIKE '^[^][]+$' AS `match`
FROM vars

5.7.22:

varname         match
hello world     1
hello [world    0
hello world]    0
hello [world]   0

8.0.12

Error: UNKNOWN_CODE_PLEASE_REPORT: The regular expression contains an unclosed bracket expression.
[13 May 2019 9:20] MySQL Verification Team
Hello Daniel,

Thank you for the report and test case.
Verified as described.

regards,
Umesh
[13 May 2019 9:21] MySQL Verification Team
- 5.7.26

 bin/mysql -uroot -S /tmp/mysql_ushastry.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.26 MySQL Community Server (GPL)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test
Database changed
mysql> CREATE TABLE vars (varname VARCHAR(20));
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO vars VALUES
    -> ('hello world'),
    -> ('hello [world'),
    -> ('hello world]'),
    -> ('hello [world]');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql>
mysql> SELECT varname, varname RLIKE '^[^][]+$' AS `match`
    -> FROM vars
    -> ;
+---------------+-------+
| varname       | match |
+---------------+-------+
| hello world   |     1 |
| hello [world  |     0 |
| hello world]  |     0 |
| hello [world] |     0 |
+---------------+-------+
4 rows in set (0.00 sec)

- 8.0.16
 bin/mysql -uroot -S /tmp/mysql_ushastry.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.16 MySQL Community Server - GPL

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test
Database changed
mysql> CREATE TABLE vars (varname VARCHAR(20));
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO vars VALUES
    -> ('hello world'),
    -> ('hello [world'),
    -> ('hello world]'),
    -> ('hello [world]');
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT varname, varname RLIKE '^[^][]+$' AS `match`
    -> FROM vars;
ERROR 3696 (HY000): The regular expression contains an unclosed bracket expression.
mysql> \q
Bye
[13 May 2019 9:23] MySQL Verification Team
- lowest version checked in 8.0

 bin/mysql -uroot -S /tmp/mysql_ushastry.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.11 MySQL Community Server - GPL

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.20 sec)

mysql> use test
Database changed
mysql>
mysql> CREATE TABLE vars (varname VARCHAR(20));
Query OK, 0 rows affected (0.05 sec)

mysql> INSERT INTO vars VALUES
    -> ('hello world'),
    -> ('hello [world'),
    -> ('hello world]'),
    -> ('hello [world]');
Query OK, 4 rows affected (0.01 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql>
mysql> SELECT varname, varname RLIKE '^[^][]+$' AS `match`
    -> FROM vars;
ERROR 3696 (HY000): The regular expression contains an unclosed bracket expression.
mysql>
[28 May 2019 7:46] MySQL Verification Team
*** MHANSSON  martin.hansson Wed May 15 2019 20:10:56 GMT+0530 (IST)***

Can you please elaborate on what you perceive as the problem? First, the
syntax [^][] is mentioned, which is not valid and is documented as such.
Later, the example of [^]] is brought up, which is valid and documented as
valid. The "how to repeat" boils down to the former syntax:

mysql> SELECT varname, varname RLIKE '[^][]' AS `match` FROM vars;
ERROR 3696 (HY000): The regular expression contains an unclosed bracket
expression

Either way, the problem is easily circumvented with a backslash escape.

Which is the actual issue?
[28 May 2019 12:20] Daniel Black
Quite right, the only issue is if '[' is valid in a character set.

The docs for https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_regexp under the ' [a-dX], [^a-dX] ' paragraphs heading (same for 5.7) doesn't indicate that '[[]' is invalid. Isn't every character(like '[') apart from the terminating ']' valid? Not sure which criteria is being used to invalidate it.

So previously in 5.7 the following worked.

select '[' RLIKE '[[]'
1

8.0 requires escapes like

select '[' RLIKE '[\\[]'
1

The 8.0 syntax parses in 5.7 however

select '\\' RLIKE '[^]\\[]'

will return 0 in 5.7 and 1 in 8.0

So 2 different behaviours has been introduced.
[28 May 2019 12:54] MySQL Verification Team
Thank you, Daniel.
I'll forward your responses to Dev.

regards,
Umesh
[29 Jul 2019 9:57] Martin Hansson
Hi Daniel,
between 5.8 and 8.0, we introduced a new regular expression library so that we could support new functions (https://mysqlserverteam.com/new-regular-expression-functions-in-mysql-8-0/). This library obviously has a different implementation, so it will behave differently in some corner cases. Formally, I suppose you are always required to escape meta-characters in a regular expression syntax, but some libraries are more liberal in the cases when it is clear what the user means. These little quirks are rarely documented in the libraries' documentation.

As for the second example, select '\\' RLIKE '[^]\\[]', the result in 8.0 is more reasonable, IMHO:

The library searches a string consisting of "\" (since the mysql parser "eats" one backslash). Likewise, the regexp library sees the regexp "[^]\[]", meaning "match either something that is not a closing bracket or an opening bracket", (since it "eats" the remaining backslash.) Backslash is not a closing bracket, so it matches.
[25 Apr 2021 15:56] Rick James
This is one of many cases where 8.0 REGEXPs are incompatible with 5.x.  "Word boundary" is another, possibly more common, incompatibility.

For some syntaxes, it seems like it might be possible to 'fix' the syntax on the fly?