Bug #104387 CHARACTER_SET_MISMATCH issue with regex comparisons
Submitted: 22 Jul 2021 21:24 Modified: 18 Aug 2021 0:29
Reporter: Leonardo Fernandes Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.22, 8.0.23, 8.0.24, 8.0.25, 8.0.26 OS:Any
Assigned to: Jon Stephens CPU Architecture:Any
Tags: regression

[22 Jul 2021 21:24] Leonardo Fernandes
Description:
The query below works on 8.0.21, but throws a CHARACTER_SET_MISMATCH error on 8.0.22+ when using regex comparisson:

mysql > create table colltest ( id int unsigned not null primary key auto_increment, cbin varbinary(140), ctxt text character set utf8mb3 collate utf8mb3_bin, cstr varchar(255) character set utf8mb4);
Query OK, 0 rows affected, 2 warnings (0.45 sec)

mysql > insert into colltest set cbin='asdf',ctxt='asdf',cstr='asdf';
Query OK, 1 row affected (0.01 sec)

mysql > select * from colltest where cbin rlike cstr;
ERROR 3995 (HY000): Character set 'binary' cannot be used in conjunction with 'utf8mb4_0900_ai_ci' in call to regexp_like.

mysql > select * from colltest where cbin rlike ctxt;
ERROR 3995 (HY000): Character set 'binary' cannot be used in conjunction with 'utf8_bin' in call to regexp_like.

The same query works normally on MySQL 8.0.21:
mysql > create table colltest ( id int unsigned not null primary key auto_increment, cbin varbinary(140), ctxt text character set utf8mb3 collate utf8mb3_bin, cstr varchar(255) character set utf8mb4);
Query OK, 0 rows affected, 2 warnings (0.07 sec)

mysql >  insert into colltest set cbin='asdf',ctxt='asdf',cstr='asdf';
Query OK, 1 row affected (0.65 sec)

mysql > select * from colltest where cbin rlike cstr;
+----+------------+------+------+
| id | cbin       | ctxt | cstr |
+----+------------+------+------+
|  1 | 0x61736466 | asdf | asdf |
+----+------------+------+------+
1 row in set (0.00 sec)

mysql > select * from colltest where cbin rlike ctxt;
+----+------------+------+------+
| id | cbin       | ctxt | cstr |
+----+------------+------+------+
|  1 | 0x61736466 | asdf | asdf |
+----+------------+------+------+
1 row in set (0.00 sec)

How to repeat:
create table colltest ( id int unsigned not null primary key auto_increment, cbin varbinary(140), ctxt text character set utf8mb3 collate utf8mb3_bin, cstr varchar(255) character set utf8mb4);

insert into colltest set cbin='asdf',ctxt='asdf',cstr='asdf';

select * from colltest where cbin rlike cstr;

select * from colltest where cbin rlike ctxt;

These work on 8.0.21, but throw an error on 8.0.25+
[23 Jul 2021 6:12] MySQL Verification Team
Hello Leonardo,

Thank you for the report and feedback.
Observed that 8.0.26 is affected(no issues observed in 5.7.35).

-- 
- 8.0.26

bin/mysql -uroot -S /tmp/mysql_ushastry.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.26 MySQL Community Server - GPL

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.01 sec)

mysql> use test
Database changed
mysql> create table colltest ( id int unsigned not null primary key auto_increment, cbin varbinary(140), ctxt text character set utf8mb3 collate utf8mb3_bin, cstr varchar(255) character set utf8mb4);
Query OK, 0 rows affected, 2 warnings (0.02 sec)

mysql>
mysql> insert into colltest set cbin='asdf',ctxt='asdf',cstr='asdf';
Query OK, 1 row affected (0.00 sec)

mysql>
mysql> select * from colltest where cbin rlike cstr;
ERROR 3995 (HY000): Character set 'binary' cannot be used in conjunction with 'utf8mb4_0900_ai_ci' in call to regexp_like.
mysql>
mysql> select * from colltest where cbin rlike ctxt;
ERROR 3995 (HY000): Character set 'binary' cannot be used in conjunction with 'utf8_bin' in call to regexp_like.
mysql>
mysql> show errors;
+-------+------+----------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                      |
+-------+------+----------------------------------------------------------------------------------------------+
| Error | 3995 | Character set 'binary' cannot be used in conjunction with 'utf8_bin' in call to regexp_like. |
+-------+------+----------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> show warnings;
+-------+------+----------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                      |
+-------+------+----------------------------------------------------------------------------------------------+
| Error | 3995 | Character set 'binary' cannot be used in conjunction with 'utf8_bin' in call to regexp_like. |
+-------+------+----------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql>

-- 5.7.35

 bin/mysql -uroot -S /tmp/mysql_ushastry.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.35 MySQL Community Server (GPL)

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test
Database changed
mysql> create table colltest ( id int unsigned not null primary key auto_increment, cbin varbinary(140), ctxt text character set utf8mb3 collate utf8mb3_bin, cstr varchar(255) character set utf8mb4);
Query OK, 0 rows affected (0.02 sec)

mysql> insert into colltest set cbin='asdf',ctxt='asdf',cstr='asdf';
Query OK, 1 row affected (0.00 sec)

mysql> select * from colltest where cbin rlike cstr;
+----+------+------+------+
| id | cbin | ctxt | cstr |
+----+------+------+------+
|  1 | asdf | asdf | asdf |
+----+------+------+------+
1 row in set (0.00 sec)

mysql> select * from colltest where cbin rlike ctxt;
+----+------+------+------+
| id | cbin | ctxt | cstr |
+----+------+------+------+
|  1 | asdf | asdf | asdf |
+----+------+------+------+
1 row in set (0.00 sec)

From 8.0.22 change log this seems to be intentional change but verifying since those who upgrade from 5.7.35(or earlier) to 8.0 GA(or <8.0.22) are surely affected from this change.

- Quoting from 8.0.22 change log - https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-22.html

"Regular expression functions such as REGEXP_LIKE() yielded inconsistent results with binary string arguments. These functions now reject binary strings with an error. (Bug #31031886, Bug #98951, Bug #31031888, Bug #98950)"

regards,
Umesh
[26 Jul 2021 15:12] Leonardo Fernandes
Thank you, Umesh, 

I hadn't noticed it was a documented change:
"Regular expression functions such as REGEXP_LIKE() yielded inconsistent results with binary string arguments. These functions now reject binary strings with an error. (Bug #31031886, Bug #98951, Bug #31031888, Bug #98950)"
[27 Jul 2021 19:51] Trey Raymond
fun fact: one of the documented examples (casting to binary) doesn't work now due to this
https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_regexp

mysql> SELECT 'a' REGEXP 'A', 'a' REGEXP BINARY 'A';
ERROR 3995 (HY000): Character set 'utf8mb4_0900_ai_ci' cannot be used in conjunction with 'binary' in call to regexp_like.
[10 Aug 2021 17:39] Jon Stephens
See also BUG#103223.
[18 Aug 2021 0:29] Jon Stephens
Fixed in mysqldoc rev 70644. Closed.