Bug #93479 REGEXP_REPLACE incorrect results: spuriously concatenates data across rows
Submitted: 4 Dec 2018 23:53 Modified: 5 Dec 2018 20:46
Reporter: Daniel Posch Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:8.0.11 OS:Linux (Amazon RDS)
Assigned to: CPU Architecture:x86 (x86_64)

[4 Dec 2018 23:53] Daniel Posch
Description:
REGEXP_REPLACE appears to be prepending data from **other rows in the result set** into each computed value.

In certain situations, this could be a security vulnerability. Correctly written applications could return sensitive information to the wrong parties, since the database is combining unrelated data across rows in an invalid way.

How to repeat:
Minimal steps to reproduce:

create table tmp(wat varchar(100));
insert into tmp(wat) values ('Jim Bob'), ('John Bobbs'), ('Tim Bub');
select wat, regexp_replace(wat, '[^ ]* ', '') as lastName from tmp;

Expected result:

wat, lastName
Jim Bob, Bob
John Bobbs, Bobbs
Tim Bub, Bub

Observed result:

wat, lastName
Jim Bob, Bob
John Bobbs, BobBobbs
Tim Bub, BobBobbsBub

Data from OTHER ROWS is concatenated into the result on the current row. (!)

I suspect that there may be either:
- Some state in the regex engine that is not being cleared properly between rows
- A buggy interaction between REGEXP_REPLACE and window functionality (both new in MySQL 8)
[5 Dec 2018 4:58] MySQL Verification Team
Hello Daniel,

Thank you for the report and test case.
Observed this with 8.0.11 build but it is no longer reproducible in latest 8.0.13 build. This is most likely fixed after Bug #90803, which is fixed in 8.0.12. We don't fix bugs in old versions, don't backport bug fixes. So, please, upgrade and inform us if problem still exists.

-- 8.0.11 - affected
mysql> create table tmp(wat varchar(100));
Query OK, 0 rows affected (0.06 sec)

mysql> insert into tmp(wat) values ('Jim Bob'), ('John Bobbs'), ('Tim Bub');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select wat, regexp_replace(wat, '[^ ]* ', '') as lastName from tmp;
+------------+-------------+
| wat        | lastName    |
+------------+-------------+
| Jim Bob    | Bob         |
| John Bobbs | BobBobbs    |
| Tim Bub    | BobBobbsBub |
+------------+-------------+
3 rows in set (0.03 sec)

mysql> show variables like "%version%";
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| innodb_version          | 8.0.11                       |
| protocol_version        | 10                           |
| slave_type_conversions  |                              |
| tls_version             | TLSv1,TLSv1.1,TLSv1.2        |
| version                 | 8.0.11                       |
| version_comment         | MySQL Community Server - GPL |
| version_compile_machine | x86_64                       |
| version_compile_os      | linux-glibc2.12              |
| version_compile_zlib    | 1.2.11                       |
+-------------------------+------------------------------+
9 rows in set (0.01 sec)

-- 8.0.13 - not affected
mysql> create table tmp(wat varchar(100));
Query OK, 0 rows affected (0.05 sec)

mysql> insert into tmp(wat) values ('Jim Bob'), ('John Bobbs'), ('Tim Bub');
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select wat, regexp_replace(wat, '[^ ]* ', '') as lastName from tmp;
+------------+----------+
| wat        | lastName |
+------------+----------+
| Jim Bob    | Bob      |
| John Bobbs | Bobbs    |
| Tim Bub    | Bub      |
+------------+----------+
3 rows in set (0.00 sec)

mysql> show variables like "%version%";
+-------------------------+------------------------------+
| Variable_name           | Value                        |
+-------------------------+------------------------------+
| innodb_version          | 8.0.13                       |
| protocol_version        | 10                           |
| slave_type_conversions  |                              |
| tls_version             | TLSv1,TLSv1.1,TLSv1.2        |
| version                 | 8.0.13                       |
| version_comment         | MySQL Community Server - GPL |
| version_compile_machine | x86_64                       |
| version_compile_os      | linux-glibc2.12              |
| version_compile_zlib    | 1.2.11                       |
+-------------------------+------------------------------+
9 rows in set (0.01 sec)

regards,
Umesh
[5 Dec 2018 20:42] Daniel Posch
Thanks!

Curious, which bug is this a duplicate of? I searched for REGEXP_REPLACE before filing, got no results.
[5 Dec 2018 20:46] Daniel Posch
Never mind, I see https://bugs.mysql.com/bug.php?id=90803 now. Bug search is still a bit broken (searching for REGEXP_REPLACE should've returned that bug) but no problem.