Bug #107554 regexp_replace in trigger VERY slow when using unicode category \\p{C}
Submitted: 13 Jun 2022 20:28 Modified: 14 Jun 2022 9:58
Reporter: Patrick LeGros Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0.29-0ubuntu0.20.04.3 OS:Ubuntu (20.04.4 LTS)
Assigned to: CPU Architecture:x86
Tags: regexp regexp_replace sql \\p{C} Control Characetr

[13 Jun 2022 20:28] Patrick LeGros
Description:
When I put the following SET in an insert trigger and insert 1000 rows it becomes ~8seconds slower.

SET NEW.name = REGEXP_REPLACE(' ', '\\p{C}', '');

However when I run a BENCHMARK of 1000 on the same expression, it only takes ~0.01seconds.

SELECT BENCHMARK(1000, REGEXP_REPLACE(' ', '\\p{C}', ''));

How to repeat:
---- control & experiment table set up -----
DELIMITER ;;

DROP TABLE IF EXISTS control;;
CREATE TABLE control (
    id SERIAL,
    name VARCHAR(64) NOT NULL DEFAULT ' ');;

DROP TRIGGER IF EXISTS control_insert_before;;
CREATE TRIGGER control_insert_before BEFORE INSERT ON control
FOR EACH ROW BEGIN
    SET NEW.name = REGEXP_REPLACE(' ', '\\p{S}', '');
    -- SAME Results w/ NEW.name, but wanted to make similar to BENCHMARK
    -- SET NEW.name = REGEXP_REPLACE(NEW.name, '\\p{S}', '');
END;;

DROP TABLE IF EXISTS experiment;;
CREATE TABLE experiment (
    id SERIAL,
    name VARCHAR(64) NOT NULL DEFAULT ' ');;

DROP TRIGGER IF EXISTS experiment_insert_before;;
CREATE TRIGGER experiment_insert_before BEFORE INSERT ON experiment
FOR EACH ROW BEGIN
    SET NEW.name = REGEXP_REPLACE(' ', '\\p{C}', '');
    -- SAME Results w/ NEW.name, but wanted to make similar to BENCHMARK
    -- SET NEW.name = REGEXP_REPLACE(NEW.name, '\\p{C}', '');
END;;

---- tests to run ----
INSERT INTO control VALUES (),(),,... 1000x
INSERT INTO experiment VALUES (),(),... 1000x

SELECT BENCHMARK(1000, REGEXP_REPLACE(' ', '\\p{S}', ''));
SELECT BENCHMARK(1000, REGEXP_REPLACE(' ', '\\p{C}', ''));

Suggested fix:
I don't know.
[13 Jun 2022 21:59] Patrick LeGros
Also tried the following test, inserted 1000 and it completed in ~0.04sec.
The REGEXP_REPLACE with \p{C} seems to only be problematic in the trigger.

DROP TABLE IF EXISTS test;
CREATE TABLE test (
    id      SERIAL,
    name    VARCHAR(64) NOT NULL DEFAULT '1\02\03',
    namegen VARCHAR(64) AS (REGEXP_REPLACE(name, '\\p{C}', '')) STORED NOT NULL
);
[14 Jun 2022 9:33] MySQL Verification Team
Hi Patrick,

What version did you tested this with? Latest 8.0 or ?

thanks
[14 Jun 2022 9:46] Patrick LeGros
ran: mysql --version
output: mysql  Ver 8.0.29-0ubuntu0.20.04.3 for Linux on x86_64 ((Ubuntu))

(sorry, i think i cleaned up the version wrong and put 9)
[14 Jun 2022 9:58] MySQL Verification Team
Hi,

Interesting bug! Thank you for report and way to reproduce. Verified.

mysql [localhost:8027] {msandbox} (b107554) > insert into experiment select null,null from control;
Query OK, 8192 rows affected (1 min 16.37 sec)
Records: 8192  Duplicates: 0  Warnings: 0

mysql [localhost:8027] {msandbox} (b107554) > SELECT BENCHMARK(1000, REGEXP_REPLACE(' ', '\\p{S}', ''));
+----------------------------------------------------+
| BENCHMARK(1000, REGEXP_REPLACE(' ', '\\p{S}', '')) |
+----------------------------------------------------+
|                                                  0 |
+----------------------------------------------------+
1 row in set (0.00 sec)

mysql [localhost:8027] {msandbox} (b107554) > SELECT BENCHMARK(1000, REGEXP_REPLACE(' ', '\\p{C}', ''));
+----------------------------------------------------+
| BENCHMARK(1000, REGEXP_REPLACE(' ', '\\p{C}', '')) |
+----------------------------------------------------+
|                                                  0 |
+----------------------------------------------------+
1 row in set (0.01 sec)

mysql [localhost:8027] {msandbox} (b107554) > insert into control select null,null from control;
Query OK, 8192 rows affected (1.92 sec)
Records: 8192  Duplicates: 0  Warnings: 0

kind regards
Bogdan