Description:
The performance of REGEXP within a User Defined Function degrades the more times it is run in a single query. Instead of linear performance (ex: 100,000 in 1.0 seconds, 1,000,000 in 10.0 seconds), the performance is closer to exponential. The problem is new to 5.5, since I can run the same simple tests in 5.1 and the performance is predictable.
How to repeat:
DELIMITER $$
DROP FUNCTION IF EXISTS test.slow_like $$
CREATE FUNCTION test.slow_like (arg varchar(255)) RETURNS INT
NO SQL
DETERMINISTIC
BEGIN
DECLARE ret INT;
SELECT arg LIKE '1%' INTO ret;
RETURN ret;
END $$
DELIMITER ;
With that procedure, start running benchmarks with increasing numbers of iterations until you hit the issue.
SELECT BENCHMARK(100, test.slow_like('1234'));
SELECT BENCHMARK(1000, test.slow_like('1234'));
SELECT BENCHMARK(10000, test.slow_like('1234'));
...
However, once I trigger the issue, it seems to cause lingering slowness. So, even shorter benchmarks start going slower. Same slowdown occurs with (SELECT arg REGEXP '^1' INTO ret), so it's not just LIKE. However, if I just (SELECT 1 INTO ret), the function performance is linear. So it's not UDF performance in general that's degrading on my server.
I can run the same benchmark on the same UDF on my 5.1.51 server and the time-per-iteration is just about constant, regardless of how many UDF calls occur in the same query.
--
For comparison, and to verify that the issue is isolated to UDFs, run the select from the UDF directly:
SELECT BENCHMARK(100, (SELECT '1234' LIKE '1%'));
SELECT BENCHMARK(1000, (SELECT '1234' LIKE '1%'));
SELECT BENCHMARK(10000, (SELECT '1234' LIKE '1%'));
On my 5.5.8 server, there is never a pathological case with those select benchmark. I get over 1-million loops per second, regardless of how many loops I put in.
Suggested fix:
I would like NO SQL DETERMINISTIC user-defined functions with LIKE/REGEXP expressions to run in relatively constant time again, like in 5.1.x.
Description: The performance of REGEXP within a User Defined Function degrades the more times it is run in a single query. Instead of linear performance (ex: 100,000 in 1.0 seconds, 1,000,000 in 10.0 seconds), the performance is closer to exponential. The problem is new to 5.5, since I can run the same simple tests in 5.1 and the performance is predictable. How to repeat: DELIMITER $$ DROP FUNCTION IF EXISTS test.slow_like $$ CREATE FUNCTION test.slow_like (arg varchar(255)) RETURNS INT NO SQL DETERMINISTIC BEGIN DECLARE ret INT; SELECT arg LIKE '1%' INTO ret; RETURN ret; END $$ DELIMITER ; With that procedure, start running benchmarks with increasing numbers of iterations until you hit the issue. SELECT BENCHMARK(100, test.slow_like('1234')); SELECT BENCHMARK(1000, test.slow_like('1234')); SELECT BENCHMARK(10000, test.slow_like('1234')); ... However, once I trigger the issue, it seems to cause lingering slowness. So, even shorter benchmarks start going slower. Same slowdown occurs with (SELECT arg REGEXP '^1' INTO ret), so it's not just LIKE. However, if I just (SELECT 1 INTO ret), the function performance is linear. So it's not UDF performance in general that's degrading on my server. I can run the same benchmark on the same UDF on my 5.1.51 server and the time-per-iteration is just about constant, regardless of how many UDF calls occur in the same query. -- For comparison, and to verify that the issue is isolated to UDFs, run the select from the UDF directly: SELECT BENCHMARK(100, (SELECT '1234' LIKE '1%')); SELECT BENCHMARK(1000, (SELECT '1234' LIKE '1%')); SELECT BENCHMARK(10000, (SELECT '1234' LIKE '1%')); On my 5.5.8 server, there is never a pathological case with those select benchmark. I get over 1-million loops per second, regardless of how many loops I put in. Suggested fix: I would like NO SQL DETERMINISTIC user-defined functions with LIKE/REGEXP expressions to run in relatively constant time again, like in 5.1.x.