Bug #60983 LIKE/REGEXP performance regression in stored functions
Submitted: 26 Apr 2011 23:00 Modified: 27 Apr 2011 17:40
Reporter: Ashley Winters Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Stored Routines Severity:S5 (Performance)
Version:5.5.8-enterprise-commercial-advanced-log OS:Linux
Assigned to: CPU Architecture:Any
Tags: BENCHMARK, FUNCTION, like, performance, regex, REGEXP, regression, udf

[26 Apr 2011 23:00] Ashley Winters
Description:
The performance of REGEXP within a User Defined Function degrades the more times it is run in a single query. Instead of linear performance (ex: 100,000 in 1.0 seconds, 1,000,000 in 10.0 seconds), the performance is closer to exponential. The problem is new to 5.5, since I can run the same simple tests in 5.1 and the performance is predictable.

How to repeat:
DELIMITER $$

DROP FUNCTION IF EXISTS test.slow_like $$
CREATE FUNCTION test.slow_like (arg varchar(255)) RETURNS INT
  NO SQL
  DETERMINISTIC
BEGIN
  DECLARE ret INT;
  SELECT arg LIKE '1%' INTO ret;
  RETURN ret;
END $$

DELIMITER ;

With that procedure, start running benchmarks with increasing numbers of iterations until you hit the issue.

SELECT BENCHMARK(100, test.slow_like('1234'));
SELECT BENCHMARK(1000, test.slow_like('1234'));
SELECT BENCHMARK(10000, test.slow_like('1234'));
...

However, once I trigger the issue, it seems to cause lingering slowness. So, even shorter benchmarks start going slower. Same slowdown occurs with (SELECT arg REGEXP '^1' INTO ret), so it's not just LIKE. However, if I just (SELECT 1 INTO ret), the function performance is linear. So it's not UDF performance in general that's degrading on my server.

I can run the same benchmark on the same UDF on my 5.1.51 server and the time-per-iteration is just about constant, regardless of how many UDF calls occur in the same query.

--

For comparison, and to verify that the issue is isolated to UDFs, run the select from the UDF directly:

SELECT BENCHMARK(100, (SELECT '1234' LIKE '1%'));
SELECT BENCHMARK(1000, (SELECT '1234' LIKE '1%'));
SELECT BENCHMARK(10000, (SELECT '1234' LIKE '1%'));

On my 5.5.8 server, there is never a pathological case with those select benchmark. I get over 1-million loops per second, regardless of how many loops I put in.

Suggested fix:
I would like NO SQL DETERMINISTIC user-defined functions with LIKE/REGEXP expressions to run in relatively constant time again, like in 5.1.x.
[26 Apr 2011 23:07] Ashley Winters
Mention LIKE in the synposis/tags
[27 Apr 2011 3:09] Valeriy Kravchuk
Looks like a duplicate of/closely related to Bug #60025. Please, check.
[27 Apr 2011 17:38] Ashley Winters
Yes, I'd agree it's a dupe. I actually found Bug #60687 (another dupe) to be closer in description, but the fix for me is the same. set names latin1 resolves my problem on 5.5.
[27 Apr 2011 17:40] Valeriy Kravchuk
Duplicate of Bug #60025.