Bug #65243 | Substring function very slow for longer strings | ||
---|---|---|---|
Submitted: | 8 May 2012 17:50 | Modified: | 8 Jan 2020 22:51 |
Reporter: | Franjo Markovic | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: DML | Severity: | S5 (Performance) |
Version: | 5.6.4 | OS: | Windows |
Assigned to: | CPU Architecture: | Any |
[8 May 2012 17:50]
Franjo Markovic
[8 May 2012 19:04]
MySQL Verification Team
Please try 5.6.5-m8 and print here your result. Thank you.
[8 May 2012 19:14]
Franjo Markovic
It's the same with 5.6.5 - about 6.6 seconds on windows pc.
[9 May 2012 3:49]
MySQL Verification Team
Here's the testcase I used: select version(); set names latin1; # note! set @str1 = repeat("a",1000000); select benchmark(25000, substring(@str1,50000,1)); select benchmark(25000, substring(@str1,50000,1)); select benchmark(25000, substring(@str1,50000,1)); Results for latin1: ------------------------------------- 5.5.24: 2.00 sec, 2.00 sec, 2.02 sec 5.1.62: 1.98 sec, 1.98 sec, 2.00 sec 5.6.5: 2.00 sec, 2.00 sec, 1.98 sec 5.0.96: 1.97 sec, 1.97 sec, 1.97 sec Results for utf8 (set names utf8 in testcase): ---------------------------------------------- 5.5.24: 11.03 sec, 11.02 sec, 11.00 sec 5.1.62: 9.75 sec, 9.70 sec, 9.69 sec 5.6.5: 10.00 sec, 9.83 sec, 9.83 sec 5.0.96: 9.86 sec, 9.86 sec, 10.08 sec There is no regression here. The 5.5 and 5.6 client set charset automatically depending on your OS/environment. This is probably why you have worse results than 5.1 who uses latin1. Please retest by explicitly setting charset in all versions..
[9 May 2012 14:17]
Franjo Markovic
Are those times in miliseconds? The don't seem to make much sense to me (but I don't have access to source code). Can you please compare the same times when running very similar query that takes 0.01 seconds (instead of 6 seconds): select benchmark(25000, substring(@str1,2,1));
[10 May 2012 9:00]
MySQL Verification Team
Here's the new testcase I used: select version(); set names utf8; # note! set @str1 = repeat("a",1000000); select benchmark(25000, substring(@str1,2,1)); select benchmark(25000, substring(@str1,2,1)); select benchmark(25000, substring(@str1,2,1)); Results for latin1: ------------------------------------- 5.5.24: 2.02 sec, 2.05 sec, 2.00 sec 5.1.62: 2.00 sec, 1.99 sec, 1.97 sec 5.6.5: 2.00 sec, 2.00 sec, 1.98 sec 5.0.96: 2.09 sec, 2.09 sec, 2.09 sec Results for utf8 (set names utf8 in testcase): ---------------------------------------------- 5.5.24: 2.00 sec, 2.00 sec, 2.00 sec 5.1.62: 1.99 sec, 1.99 sec, 1.99 sec 5.6.5: 2.02 sec, 2.02 sec, 1.98 sec 5.0.96: 1.98 sec, 1.99 sec, 1.98 sec It looks like this report is simply a feature request to improve the performance of utf8 string processing. For long strings, latin1 does a straight memcpy, where as utf8 must change each char one at a time.
[10 May 2012 14:51]
Franjo Markovic
We are getting closer. Those 2 seconds you reported is probably just the time needed to create @str1 value. Yes, the issue is about utf8 only. Why does it have to go one char at a time? It's like a doing full table scan each time to retrieve one record (letter). It seems to me that the storage of utf8 characters in memory is inadequate. Most systems store unicode characters as fixed 3 bytes in memory, to avoid such counting. Does mysql not do it?
[8 Jan 2020 22:50]
Roy Lyseng
Posted by developer: The performance problem is not reproducible in MySQL 8.0.18. The below table shows performance for latin1, utf8mb3 (utf) and utfmb4 in 5.7 and 8.0. As seen, performance for utf8mb4 is almost as good as for latin1. utf8mb4 is the recommended replacement for utf8mb3 in MySQL 8.0. Results for latin1 (100000 iterations) ------------------------------------- 5.7.29 6.41, 6.39, 6.37 8.0.18 6.31, 6.31, 6.30 Results for utf8mb3 (100000 iterations) --------------------------------------- 5.7.29 18.10, 18.04, 18.04 8.0.18 18.00, 18.00, 17.98 Results for utf8mb4 (100000 iterations) --------------------------------------- 5.7.29 23.93, 23.93, 23.92 8.0.18 6.56, 6.56, 6.55