Bug #100479 | search with in() function runs really slow | ||
---|---|---|---|
Submitted: | 10 Aug 2020 9:04 | Modified: | 25 Jul 2024 14:57 |
Reporter: | SUMMER SHENG | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S5 (Performance) |
Version: | 8.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[10 Aug 2020 9:04]
SUMMER SHENG
[10 Aug 2020 12:37]
MySQL Verification Team
Hi Mr. SHENG, Thank you for your bug report. However, we require additional info in order to proceed. First of all, we require full query texts, not just three dots ...... Second, have you tried your query with 8.0.21, which contains many improvements. Third and most important, this could be the effect of filling up buffer pool or AHI. Hence, do you notice any difference in the speed for the first query when you run it three times. Please, do answer all of the above questions. Without those, we can not proceed.
[11 Aug 2020 2:52]
SUMMER SHENG
Hi! I have tried the query on 8.0.21 for a few times, and the problem is still there. In addition, just explain the query takes almost the same time as to execute it. The query is quite long, please try the python script below. Besides, the original post has some typo, try the new query. Thank you! 1. generate table and data 1. create table & data create table many_value (a char(12), b char(12), primary key(a,b)); delimiter // create procedure insert_to_many_value(IN n int) begin declare t int default 1; while t<=n do insert into many_value values(123456789012+t,123456789012+t); set t = t+1; end while; end;// delimiter ; call insert_to_many_value(100000); 2. the python script generate the slow query. print 'select * from many_value where (a,b) in (' n = 1000; for i in range(n): print '(\'%d\',\'%d\'),'%(123456789012+10*i,123456789012+10*i) print '(\'%d\',\'%d\'));'%(123456789012+10*n,123456789012+10*n) 3. the python script gengerate the equivilant faster query. print 'select * from many_value where' n = 1000; for i in range(n): print '(a,b)=(\'%d\',\'%d\') or'%(123456789012+10*i,123456789012+10*i) print '(a,b)=(\'%d\',\'%d\');'%(123456789012+10*n,123456789012+10*n)
[11 Aug 2020 3:15]
SUMMER SHENG
By modifying n in the script, setting it to 1000,2000,4000, I observe the running time increasing in a rate at least quadratic to n.
[11 Aug 2020 12:12]
SUMMER SHENG
data generating and queries
Attachment: data and queries.txt (text/plain), 74.78 KiB.
[11 Aug 2020 12:45]
MySQL Verification Team
Hi Mr. SHENG, Thank you for your feedback. We have ran your queries and got results of 14 seconds for the first query , versus 2 seconds for the second one. Verified as reported.
[11 Aug 2020 12:48]
MySQL Verification Team
For the same queries on 5.7.31, I have got, at average from the three runs, 3 (three) seconds for the first query and 2.5 seconds for the second query. Hence, this is 8.0-only bug !!!!!!
[12 Aug 2020 16:01]
Justin Swanhart
In MySQL 8.0.21 the IN ( (...),(...) ) row comparator is doing string comparisons (probably require copying data in memory) whereas the fast query is doing direct field comparisons (which is using pointers to fields and not copying data). Here are perf traces of ten iterations of each query. -- With ROW constructor IN Overhead Command Shared Object Symbol 72.01% mysqld mysqld [.] my_strnncoll_uca<uca_scanner_900<Mb_wc_utf8mb4, 1>, 1, Mb_wc_utf8mb4> 10.79% mysqld mysqld [.] Field_string::cmp 6.74% mysqld mysqld [.] my_charpos_mb4 3.93% mysqld mysqld [.] sel_cmp 1.56% mysqld mysqld [.] my_lengthsp_8bit 1.54% mysqld mysqld [.] SEL_ROOT::find_range 1.01% mysqld mysqld [.] SEL_ROOT::insert 0.82% mysqld mysqld [.] my_strnncoll_uca_900 0.43% mysqld mysqld [.] SEL_ARG::rb_insert 0.39% mysqld mysqld [.] Field::key_cmp 0.25% mysqld mysqld [.] my_strnncollsp_uca_900 0.17% mysqld mysqld [.] key_or 0.03% mysqld mysqld [.] MYSQLparse 0.03% mysqld [unknown] [k] 0xffffffff83c00b27 -- with AND conditions 46.33% mysqld mysqld [.] my_strnncoll_uca<uca_scanner_900<Mb_wc_utf8mb4, 1>, 1, Mb_wc_utf8mb4> 12.07% mysqld mysqld [.] Arg_comparator::compare_string 7.48% mysqld mysqld [.] Item_cond_and::val_int 5.84% mysqld mysqld [.] Item_field::val_str 3.69% mysqld mysqld [.] Item_cond_or::val_int 3.43% mysqld mysqld [.] Item::val_bool 2.41% mysqld mysqld [.] my_lengthsp_8bit 2.01% mysqld mysqld [.] Field_string::val_str 1.49% mysqld mysqld [.] Item_func_eq::val_int 0.96% mysqld mysqld [.] MYSQLparse 0.55% mysqld mysqld [.] my_strnncoll_uca_900 0.53% mysqld mysqld [.] Item_func::update_used_tables 0.51% mysqld mysqld [.] Field_string::cmp 0.47% mysqld mysqld [.] sortcmp
[12 Aug 2020 16:16]
Justin Swanhart
Because utf8mb4 on char columns is a lot more data for a char(12) column (44 bytes vs 12 bytes) changing the character set of a and b to latin1 results in a dramatic speed improvement from 1.35 seconds to 0.19 sec.
[13 Aug 2020 1:45]
SUMMER SHENG
Changing the character set does helps. However, with increasing tuples in the in() function, the running time increase a lot faster than using the or-of-ands query. For example, after converting the character set to latin1, I build a query with 10,000 tuples in the in() function's rhs and the equivilant or-of-ands form. The executing time are 44s and 0.4s, respectively. You can try this using the python script I provided in the last post, by changing n=1000 to n=10000.
[15 Aug 2020 12:20]
Justin Swanhart
Yes - for the row comparator it is copying the strings. The longer you make the in clause, the more strings that need to be copied, and so it is at least O(N*N) since you are comparing two fields IN clause has two strings which must be copied. If these strings represent your actual data, switch to the BIGINT data type instead of CHAR(12) and things will be a lot better because you are storing numbers in strings.
[15 Aug 2020 12:37]
Justin Swanhart
Well, if the row comparison for integers also copies the integers, then it will still get slower as you grow the IN list as each comparison will have to copy 16 bytes of data. However, integer comparisons are much faster than string comparisons, so it will still be faster than using strings. A string comparison on equal strings is O(N) the length of the string. So using the row comparison with strings is O(N^4) because you have to copy the strings and compare them.
[22 Jul 2024 8:34]
Demon Chen
still affect 5.7 versions
[22 Jul 2024 9:39]
MySQL Verification Team
HI Mr. Chen, Version 5.7 is obsolete and it is not maintained for more then a year. Only versions 8.0 and higher are relevant.
[24 Jul 2024 4:16]
Demon Chen
https://bugs.mysql.com/bug.php?id=108963 Documented fix as follows in the MySQL 5.7.43, 8.0.34, and 8.1.0 changelogs:
[24 Jul 2024 8:50]
MySQL Verification Team
Thank you, Mr. Chen, For finding the original bug of this one ...... It has to be checked, but that would make this bug a duplicate bug.
[25 Jul 2024 14:57]
Jon Stephens
Duplicate of BUG#108963 which was fixed in MySQL 5.7.43, 8.0.34, and 8.1.0. Closed.
[25 Jul 2024 15:10]
MySQL Verification Team
Thank you, Jon. That means that users should just upgrade to the latest stable release, which is 8.0.39.