Bug #100479 search with in() function runs really slow
Submitted: 10 Aug 2020 9:04 Modified: 25 Jul 2024 14:57
Reporter: SUMMER SHENG Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S5 (Performance)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any

[10 Aug 2020 9:04] SUMMER SHENG
Description:
search with row operants in in() function runs really slow,

much slower than an equivilant or-of-ands query.

How to repeat:
1. create table  & data
create table t1 (a char(6),b char(6),primary key(a,b));

delimiter //
create procedure insert_to_many_value(IN n int) 
begin
  declare t int default 1;
  while t<=n
  do
    insert into many_value values(100000+t,100000+t);
    set t = t+1;
  end while;
end;//
delimiter ;

call insert_to_many_value(100000); /* This insert 100k data. */

2.execute the query

select * 
from many_value
where (a,b) in (('100100','100100'),('100200','100200'),...,('199900','199900'));

There is about 1000 tuples in the in function's right hand side.
With the primary key, I expect a really fast excution.
However, this query took around 10 seconds.

When I rewrite this query to 

select * from many_value
where (a,b)=('100100','100100') or
      (a,b)=('100200','100200') or
      ...
      (a,b)=('199900','199900');

which gives the same result, the query runs much faster, in less than 1 second.
[10 Aug 2020 12:37] MySQL Verification Team
Hi Mr. SHENG,

Thank you for your bug report.

However, we require additional info in order to proceed.

First of all, we require full  query texts, not just three dots ......

Second, have you tried your query with 8.0.21, which contains many improvements.

Third and most important, this could be the effect of filling up buffer pool or AHI.

Hence, do you notice any difference in the speed for the first query when you run it three times.

Please, do answer all of the above questions. Without those, we can not proceed.
[11 Aug 2020 2:52] SUMMER SHENG
Hi!

I have tried the query on 8.0.21 for a few times, and the problem is still there. In addition, just explain the query takes almost the same time as to execute it.

The query is quite long, please try the python script below. Besides, the original post has some typo, try the new query.

Thank you!

1. generate table and data

1. create table & data
create table many_value (a char(12), b char(12), primary key(a,b));
delimiter //
create procedure insert_to_many_value(IN n int) 
begin
  declare t int default 1;
  while t<=n
  do
	insert into many_value values(123456789012+t,123456789012+t);
    set t = t+1;
  end while;
end;//
delimiter ;
call insert_to_many_value(100000);

2. the python script generate the slow query.

print 'select * from many_value where (a,b) in ('
n = 1000;
for i in range(n):
  print '(\'%d\',\'%d\'),'%(123456789012+10*i,123456789012+10*i)
print '(\'%d\',\'%d\'));'%(123456789012+10*n,123456789012+10*n)

3. the python script gengerate the equivilant faster query.

print 'select * from many_value where'
n = 1000;
for i in range(n):
  print '(a,b)=(\'%d\',\'%d\') or'%(123456789012+10*i,123456789012+10*i)
print '(a,b)=(\'%d\',\'%d\');'%(123456789012+10*n,123456789012+10*n)
[11 Aug 2020 3:15] SUMMER SHENG
By modifying n in the script, setting it to 1000,2000,4000, I observe the running time increasing in a rate at least quadratic to n.
[11 Aug 2020 12:12] SUMMER SHENG
data generating and queries

Attachment: data and queries.txt (text/plain), 74.78 KiB.

[11 Aug 2020 12:45] MySQL Verification Team
Hi Mr. SHENG,

Thank you for your feedback.

We have ran your queries and got results of 14 seconds for the first query , versus 2 seconds for the second one.

Verified as reported.
[11 Aug 2020 12:48] MySQL Verification Team
For the same queries on 5.7.31,  I have got, at average from the three runs, 3 (three) seconds for the first query and 2.5 seconds for the second query.

Hence, this is 8.0-only bug !!!!!!
[12 Aug 2020 16:01] Justin Swanhart
In MySQL 8.0.21 the IN ( (...),(...) ) row comparator is doing string comparisons (probably require copying data in memory) whereas the fast query is doing direct field comparisons (which is using pointers to fields and not copying data).  Here are perf traces of ten iterations of each query.

-- With ROW constructor IN
Overhead  Command  Shared Object        Symbol
  72.01%  mysqld   mysqld               [.] my_strnncoll_uca<uca_scanner_900<Mb_wc_utf8mb4, 1>, 1, Mb_wc_utf8mb4>
  10.79%  mysqld   mysqld               [.] Field_string::cmp
   6.74%  mysqld   mysqld               [.] my_charpos_mb4
   3.93%  mysqld   mysqld               [.] sel_cmp
   1.56%  mysqld   mysqld               [.] my_lengthsp_8bit
   1.54%  mysqld   mysqld               [.] SEL_ROOT::find_range
   1.01%  mysqld   mysqld               [.] SEL_ROOT::insert
   0.82%  mysqld   mysqld               [.] my_strnncoll_uca_900
   0.43%  mysqld   mysqld               [.] SEL_ARG::rb_insert
   0.39%  mysqld   mysqld               [.] Field::key_cmp
   0.25%  mysqld   mysqld               [.] my_strnncollsp_uca_900
   0.17%  mysqld   mysqld               [.] key_or
   0.03%  mysqld   mysqld               [.] MYSQLparse
   0.03%  mysqld   [unknown]            [k] 0xffffffff83c00b27

-- with AND conditions
  46.33%  mysqld   mysqld               [.] my_strnncoll_uca<uca_scanner_900<Mb_wc_utf8mb4, 1>, 1, Mb_wc_utf8mb4>
  12.07%  mysqld   mysqld               [.] Arg_comparator::compare_string
   7.48%  mysqld   mysqld               [.] Item_cond_and::val_int
   5.84%  mysqld   mysqld               [.] Item_field::val_str
   3.69%  mysqld   mysqld               [.] Item_cond_or::val_int
   3.43%  mysqld   mysqld               [.] Item::val_bool
   2.41%  mysqld   mysqld               [.] my_lengthsp_8bit
   2.01%  mysqld   mysqld               [.] Field_string::val_str
   1.49%  mysqld   mysqld               [.] Item_func_eq::val_int
   0.96%  mysqld   mysqld               [.] MYSQLparse
   0.55%  mysqld   mysqld               [.] my_strnncoll_uca_900
   0.53%  mysqld   mysqld               [.] Item_func::update_used_tables
   0.51%  mysqld   mysqld               [.] Field_string::cmp
   0.47%  mysqld   mysqld               [.] sortcmp
[12 Aug 2020 16:16] Justin Swanhart
Because utf8mb4 on char columns is a lot more data for a char(12) column (44 bytes vs 12 bytes) changing the character set of a and b to latin1 results in a dramatic speed improvement from 1.35 seconds to 0.19 sec.
[13 Aug 2020 1:45] SUMMER SHENG
Changing the character set does helps.

However, with increasing tuples in the in() function, the running time increase a lot faster than using the or-of-ands query.

For example, after converting the character set to latin1, I build a query with 10,000 tuples in the in() function's rhs and the equivilant or-of-ands form.

The executing time are 44s and 0.4s, respectively.

You can try this using the python script I provided in the last post, by changing n=1000 to n=10000.
[15 Aug 2020 12:20] Justin Swanhart
Yes - for the row comparator it is copying the strings.  The longer you make the in clause, the more strings that need to be copied, and so it is at least O(N*N) since you are comparing two fields IN clause has two strings which must be copied.  If these strings represent your actual data, switch to the BIGINT data type instead of CHAR(12) and things will be a lot better because you are storing numbers in strings.
[15 Aug 2020 12:37] Justin Swanhart
Well, if the row comparison for integers also copies the integers, then it will still get slower as you grow the IN list as each comparison will have to copy 16 bytes of data.  However, integer comparisons are much faster than string comparisons, so it will still be faster than using strings.  A string comparison on equal strings is O(N) the length of the string.  So using the row comparison with strings is O(N^4) because you have to copy the strings and compare them.
[22 Jul 2024 8:34] Demon Chen
still affect 5.7 versions
[22 Jul 2024 9:39] MySQL Verification Team
HI Mr. Chen,

Version 5.7 is obsolete and it is  not maintained for more then a year.

Only versions 8.0 and higher are relevant.
[24 Jul 2024 4:16] Demon Chen
https://bugs.mysql.com/bug.php?id=108963
Documented fix as follows in the MySQL 5.7.43, 8.0.34, and 8.1.0 changelogs:
[24 Jul 2024 8:50] MySQL Verification Team
Thank you, Mr. Chen,

For finding the original bug of this one ......

It has to be checked, but that would make this bug a duplicate bug.
[25 Jul 2024 14:57] Jon Stephens
Duplicate of BUG#108963 which was fixed in MySQL 5.7.43, 8.0.34, and 8.1.0.

Closed.
[25 Jul 2024 15:10] MySQL Verification Team
Thank you, Jon.

That means that users should just upgrade to the latest stable release, which is 8.0.39.