MySQL Bugs: #100479: search with in() function runs really slow

Bug #100479	search with in() function runs really slow
Submitted:	10 Aug 2020 9:04	Modified:	25 Jul 2024 14:57
Reporter:	SUMMER SHENG	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S5 (Performance)
Version:	8.0	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
search with row operants in in() function runs really slow,

much slower than an equivilant or-of-ands query.

How to repeat:
1. create table  & data
create table t1 (a char(6),b char(6),primary key(a,b));

delimiter //
create procedure insert_to_many_value(IN n int) 
begin
  declare t int default 1;
  while t<=n
  do
    insert into many_value values(100000+t,100000+t);
    set t = t+1;
  end while;
end;//
delimiter ;

call insert_to_many_value(100000); /* This insert 100k data. */

2.execute the query

select * 
from many_value
where (a,b) in (('100100','100100'),('100200','100200'),...,('199900','199900'));

There is about 1000 tuples in the in function's right hand side.
With the primary key, I expect a really fast excution.
However, this query took around 10 seconds.

When I rewrite this query to 

select * from many_value
where (a,b)=('100100','100100') or
      (a,b)=('100200','100200') or
      ...
      (a,b)=('199900','199900');

which gives the same result, the query runs much faster, in less than 1 second.

Hi Mr. SHENG,

Thank you for your bug report.

However, we require additional info in order to proceed.

First of all, we require full  query texts, not just three dots ......

Second, have you tried your query with 8.0.21, which contains many improvements.

Third and most important, this could be the effect of filling up buffer pool or AHI.

Hence, do you notice any difference in the speed for the first query when you run it three times.

Please, do answer all of the above questions. Without those, we can not proceed.

Hi!

I have tried the query on 8.0.21 for a few times, and the problem is still there. In addition, just explain the query takes almost the same time as to execute it.

The query is quite long, please try the python script below. Besides, the original post has some typo, try the new query.

Thank you!

1. generate table and data

1. create table & data
create table many_value (a char(12), b char(12), primary key(a,b));
delimiter //
create procedure insert_to_many_value(IN n int) 
begin
  declare t int default 1;
  while t<=n
  do
	insert into many_value values(123456789012+t,123456789012+t);
    set t = t+1;
  end while;
end;//
delimiter ;
call insert_to_many_value(100000);

2. the python script generate the slow query.

print 'select * from many_value where (a,b) in ('
n = 1000;
for i in range(n):
  print '(\'%d\',\'%d\'),'%(123456789012+10*i,123456789012+10*i)
print '(\'%d\',\'%d\'));'%(123456789012+10*n,123456789012+10*n)

3. the python script gengerate the equivilant faster query.

print 'select * from many_value where'
n = 1000;
for i in range(n):
  print '(a,b)=(\'%d\',\'%d\') or'%(123456789012+10*i,123456789012+10*i)
print '(a,b)=(\'%d\',\'%d\');'%(123456789012+10*n,123456789012+10*n)

By modifying n in the script, setting it to 1000,2000,4000, I observe the running time increasing in a rate at least quadratic to n.

data generating and queries

Attachment: data and queries.txt (text/plain), 74.78 KiB.

Hi Mr. SHENG,

Thank you for your feedback.

We have ran your queries and got results of 14 seconds for the first query , versus 2 seconds for the second one.

Verified as reported.

For the same queries on 5.7.31,  I have got, at average from the three runs, 3 (three) seconds for the first query and 2.5 seconds for the second query.

Hence, this is 8.0-only bug !!!!!!

In MySQL 8.0.21 the IN ( (...),(...) ) row comparator is doing string comparisons (probably require copying data in memory) whereas the fast query is doing direct field comparisons (which is using pointers to fields and not copying data).  Here are perf traces of ten iterations of each query.

-- With ROW constructor IN
Overhead  Command  Shared Object        Symbol
  72.01%  mysqld   mysqld               [.] my_strnncoll_uca<uca_scanner_900<Mb_wc_utf8mb4, 1>, 1, Mb_wc_utf8mb4>
  10.79%  mysqld   mysqld               [.] Field_string::cmp
   6.74%  mysqld   mysqld               [.] my_charpos_mb4
   3.93%  mysqld   mysqld               [.] sel_cmp
   1.56%  mysqld   mysqld               [.] my_lengthsp_8bit
   1.54%  mysqld   mysqld               [.] SEL_ROOT::find_range
   1.01%  mysqld   mysqld               [.] SEL_ROOT::insert
   0.82%  mysqld   mysqld               [.] my_strnncoll_uca_900
   0.43%  mysqld   mysqld               [.] SEL_ARG::rb_insert
   0.39%  mysqld   mysqld               [.] Field::key_cmp
   0.25%  mysqld   mysqld               [.] my_strnncollsp_uca_900
   0.17%  mysqld   mysqld               [.] key_or
   0.03%  mysqld   mysqld               [.] MYSQLparse
   0.03%  mysqld   [unknown]            [k] 0xffffffff83c00b27

-- with AND conditions
  46.33%  mysqld   mysqld               [.] my_strnncoll_uca<uca_scanner_900<Mb_wc_utf8mb4, 1>, 1, Mb_wc_utf8mb4>
  12.07%  mysqld   mysqld               [.] Arg_comparator::compare_string
   7.48%  mysqld   mysqld               [.] Item_cond_and::val_int
   5.84%  mysqld   mysqld               [.] Item_field::val_str
   3.69%  mysqld   mysqld               [.] Item_cond_or::val_int
   3.43%  mysqld   mysqld               [.] Item::val_bool
   2.41%  mysqld   mysqld               [.] my_lengthsp_8bit
   2.01%  mysqld   mysqld               [.] Field_string::val_str
   1.49%  mysqld   mysqld               [.] Item_func_eq::val_int
   0.96%  mysqld   mysqld               [.] MYSQLparse
   0.55%  mysqld   mysqld               [.] my_strnncoll_uca_900
   0.53%  mysqld   mysqld               [.] Item_func::update_used_tables
   0.51%  mysqld   mysqld               [.] Field_string::cmp
   0.47%  mysqld   mysqld               [.] sortcmp

Because utf8mb4 on char columns is a lot more data for a char(12) column (44 bytes vs 12 bytes) changing the character set of a and b to latin1 results in a dramatic speed improvement from 1.35 seconds to 0.19 sec.

Changing the character set does helps.

However, with increasing tuples in the in() function, the running time increase a lot faster than using the or-of-ands query.

For example, after converting the character set to latin1, I build a query with 10,000 tuples in the in() function's rhs and the equivilant or-of-ands form.

The executing time are 44s and 0.4s, respectively.

You can try this using the python script I provided in the last post, by changing n=1000 to n=10000.

Yes - for the row comparator it is copying the strings.  The longer you make the in clause, the more strings that need to be copied, and so it is at least O(N*N) since you are comparing two fields IN clause has two strings which must be copied.  If these strings represent your actual data, switch to the BIGINT data type instead of CHAR(12) and things will be a lot better because you are storing numbers in strings.

Well, if the row comparison for integers also copies the integers, then it will still get slower as you grow the IN list as each comparison will have to copy 16 bytes of data.  However, integer comparisons are much faster than string comparisons, so it will still be faster than using strings.  A string comparison on equal strings is O(N) the length of the string.  So using the row comparison with strings is O(N^4) because you have to copy the strings and compare them.

still affect 5.7 versions

HI Mr. Chen,

Version 5.7 is obsolete and it is  not maintained for more then a year.

Only versions 8.0 and higher are relevant.

https://bugs.mysql.com/bug.php?id=108963
Documented fix as follows in the MySQL 5.7.43, 8.0.34, and 8.1.0 changelogs:

Thank you, Mr. Chen,

For finding the original bug of this one ......

It has to be checked, but that would make this bug a duplicate bug.

Duplicate of BUG#108963 which was fixed in MySQL 5.7.43, 8.0.34, and 8.1.0.

Closed.

Thank you, Jon.

That means that users should just upgrade to the latest stable release, which is 8.0.39.