Bug #101732 ibtmp1 is very large in semijoin duplicate weedout strategy
Submitted: 24 Nov 2020 9:55 Modified: 14 Dec 2020 11:00
Reporter: Dean Zhou Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.7.30, 5.7.32, 8.0.21 OS:Any
Assigned to: CPU Architecture:Any
Tags: ibtmp1

[24 Nov 2020 9:55] Dean Zhou
Description:
We found that the SQL using semi join duplicateweedout strategy produce a very large temporary table space(ibtmp1), which does not meet our expectations. For example, a join of two 2GB tables will generate 19GB ibtmp1 space.

How to repeat:
1.  prepare 
   a) sysbench initializes 2 tables with 10 million rows
   b) alter table sbtest2 add a int default null;
   c) update sbtest1, sbtest2 set sbtest2.a= sbtest1.k where sbtest1.id = sbtest2.id;

3. execute sql
 set optimizer_switch  = 'firstmatch=off';
 set optimizer_switch  = 'materialization=off';            
 set max_heap_table_size = 16384;
 select count(*) from sbtest1 where k in (select a from sbtest2  );   //  You'll see that ibtmp1 keeps growing, and after SQL execution, ibtmp1 is about 19GB, execution time is about 1hour44min. 

Suggested fix:
the temp table insert stack as follows:

#0  row_insert_for_mysql_using_cursor (mysql_rec=0x7faa4001e9a0 "\377\004\211xf", prebuilt=0x7faa40942358) at /data/mysql-5.7.30/storage/innobase/row/row0mysql.cc:1561
#1  0x0000000000fd784e in row_insert_for_mysql (mysql_rec=mysql_rec@entry=0x7faa4001e9a0 "\377\004\211xf", prebuilt=<optimized out>) at /data/mysql-5.7.30/storage/innobase/row/row0mysql.cc:1864
#2  0x0000000000ef9419 in intrinsic_table_write_row (record=0x7faa4001e9a0 "\377\004\211xf", this=0x7faa40939248) at /data/mysql-5.7.30/storage/innobase/handler/ha_innodb.cc:7441
#3  ha_innobase::write_row (this=0x7faa40939248, record=0x7faa4001e9a0 "\377\004\211xf") at /data/mysql-5.7.30/storage/innobase/handler/ha_innodb.cc:7465
#4  0x00000000007eb947 in handler::ha_write_row (this=0x7faa40939248, buf=0x7faa4001e9a0 "\377\004\211xf") at /data/mysql-5.7.30/sql/handler.cc:8093
#5  0x0000000000c5785c in do_sj_dups_weedout (thd=0x7faa40000ae0, sjtbl=0x7faa40937a70) at /data/mysql-5.7.30/sql/sql_executor.cc:1432
#6  0x0000000000c57ada in evaluate_join_record (join=join@entry=0x7faa400105b0, qep_tab=qep_tab@entry=0x7faa409378f8) at /data/mysql-5.7.30/sql/sql_executor.cc:1602
#7  0x0000000000c5d08c in sub_select (join=0x7faa400105b0, qep_tab=0x7faa409378f8, end_of_records=<optimized out>) at /data/mysql-5.7.30/sql/sql_executor.cc:1304
#8  0x0000000000c57b05 in evaluate_join_record (join=join@entry=0x7faa400105b0, qep_tab=qep_tab@entry=0x7faa40937780) at /data/mysql-5.7.30/sql/sql_executor.cc:1652
#9  0x0000000000c5d08c in sub_select (join=0x7faa400105b0, qep_tab=0x7faa40937780, end_of_records=<optimized out>) at /data/mysql-5.7.30/sql/sql_executor.cc:1304
#10 0x0000000000c55e97 in do_select (join=0x7faa400105b0) at /data/mysql-5.7.30/sql/sql_executor.cc:957
#11 JOIN::exec (this=0x7faa400105b0) at /data/mysql-5.7.30/sql/sql_executor.cc:206
#12 0x0000000000cc335d in handle_query (thd=thd@entry=0x7faa40000ae0, lex=lex@entry=0x7faa40002c50, result=result@entry=0x7faa400075c0, added_options=added_options@entry=0, 
    removed_options=removed_options@entry=0) at /data/mysql-5.7.30/sql/sql_select.cc:191
#13 0x00000000007551e4 in execute_sqlcom_select (thd=thd@entry=0x7faa40000ae0, all_tables=<optimized out>) at /data/mysql-5.7.30/sql/sql_parse.cc:5155
#14 0x0000000000c886f9 in mysql_execute_command (thd=thd@entry=0x7faa40000ae0, first_level=first_level@entry=true) at /data/mysql-5.7.30/sql/sql_parse.cc:2826
#15 0x0000000000c8ad4d in mysql_parse (thd=thd@entry=0x7faa40000ae0, parser_state=parser_state@entry=0x7faa740e7660) at /data/mysql-5.7.30/sql/sql_parse.cc:5584
#16 0x0000000000c8b8b0 in dispatch_command (thd=thd@entry=0x7faa40000ae0, com_data=com_data@entry=0x7faa740e7cc0, command=COM_QUERY) at /data/mysql-5.7.30/sql/sql_parse.cc:1491
#17 0x0000000000c8d2c7 in do_command (thd=thd@entry=0x7faa40000ae0) at /data/mysql-5.7.30/sql/sql_parse.cc:1032
#18 0x0000000000d4acc8 in handle_connection (arg=arg@entry=0x13f19eb0) at /data/mysql-5.7.30/sql/conn_handler/connection_handler_per_thread.cc:313
#19 0x0000000001219591 in pfs_spawn_thread (arg=0x13fa5cc0) at /data/mysql-5.7.30/storage/perfschema/pfs.cc:2197
#20 0x00007fad1f565dd5 in start_thread () from /lib64/libpthread.so.0
#21 0x00007fad1d94b02d in clone () from /lib64/libc.so.6

The duplicateweedout strategy uses temporary table deduplication. create_duplicate_weedout_tmp_table function will create a temp table,  the ddl is "CREATE TABLE tmp (col VARBINARY(n) NOT NULL, UNIQUE KEY(col))", 
Innodb will automatically generate GEN_CLUST_INDEX for temporary tables because of the tmp table no primary key. In row_insert_for_mysql_using_cursor, first, insert the primary key record, and then insert the secondary unique index record. If the secondary index duplicates, the primary key index record will be delete-marked. The delete-marked records in the temporary table will not be reused. If the secondary unique index conflict is very serious, the ibmtp1 space will be greatly expanded.

solution1:
create_duplicate_weedout_tmp_table {
  ...
	share->primary_key= MAX_KEY;   --> share->primary_key=  0 // add primary key, temporary tables can use the primary key to remove duplicates 
  ...
}

After the optimization, ibtmp1 space 19G --> 332M, execution time 1hour44min --> 44min

solution2:  
reuse the space of delete marked records in the temporary table when insert primary key record.

I am not very familiar with the code, there may be other optimization methods.
[24 Nov 2020 11:11] Dean Zhou
8.0.21 can also reproduce
[25 Nov 2020 12:22] MySQL Verification Team
Hello Dean Zhou,

Thank you for the report and feedback.
May I request you to please provide MySQL Server configuration details(my.cnf/my.ini - pls mark it as private after posting here), OS details and sysbench test used for preparing etc to reproduce this issue at our end. Thank you.

regards,
Umesh
[26 Nov 2020 7:29] Dean Zhou
Hi Umesh,
Thank you for the feed.

1. test preparation
   a) sysbench initializes 2 tables with 10 million rows

    sysbench 1.0.20

    sysbench --mysql-user='user' --mysql-password='password' --mysql-db='test'  --mysql-host=host --mysql-port=5730  --tables=2 --time=100 --report-interval=1  --table_size=10000000  -
    -threads=10  oltp_insert.lua cleanup
    sysbench --mysql-user='user' --mysql-password='password' --mysql-db='test'  --mysql-host=host --mysql-port=5730  --tables=2 --time=100 --report-interval=1  --table_size=10000000  -
    -threads=10   oltp_insert.lua prepare

   b) alter table sbtest2 add a int default null;
   c) update sbtest1, sbtest2 set sbtest2.a= sbtest1.k where sbtest1.id = sbtest2.id;

2. execute sql
MySQL [(none)]> use test
Database changed
MySQL [test]>  set optimizer_switch  = 'firstmatch=off';
Query OK, 0 rows affected (0.00 sec)

MySQL [test]>  set optimizer_switch  = 'materialization=off';            
Query OK, 0 rows affected (0.00 sec)

MySQL [test]>  set max_heap_table_size = 16384;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]>  explain  select count(*) from sbtest1 where k in (select a from sbtest2  );
+----+-------------+---------+------------+------+---------------+------+---------+----------------+---------+----------+------------------------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref            | rows    | filtered | Extra                        |
+----+-------------+---------+------------+------+---------------+------+---------+----------------+---------+----------+------------------------------+
|  1 | SIMPLE      | sbtest2 | NULL       | ALL  | NULL          | NULL | NULL    | NULL           | 9857202 |   100.00 | Using where; Start temporary |
|  1 | SIMPLE      | sbtest1 | NULL       | ref  | k_1           | k_1  | 4       | test.sbtest2.a |       6 |   100.00 | Using index; End temporary   |
+----+-------------+---------+------------+------+---------------+------+---------+----------------+---------+----------+------------------------------+
2 rows in set, 1 warning (0.00 sec)    // so execution plan is semi join duplicateweedout 

MySQL [test]>  select count(*) from sbtest1 where k in (select a from sbtest2  );    // execution time is about 1hour44min, ibtmp1 keeps expanding

3. os detail
# cat /proc/version
Linux version 5.8.2-1.el7.elrepo.x86_64 (mockbuild@Build64R7) (gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2), GNU ld version 2.32-16.el7) #1 SMP Tue Aug 18 18:38:17 EDT 2020

mount | grep vdb1
/dev/vdb1 on /data type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
[26 Nov 2020 7:30] MySQL Verification Team
Thank you Dean for the requested details.
I'll try with the provided conf and other details and will get back to you if further clarification needed.

regards,
Umesh
[11 Dec 2020 15:44] Dean Zhou
Hi Umesh, thank you for the test.

We need to confirm three things:
1. both set optimizer_switch = ... and the select count(*) ...  must be executed on the same connection. 
2. your sql plan seems right, but 5 min 12.63 sec execution time is impossible if the sql plan is semijoin duplicateweedout. can you set breakpoint on do_sj_dups_weedout function ?
3. the result of select count(*) ... must be 10000000 in my test case.

I also reproduce in MySQL 5.7.32, i uploaded show session variables result before executing select sql, so you can check the parameters.

-----------------  MySQL 5.7.32
-rw-r----- 1 mysql mysql  19G Dec 11 21:51 ibtmp1
[11 Dec 2020 15:46] Dean Zhou
show session variables

Attachment: session variables.txt (text/plain), 17.85 KiB.

[14 Dec 2020 11:00] MySQL Verification Team
Thank you for your patience, finally I was able to reproduce (host, server version and conf remain same. I didn't check structure etc but looks like downgrading sysbench from 1.1.0 to 1.0.20 did the trick - haven't checked if table structure varied etc). 
I'll be joining the activity logs shortly.

regards,
Umesh
[15 Dec 2020 7:58] MySQL Verification Team
MySQL Server 5.7.32 test results

Attachment: 101732_5.7.32.EL7_results.txt (text/plain), 144.04 KiB.