Bug #101732 ibtmp1 is very large in semijoin duplicate weedout strategy
Submitted: 24 Nov 2020 9:55 Modified: 14 Dec 2020 11:00
Reporter: Dean Zhou Email Updates:
Status: Verified Impact on me:
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.7.30, 5.7.32, 8.0.21 OS:Any
Assigned to: CPU Architecture:Any
Tags: ibtmp1

[24 Nov 2020 9:55] Dean Zhou
We found that the SQL using semi join duplicateweedout strategy produce a very large temporary table space(ibtmp1), which does not meet our expectations. For example, a join of two 2GB tables will generate 19GB ibtmp1 space.

How to repeat:
1.  prepare 
   a) sysbench initializes 2 tables with 10 million rows
   b) alter table sbtest2 add a int default null;
   c) update sbtest1, sbtest2 set sbtest2.a= sbtest1.k where sbtest1.id = sbtest2.id;

3. execute sql
 set optimizer_switch  = 'firstmatch=off';
 set optimizer_switch  = 'materialization=off';            
 set max_heap_table_size = 16384;
 select count(*) from sbtest1 where k in (select a from sbtest2  );   //  You'll see that ibtmp1 keeps growing, and after SQL execution, ibtmp1 is about 19GB, execution time is about 1hour44min. 

Suggested fix:
the temp table insert stack as follows:

#0  row_insert_for_mysql_using_cursor (mysql_rec=0x7faa4001e9a0 "\377\004\211xf", prebuilt=0x7faa40942358) at /data/mysql-5.7.30/storage/innobase/row/row0mysql.cc:1561
#1  0x0000000000fd784e in row_insert_for_mysql (mysql_rec=mysql_rec@entry=0x7faa4001e9a0 "\377\004\211xf", prebuilt=<optimized out>) at /data/mysql-5.7.30/storage/innobase/row/row0mysql.cc:1864
#2  0x0000000000ef9419 in intrinsic_table_write_row (record=0x7faa4001e9a0 "\377\004\211xf", this=0x7faa40939248) at /data/mysql-5.7.30/storage/innobase/handler/ha_innodb.cc:7441
#3  ha_innobase::write_row (this=0x7faa40939248, record=0x7faa4001e9a0 "\377\004\211xf") at /data/mysql-5.7.30/storage/innobase/handler/ha_innodb.cc:7465
#4  0x00000000007eb947 in handler::ha_write_row (this=0x7faa40939248, buf=0x7faa4001e9a0 "\377\004\211xf") at /data/mysql-5.7.30/sql/handler.cc:8093
#5  0x0000000000c5785c in do_sj_dups_weedout (thd=0x7faa40000ae0, sjtbl=0x7faa40937a70) at /data/mysql-5.7.30/sql/sql_executor.cc:1432
#6  0x0000000000c57ada in evaluate_join_record (join=join@entry=0x7faa400105b0, qep_tab=qep_tab@entry=0x7faa409378f8) at /data/mysql-5.7.30/sql/sql_executor.cc:1602
#7  0x0000000000c5d08c in sub_select (join=0x7faa400105b0, qep_tab=0x7faa409378f8, end_of_records=<optimized out>) at /data/mysql-5.7.30/sql/sql_executor.cc:1304
#8  0x0000000000c57b05 in evaluate_join_record (join=join@entry=0x7faa400105b0, qep_tab=qep_tab@entry=0x7faa40937780) at /data/mysql-5.7.30/sql/sql_executor.cc:1652
#9  0x0000000000c5d08c in sub_select (join=0x7faa400105b0, qep_tab=0x7faa40937780, end_of_records=<optimized out>) at /data/mysql-5.7.30/sql/sql_executor.cc:1304
#10 0x0000000000c55e97 in do_select (join=0x7faa400105b0) at /data/mysql-5.7.30/sql/sql_executor.cc:957
#11 JOIN::exec (this=0x7faa400105b0) at /data/mysql-5.7.30/sql/sql_executor.cc:206
#12 0x0000000000cc335d in handle_query (thd=thd@entry=0x7faa40000ae0, lex=lex@entry=0x7faa40002c50, result=result@entry=0x7faa400075c0, added_options=added_options@entry=0, 
    removed_options=removed_options@entry=0) at /data/mysql-5.7.30/sql/sql_select.cc:191
#13 0x00000000007551e4 in execute_sqlcom_select (thd=thd@entry=0x7faa40000ae0, all_tables=<optimized out>) at /data/mysql-5.7.30/sql/sql_parse.cc:5155
#14 0x0000000000c886f9 in mysql_execute_command (thd=thd@entry=0x7faa40000ae0, first_level=first_level@entry=true) at /data/mysql-5.7.30/sql/sql_parse.cc:2826
#15 0x0000000000c8ad4d in mysql_parse (thd=thd@entry=0x7faa40000ae0, parser_state=parser_state@entry=0x7faa740e7660) at /data/mysql-5.7.30/sql/sql_parse.cc:5584
#16 0x0000000000c8b8b0 in dispatch_command (thd=thd@entry=0x7faa40000ae0, com_data=com_data@entry=0x7faa740e7cc0, command=COM_QUERY) at /data/mysql-5.7.30/sql/sql_parse.cc:1491
#17 0x0000000000c8d2c7 in do_command (thd=thd@entry=0x7faa40000ae0) at /data/mysql-5.7.30/sql/sql_parse.cc:1032
#18 0x0000000000d4acc8 in handle_connection (arg=arg@entry=0x13f19eb0) at /data/mysql-5.7.30/sql/conn_handler/connection_handler_per_thread.cc:313
#19 0x0000000001219591 in pfs_spawn_thread (arg=0x13fa5cc0) at /data/mysql-5.7.30/storage/perfschema/pfs.cc:2197
#20 0x00007fad1f565dd5 in start_thread () from /lib64/libpthread.so.0
#21 0x00007fad1d94b02d in clone () from /lib64/libc.so.6

The duplicateweedout strategy uses temporary table deduplication. create_duplicate_weedout_tmp_table function will create a temp table,  the ddl is "CREATE TABLE tmp (col VARBINARY(n) NOT NULL, UNIQUE KEY(col))", 
Innodb will automatically generate GEN_CLUST_INDEX for temporary tables because of the tmp table no primary key. In row_insert_for_mysql_using_cursor, first, insert the primary key record, and then insert the secondary unique index record. If the secondary index duplicates, the primary key index record will be delete-marked. The delete-marked records in the temporary table will not be reused. If the secondary unique index conflict is very serious, the ibmtp1 space will be greatly expanded.

create_duplicate_weedout_tmp_table {
	share->primary_key= MAX_KEY;   --> share->primary_key=  0 // add primary key, temporary tables can use the primary key to remove duplicates 

After the optimization, ibtmp1 space 19G --> 332M, execution time 1hour44min --> 44min

reuse the space of delete marked records in the temporary table when insert primary key record.

I am not very familiar with the code, there may be other optimization methods.
[24 Nov 2020 11:11] Dean Zhou
8.0.21 can also reproduce
[25 Nov 2020 12:22] MySQL Verification Team
Hello Dean Zhou,

Thank you for the report and feedback.
May I request you to please provide MySQL Server configuration details(my.cnf/my.ini - pls mark it as private after posting here), OS details and sysbench test used for preparing etc to reproduce this issue at our end. Thank you.

[26 Nov 2020 7:29] Dean Zhou
Hi Umesh,
Thank you for the feed.

1. test preparation
   a) sysbench initializes 2 tables with 10 million rows

    sysbench 1.0.20

    sysbench --mysql-user='user' --mysql-password='password' --mysql-db='test'  --mysql-host=host --mysql-port=5730  --tables=2 --time=100 --report-interval=1  --table_size=10000000  -
    -threads=10  oltp_insert.lua cleanup
    sysbench --mysql-user='user' --mysql-password='password' --mysql-db='test'  --mysql-host=host --mysql-port=5730  --tables=2 --time=100 --report-interval=1  --table_size=10000000  -
    -threads=10   oltp_insert.lua prepare

   b) alter table sbtest2 add a int default null;
   c) update sbtest1, sbtest2 set sbtest2.a= sbtest1.k where sbtest1.id = sbtest2.id;

2. execute sql
MySQL [(none)]> use test
Database changed
MySQL [test]>  set optimizer_switch  = 'firstmatch=off';
Query OK, 0 rows affected (0.00 sec)

MySQL [test]>  set optimizer_switch  = 'materialization=off';            
Query OK, 0 rows affected (0.00 sec)

MySQL [test]>  set max_heap_table_size = 16384;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]>  explain  select count(*) from sbtest1 where k in (select a from sbtest2  );
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref            | rows    | filtered | Extra                        |
|  1 | SIMPLE      | sbtest2 | NULL       | ALL  | NULL          | NULL | NULL    | NULL           | 9857202 |   100.00 | Using where; Start temporary |
|  1 | SIMPLE      | sbtest1 | NULL       | ref  | k_1           | k_1  | 4       | test.sbtest2.a |       6 |   100.00 | Using index; End temporary   |
2 rows in set, 1 warning (0.00 sec)    // so execution plan is semi join duplicateweedout 

MySQL [test]>  select count(*) from sbtest1 where k in (select a from sbtest2  );    // execution time is about 1hour44min, ibtmp1 keeps expanding

3. os detail
# cat /proc/version
Linux version 5.8.2-1.el7.elrepo.x86_64 (mockbuild@Build64R7) (gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2), GNU ld version 2.32-16.el7) #1 SMP Tue Aug 18 18:38:17 EDT 2020

mount | grep vdb1
/dev/vdb1 on /data type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
[26 Nov 2020 7:30] MySQL Verification Team
Thank you Dean for the requested details.
I'll try with the provided conf and other details and will get back to you if further clarification needed.

[11 Dec 2020 15:44] Dean Zhou
Hi Umesh, thank you for the test.

We need to confirm three things:
1. both set optimizer_switch = ... and the select count(*) ...  must be executed on the same connection. 
2. your sql plan seems right, but 5 min 12.63 sec execution time is impossible if the sql plan is semijoin duplicateweedout. can you set breakpoint on do_sj_dups_weedout function ?
3. the result of select count(*) ... must be 10000000 in my test case.

I also reproduce in MySQL 5.7.32, i uploaded show session variables result before executing select sql, so you can check the parameters.

-----------------  MySQL 5.7.32
-rw-r----- 1 mysql mysql  19G Dec 11 21:51 ibtmp1
[11 Dec 2020 15:46] Dean Zhou
show session variables

Attachment: session variables.txt (text/plain), 17.85 KiB.

[14 Dec 2020 11:00] MySQL Verification Team
Thank you for your patience, finally I was able to reproduce (host, server version and conf remain same. I didn't check structure etc but looks like downgrading sysbench from 1.1.0 to 1.0.20 did the trick - haven't checked if table structure varied etc). 
I'll be joining the activity logs shortly.

[15 Dec 2020 7:58] MySQL Verification Team
MySQL Server 5.7.32 test results

Attachment: 101732_5.7.32.EL7_results.txt (text/plain), 144.04 KiB.