Bug #102143 | workload stalls while running truncation with large buffer pool | ||
---|---|---|---|
Submitted: | 5 Jan 2021 3:06 | Modified: | 14 May 2021 12:40 |
Reporter: | zhai weixiang (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S5 (Performance) |
Version: | 8.0 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[5 Jan 2021 3:06]
zhai weixiang
[6 Jan 2021 8:30]
MySQL Verification Team
Hello Zhai, Thank you for the report and feedback. May I request you to please provide sysbench test details(unaltered sysbench command with all the options used), server configuration file, server version details and statements which triggered stall etc to reproduce the issue at our end? Thank you. regards, Umesh
[6 Jan 2021 9:02]
zhai weixiang
sysbench command: 200 tables, each has 10000000 records sysbench oltp_update_non_index.lua --threads=200 --time=3000 --mysql-user=$user --mysql-password=$password --mysql-host=$host --mysql-port=$port --mysql-db=sysbench --report-interval=1 --tables=200 --table-size=10000000 --events=0 --forced-shutdown=off run undo truncation is turn on. buffer pool configuration: innodb_buffer_pool_chunk_size=32M innodb_buffer_pool_size = 500000M innodb_buffer_pool_instances = 32 Actually after analyzing the code, I think it should be safe to invoke trx_undo_truncate_tablespace without holding undo::spaces->x_lock(). because the undo tablespace to be truncated is marked as inactive and no user thread will use rsegs of the space (ref code in function get_next_redo_rseg_from_undo_spaces)
[7 Jan 2021 2:37]
zhai weixiang
btw. I'm using 8.0.18 version. and the version after (WL#11819: InnoDB: Support ACID Undo DDL) should suffer less influence. But may still be affected as the truncation process is holding x lock
[7 Jan 2021 9:46]
MySQL Verification Team
Thank you for the feedback. I tried on 8.0.22 release build with the moderate buffer pool(96G) considering test box capacity: rm -rf 102143/ bin/mysqld --initialize-insecure --basedir=$PWD --datadir=$PWD/102143 --log-error-verbosity=3 bin/mysqld --no-defaults --basedir=$PWD --datadir=$PWD/102143 --core-file --socket=/tmp/mysql_ushastry.sock --port=3333 --log-error=$PWD/102143/log.err --mysqlx-port=33330 --mysqlx-socket=/tmp/mysql_x_ushastry.sock --log-error-verbosity=3 --secure-file-priv=/tmp/ --innodb_buffer_pool_chunk_size=32M --innodb_buffer_pool_size=96G --innodb_buffer_pool_instances=32 --skip-log-bin 2>&1 & - prepare bin/sysbench --threads=200 --mysql-user=root --mysql-socket=/tmp/mysql_ushastry.sock --mysql-db=sysbench --report-interval=1 --tables=200 --table-size=10000000 --events=0 --forced-shutdown=off ./share/sysbench/oltp_update_non_index.lua prepare + du -h 102143/sysbench/ 454G 102143/sysbench/ - run (increased max connections 1000) bin/sysbench --threads=500 --time=3000 --mysql-user=root --mysql-socket=/tmp/mysql_ushastry.sock --mysql-db=sysbench --report-interval=1 --tables=200 --table-size=10000000 --events=0 --forced-shutdown=off ./share/sysbench/oltp_update_non_index.lua run In between: CREATE UNDO TABLESPACE tablespace_name ADD DATAFILE 'file_name.ibu'; ALTER UNDO TABLESPACE tablespace_name SET INACTIVE; Observed decrease in tps: 777s ] thds: 500 tps: 544.82 qps: 544.82 (r/w/o: 0.00/544.82/0.00) lat (ms,95%): 1938.16 err/s: 0.00 reconn/s: 0.00 [ 778s ] thds: 500 tps: 590.21 qps: 590.21 (r/w/o: 0.00/590.21/0.00) lat (ms,95%): 1109.09 err/s: 0.00 reconn/s: 0.00 [ 779s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 780s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 781s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 782s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 783s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 784s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 785s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 786s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 787s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 788s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 789s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 790s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 791s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 792s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 793s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 794s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 795s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 796s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 797s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 798s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 799s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 800s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 801s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 802s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 803s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 804s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 805s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 806s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 807s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 808s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 809s ] thds: 500 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 810s ] thds: 500 tps: 1199.01 qps: 1199.01 (r/w/o: 0.00/1199.01/0.00) lat (ms,95%): 32161.14 err/s: 0.00 reconn/s: 0.00 [ 811s ] thds: 500 tps: 942.74 qps: 942.74 (r/w/o: 0.00/942.74/0.00) lat (ms,95%): 1213.57 err/s: 0.00 reconn/s: 0.00 [ 812s ] thds: 500 tps: 658.16 qps: 658.16 (r/w/o: 0.00/658.16/0.00) lat (ms,95%): 1069.86 err/s: 0.00 reconn/s: 0.00 [ 813s ] thds: 500 tps: 920.00 qps: 920.00 (r/w/o: 0.00/920.00/0.00) lat (ms,95%): 1149.76 err/s: 0.00 reconn/s: 0.00 [ 814s ] thds: 500 tps: 1102.81 qps: 1102.81 (r/w/o: 0.00/1102.81/0.00) lat (ms,95%): 1032.01 err/s: 0.00 reconn/s: 0.00 [ 815s ] thds: 500 tps: 1448.62 qps: 1448.62 (r/w/o: 0.00/1448.62/0.00) lat (ms,95%): 960.30 err/s: 0.00 reconn/s: 0.00 [ 816s ] thds: 500 tps: 1086.39 qps: 1086.39 (r/w/o: 0.00/1086.39/0.00) lat (ms,95%): 861.95 err/s: 0.00 reconn/s: 0.00 Looking at "Performance Impact of Truncating Undo Tablespace Files" in https://dev.mysql.com/doc/refman/8.0/en/innodb-undo-tablespaces.html it seems to be expected and also didn't observe the stacktraces which you pointed in the report. Most of the issues seems to be handled as part of WL#11819(in 8.0.21) but since you have pointed rightly about truncation process holding x lock and hence verifying so that further it is improved.
[14 May 2021 12:40]
Daniel Price
Posted by developer: Fixed as of the upcoming 8.0.26 release, and here's the proposed changelog entry from the documentation team: A workload stalled while executing a undo tablespace truncation operation on an instance with a large buffer pool. The function that truncates marked undo tablespaces now takes a shared latch instead of an exclusive latch, and the shared latch is taken for a shorter period of time.