Bug #104615 trx commit slow if there are too many binlog files to purge while rotating
Submitted: 13 Aug 2021 10:26 Modified: 13 Aug 2021 11:32
Reporter: Brian Yue (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Logging Severity:S5 (Performance)
Version:8.0 OS:Any (rhel7.4)
Assigned to: CPU Architecture:Any (intel-x86)
Tags: commit, performance, purge, rotate

[13 Aug 2021 10:26] Brian Yue
Description:
Hello, dear verification team:

  Here I find a case that transactions may commit slow. If we have already writen many binlog files (e.g. 20GB binlog files), and we now change configuration `binlog_expire_logs_seconds` as 1 dynamically, when the next binlog rotation happens, the transaction which is doing purge will waste too much time on purging expired binlog files, which I think is an issue of performance and should be improved.
  With 28GB binlog files on NVME disk to purge, a 6.317686 sec slow query log is produced.

  Please reference to `How to repeat` for detail.
  

How to repeat:
(1) turn on `slow_query_log` and modify `long_query_time` as `0.5`

(2) produce many binlog files with sysbench, up to more than 20GB (I produced binlog files of 28GB size) with oltp.lua of sysbench (very small transactions)

[yxx_sysbench_1@localhost sysbench]$ ./sysbench --test=./db/oltp.lua --oltp-auto-inc=off --oltp-table-size=1000000 --oltp-tables-count=2 --mysql-table-engine=innodb --mysql-user=root  --mysql-password='db10$ZTE' --mysql-port=6324 --mysql-host=192.168.10.25 --mysql-db=sbtest --max-requests=0 --max-time=38800 --num-threads=200 --report-interval=60 run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 200
Report intermediate results every 60 second(s)
Random number generator seed is 0 and will be ignored

Threads started!

[  60s] threads: 200, tps: 9267.25, reads/s: 129779.24, writes/s: 37078.51, response time: 36.61ms (95%)
[ 120s] threads: 200, tps: 9561.10, reads/s: 133857.07, writes/s: 38241.61, response time: 36.44ms (95%)
[ 180s] threads: 200, tps: 9588.68, reads/s: 134238.90, writes/s: 38354.73, response time: 36.23ms (95%)
[ 240s] threads: 200, tps: 9534.50, reads/s: 133493.20, writes/s: 38144.37, response time: 36.36ms (95%)
[ 300s] threads: 200, tps: 9571.07, reads/s: 133987.23, writes/s: 38277.25, response time: 36.10ms (95%)
[ 360s] threads: 200, tps: 9646.77, reads/s: 135056.62, writes/s: 38585.16, response time: 36.03ms (95%)
[ 420s] threads: 200, tps: 9637.35, reads/s: 134922.53, writes/s: 38550.22, response time: 35.95ms (95%)
[ 480s] threads: 200, tps: 9606.92, reads/s: 134484.78, writes/s: 38427.08, response time: 36.14ms (95%)
[ 540s] threads: 200, tps: 9635.87, reads/s: 134918.59, writes/s: 38547.80, response time: 36.07ms (95%)
[ 600s] threads: 200, tps: 9665.53, reads/s: 135308.70, writes/s: 38657.46, response time: 35.85ms (95%)
[ 660s] threads: 200, tps: 9544.77, reads/s: 133626.30, writes/s: 38179.42, response time: 36.48ms (95%)
[ 720s] threads: 200, tps: 9616.50, reads/s: 134633.91, writes/s: 38465.24, response time: 35.95ms (95%)
[ 780s] threads: 200, tps: 9658.38, reads/s: 135207.50, writes/s: 38635.58, response time: 35.94ms (95%)
[ 840s] threads: 200, tps: 9635.87, reads/s: 134915.54, writes/s: 38543.85, response time: 36.12ms (95%)
[ 900s] threads: 200, tps: 9636.08, reads/s: 134898.17, writes/s: 38542.95, response time: 35.96ms (95%)
^C

Every 2.0s: du -sh ../binlog                                                                                                                     Fri Aug 13 17:49:03 2021

28G     ../binlog

(3) modify `binlog_expire_logs_seconds` as `1` dynamically:
mysql> set @@global.binlog_expire_logs_seconds = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> select NOW();
+---------------------+
| NOW()               |
+---------------------+
| 2021-08-13 18:08:55 |
+---------------------+
1 row in set (0.00 sec)

(4) slow query log is produced at the next binlog rotation, which is very slow (6.317686 sec):

# Time: 2021-08-13T18:09:39.961767+08:00
# User@Host: root[root] @  [192.168.10.40]  Id:    25
# Query_time: 6.317686  Lock_time: 0.000000 Rows_sent: 0  Rows_examined: 0
SET timestamp=1628849373;
COMMIT;

Suggested fix:
I think we can make a single rotation_thread to do binlog purge, instead of worker threads which are executing sql commands.
[13 Aug 2021 11:32] MySQL Verification Team
Hi Mr. Yue,

We have done a test case, as you have recommended, and we have also experienced the slowdown, which is measurable.

We find that your performance improvement observation and the manner for fixing the problem with log rotations do, indeed, have a lot of merit. That is why your proposal is approved.

Verified as reported.