Bug #84774 | Performance drop every 60 seconds | ||
---|---|---|---|
Submitted: | 1 Feb 2017 14:53 | Modified: | 2 Feb 2017 13:18 |
Reporter: | Rene' Cannao' | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Group Replication | Severity: | S5 (Performance) |
Version: | 5.7.17 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[1 Feb 2017 14:53]
Rene' Cannao'
[1 Feb 2017 14:55]
Rene' Cannao'
sysbench output
Attachment: bug84774.txt (text/plain), 39.84 KiB.
[2 Feb 2017 13:18]
MySQL Verification Team
Hello Rene, Thank you for the report and feedback! Thanks, Umesh
[3 Feb 2017 11:38]
Nuno Carvalho
Posted by developer: Hi Rene, Thank you for evaluating Group Replication, your (and all community feedback) is important! Like you did suggest, Group Replication performs maintenance at a regular interval, more precisely each 60 seconds. Our performance results show that, section 2.4. Stability over time at http://mysqlhighavailability.com/performance-evaluation-mysql-5-7-group-replication/ More precisely, every 60 seconds every member exchange its persisted transactions set and the intersection of these sets is used to garbage collect the certification info that each member maintains. On write intensive workloads, like yours, this operation can be longer than expected. We have plans to improve this. Best regards, Nuno Carvalho
[13 Jun 2018 17:10]
Vadim Tkachenko
I have more severe stalls under sysbench-tpcc workload. I use Google Cloud n1-highmem-16 or 32 instances to repeat this problem. Under sysbench-tpcc, 10 tables, scale 100 every 60 sec I see the stall for 5-6 seconds (throughput drops to 0). More severe problems when I restart the whole cluster (3 nodes). In this case the stalls progressively increase from 5 sec to 40 sec during 10 hours run. More details about my setup: Terraform files to deploy Google Cloud nodes: https://github.com/vadimtk/gce-ansible/blob/master/terraform/group-replication/main.tf Ansible files to deploy Group Replication: https://github.com/vadimtk/gce-ansible/tree/master/group-replication The example of stalls I observe you can find there: https://raw.githubusercontent.com/Percona-Lab-results/201805-sysbench-tpcc-group-repl/mast...
[31 Jan 2019 17:42]
Vinicius Malvestio Grippa
Same on MySQL 5.7.20: The sysbench execution: # Prepare sysbench --db-driver=mysql --mysql-user=root --mysql-password=msandbox \ --mysql-socket=/tmp/mysql_sandbox49008.sock --mysql-db=test --range_size=100 \ --table_size=10000 --tables=200 --threads=1 --events=0 --time=60 \ --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua prepare # Execution sysbench --db-driver=mysql --mysql-user=root --mysql-password=msandbox \ --mysql-socket=/tmp/mysql_sandbox49008.sock --mysql-db=test --range_size=100 \ --table_size=10000 --tables=200 --threads=5 --events=0 --time=6000 \ --rand-type=uniform /usr/share/sysbench/oltp_read_write.lua --report-interval=1 run We can see the GR stalling for briefly moments: [ 65s ] thds: 5 tps: 386.92 qps: 7727.45 (r/w/o: 5411.92/585.40/1730.13) lat (ms,95%): 55.82 err/s: 0.00 reconn/s: 0.00 [ 66s ] thds: 5 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 67s ] thds: 5 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 68s ] thds: 5 tps: 25.00 qps: 489.01 (r/w/o: 340.01/33.00/116.00) lat (ms,95%): 2449.36 err/s: 0.00 reconn/s: 0.00 [ 69s ] thds: 5 tps: 226.93 qps: 4528.61 (r/w/o: 3165.03/312.90/1050.68) lat (ms,95%): 130.13 err/s: 0.00 reconn/s: 0.00 [ 121s ] thds: 5 tps: 212.03 qps: 4228.55 (r/w/o: 2961.39/300.04/967.13) lat (ms,95%): 12.52 err/s: 0.00 reconn/s: 0.00 [ 122s ] thds: 5 tps: 680.12 qps: 13592.38 (r/w/o: 9522.67/1018.18/3051.53) lat (ms,95%): 9.73 err/s: 0.00 reconn/s: 0.00 [ 123s ] thds: 5 tps: 0.00 qps: 28.00 (r/w/o: 8.00/10.00/10.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 124s ] thds: 5 tps: 19.00 qps: 310.00 (r/w/o: 216.00/22.00/72.00) lat (ms,95%): 1589.90 err/s: 0.00 reconn/s: 0.00 [ 167s ] thds: 5 tps: 406.17 qps: 8087.32 (r/w/o: 5662.33/611.25/1813.75) lat (ms,95%): 5.09 err/s: 0.00 reconn/s: 0.00 [ 168s ] thds: 5 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00 [ 169s ] thds: 5 tps: 491.96 qps: 9857.14 (r/w/o: 6895.40/756.93/2204.81) lat (ms,95%): 9.73 err/s: 0.00 reconn/s: 0.00
[11 Oct 2021 13:35]
Anibal Pinto
Posted by developer: This isn't a bug but a behavior of group replication. ## How Group Replication certifier Garbage collector work When a transaction is executed on a Group Replication member, the certifier checks if it exists in certification info and if not stores its write set. Periodically after every 60 second GR members exchange their applied GTIDs with all group members. The stable set which is intersection of these applied GTIDs for all the members is calculated. The certification garbage collector picks the stable set and validates if write sets on certification info are contained in it, if they do exists they are removed from certification info. When garbage collector is executed all transactions certification is put on hold, except execution stages before and after certification. ## Why a heavy drop on transactions per second If certification info starts to grow, so it will the time need to run the garbage collector. The increase of certification info can be due an high load of transactions on the group or a small stable set of the transactions processed by members of the group. The value of certification info size can be tracked on COUNT_TRANSACTIONS_ROWS_VALIDATING column from performance_schema.replication_group_member_stats table. The value can be thought of as the current size of the conflict detection database against which each transaction is certified. On this case we detected secondaries not being able to keep processing transactions as fast as primary and group generate a small stable set. This was observed on the increase of values in columns COUNT_TRANSACTIONS_IN_QUEUE, number of transactions in the queue pending conflict detection, and COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE, number of transactions received from replication group which are waiting to be applied, from performance_schema.replication_group_member_stats table. ## Solution Secondaries shall keep a small gap of transactions from primary members, flow control shall be configured to restrain amount of transactions waiting for certification on variable **group_replication_flow_control_certifier_threshold** and **group_replication_flow_control_applier_threshold**. This will allow group to have a larger stable set that will allow to remove more write sets from certification info on each iteration, helping garbage collector being more efficient.
[12 Sep 2022 20:56]
Yves Trudeau
I understand these performance drops are a normal "behavior" of group replication but it is a very annoying behavior for application that have strict SLAs. Furthermore, it is a behavior that can easily be fixed. The garbarge collector in the certifier has to clean up the certification info list while holding a lock on the list and a write lock on the gtids. With millions of trx in the certification info, no wonder there is a latency spike reaching a second or more. I wrote a simple patch (against 8.0.30) that breaks this process into chunks and releases the locks during a short sleep time between chunks to allow some trx to interleave and thus lower the latency to avoid missing SLAs. Both the chunk size and the sleep time are dynamic MySQL variables, the default values are performing well with sysbench. The patch also increase the chunk size over time if the chunk size was initially set too low. With the default values, the top latencies more than one order of magnitude smaller.
[12 Sep 2022 20:57]
Yves Trudeau
Patch to the Certifier garbage collector to process in chunks
Attachment: patch_gr_cert_loop_on_8_0_30.diff (text/x-patch), 7.44 KiB.
[13 Sep 2022 17:20]
Yves Trudeau
Just to further illustrate, I used a modified sysbench version that allow to specify multiple percentages and closer to 100% (sysbench is normally limited to 99%). Here's a example of a latency spike caused by the garbage collector: 56s ] thds: 14 tps: 6629.23 qps: 6629.23 (r/w/o: 0.00/6629.23/0.00) lat (ms,99.50%): 4.74 lat (ms,99.90%): 7.04 lat (ms,99.99%): 11.24 err/s: 0.00 reconn/s: 0.00 [ 57s ] thds: 14 tps: 6506.56 qps: 6506.56 (r/w/o: 0.00/6506.56/0.00) lat (ms,99.50%): 5.67 lat (ms,99.90%): 7.43 lat (ms,99.99%): 9.22 err/s: 0.00 reconn/s: 0.00 [ 58s ] thds: 14 tps: 4806.52 qps: 4806.52 (r/w/o: 0.00/4806.52/0.00) lat (ms,99.50%): 6.43 lat (ms,99.90%): 248.83 lat (ms,99.99%): 248.83 err/s: 0.00 reconn/s: 0.00 [ 59s ] thds: 14 tps: 6389.67 qps: 6389.67 (r/w/o: 0.00/6389.67/0.00) lat (ms,99.50%): 5.09 lat (ms,99.90%): 5.77 lat (ms,99.99%): 9.06 err/s: 0.00 reconn/s: 0.00 [ 60s ] thds: 14 tps: 6587.52 qps: 6587.52 (r/w/o: 0.00/6587.52/0.00) lat (ms,99.50%): 4.82 lat (ms,99.90%): 6.21 lat (ms,99.99%): 6.55 err/s: 0.00 reconn/s: 0.00 It appends exactly every minute, when the garbage collection is running. As one can lee, the 99.5% value is not really affected but, 99.9% and 99.99% clearly show the problem with latencies around 30x larger.
[5 Nov 3:01]
Bin Wang
This issue is related to certification database cleanup, and it's particularly noticeable on high-end machines. For more details, you can refer to my book "The Art of Problem-Solving in Software Engineering: How to Make MySQL Better". I've also made extensive modifications to Group Replication, and if you're interested, you can check out my version at https://github.com/advancedmysql/mysqlplus. My version does not have this issue.