Bug #84774 Performance drop every 60 seconds
Submitted: 1 Feb 2017 14:53          Modified: 2 Feb 2017 13:18
Reporter: Rene' Cannao'              Email Updates:
Status: Verified                     Impact on me: None
Category: MySQL Server: Group Replication    Severity: S5 (Performance)
Version: 5.7.17                      OS: Any
Assigned to:                         CPU Architecture: Any

[1 Feb 2017 14:53] Rene' Cannao'
Description:
While running a write-intensive workload on a Group Replication cluster, throughput drops drastically every 60 seconds, then immediately returns to normal.

How to repeat:
Set up a Group Replication cluster; the size doesn't seem to matter, it happens with both 3 and 5 nodes.
Run a write-intensive workload against a single writer using sysbench. Example:

./sysbench --num-threads=16 --max-time=300 --max-requests=0 --test=./lua/oltp_update_index.lua --mysql-user=sbtest --mysql-password=sbtest --mysql-host=10.1.2.42 --mysql-port=5717 --oltp-table-size=10000000 --oltp-tables-count=8 --report-interval=1 --oltp-auto-inc=off run

See attached output.

Suggested fix:
I haven't checked what triggers the bug, so I don't have much context to suggest a fix.
Assuming that Group Replication performs some maintenance/housekeeping at a regular interval, this maintenance should be less aggressive.
[1 Feb 2017 14:55] Rene' Cannao'
sysbench output

Attachment: bug84774.txt (text/plain), 39.84 KiB.

[2 Feb 2017 13:18] MySQL Verification Team
Hello Rene,

Thank you for the report and feedback!

Thanks,
Umesh
[3 Feb 2017 11:38] Nuno Carvalho
Posted by developer:
 
Hi Rene,

Thank you for evaluating Group Replication; your feedback (and all community feedback) is important!

As you suggested, Group Replication performs maintenance at a regular interval, more precisely every 60 seconds.
Our performance results show this; see section 2.4, "Stability over time", at
http://mysqlhighavailability.com/performance-evaluation-mysql-5-7-group-replication/

More precisely, every 60 seconds each member exchanges its set of applied transactions, and the intersection of these sets is used to garbage collect the certification info that each member maintains. On write-intensive workloads like yours, this operation can take longer than expected.
We have plans to improve this.
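
A minimal way to observe this cycle (assuming a MySQL version where these performance_schema columns are available) is to poll the group member statistics and watch the certification info shrink right after each 60-second exchange:

-- Sketch: run on each member. TRANSACTIONS_COMMITTED_ALL_MEMBERS is the last
-- stable set the member knows about and only advances after the periodic
-- exchange; COUNT_TRANSACTIONS_ROWS_VALIDATING drops when garbage collection runs.
SELECT MEMBER_ID,
       COUNT_TRANSACTIONS_ROWS_VALIDATING,
       TRANSACTIONS_COMMITTED_ALL_MEMBERS
  FROM performance_schema.replication_group_member_stats;

-- Each member contributes its applied GTID set to the intersection:
SELECT @@GLOBAL.gtid_executed;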

Best regards,
Nuno Carvalho
[13 Jun 2018 17:10] Vadim Tkachenko
I see more severe stalls under a sysbench-tpcc workload.
I use Google Cloud n1-highmem-16 or 32 instances to reproduce this problem.

Under sysbench-tpcc with 10 tables at scale 100, every 60 seconds I see a stall of 5-6 seconds (throughput drops to 0).

The problem is more severe when I restart the whole cluster (3 nodes). In this case the stalls progressively increase from 5 seconds to 40 seconds over a 10-hour run.

More details about my setup:
Terraform files to deploy Google Cloud nodes:
https://github.com/vadimtk/gce-ansible/blob/master/terraform/group-replication/main.tf

Ansible files to deploy Group Replication:
https://github.com/vadimtk/gce-ansible/tree/master/group-replication

An example of the stalls I observe can be found here:
https://raw.githubusercontent.com/Percona-Lab-results/201805-sysbench-tpcc-group-repl/mast...
[31 Jan 2019 17:42] Vinicius Malvestio Grippa
Same on MySQL 5.7.20:

The sysbench execution:

# Prepare

sysbench --db-driver=mysql --mysql-user=root --mysql-password=msandbox \
  --mysql-socket=/tmp/mysql_sandbox49008.sock --mysql-db=test --range_size=100 \
  --table_size=10000 --tables=200 --threads=1 --events=0 --time=60 \
  --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua prepare

# Execution

sysbench --db-driver=mysql --mysql-user=root --mysql-password=msandbox \
  --mysql-socket=/tmp/mysql_sandbox49008.sock --mysql-db=test --range_size=100 \
  --table_size=10000 --tables=200 --threads=5 --events=0 --time=6000 \
  --rand-type=uniform /usr/share/sysbench/oltp_read_write.lua --report-interval=1 run

We can see GR stalling for brief moments:

[ 65s ] thds: 5 tps: 386.92 qps: 7727.45 (r/w/o: 5411.92/585.40/1730.13) lat (ms,95%): 55.82 err/s: 0.00 reconn/s: 0.00
[ 66s ] thds: 5 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 67s ] thds: 5 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 68s ] thds: 5 tps: 25.00 qps: 489.01 (r/w/o: 340.01/33.00/116.00) lat (ms,95%): 2449.36 err/s: 0.00 reconn/s: 0.00
[ 69s ] thds: 5 tps: 226.93 qps: 4528.61 (r/w/o: 3165.03/312.90/1050.68) lat (ms,95%): 130.13 err/s: 0.00 reconn/s: 0.00
[ 121s ] thds: 5 tps: 212.03 qps: 4228.55 (r/w/o: 2961.39/300.04/967.13) lat (ms,95%): 12.52 err/s: 0.00 reconn/s: 0.00
[ 122s ] thds: 5 tps: 680.12 qps: 13592.38 (r/w/o: 9522.67/1018.18/3051.53) lat (ms,95%): 9.73 err/s: 0.00 reconn/s: 0.00
[ 123s ] thds: 5 tps: 0.00 qps: 28.00 (r/w/o: 8.00/10.00/10.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 124s ] thds: 5 tps: 19.00 qps: 310.00 (r/w/o: 216.00/22.00/72.00) lat (ms,95%): 1589.90 err/s: 0.00 reconn/s: 0.00
[ 167s ] thds: 5 tps: 406.17 qps: 8087.32 (r/w/o: 5662.33/611.25/1813.75) lat (ms,95%): 5.09 err/s: 0.00 reconn/s: 0.00
[ 168s ] thds: 5 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 169s ] thds: 5 tps: 491.96 qps: 9857.14 (r/w/o: 6895.40/756.93/2204.81) lat (ms,95%): 9.73 err/s: 0.00 reconn/s: 0.00
[11 Oct 2021 13:35] Anibal Pinto
Posted by developer:
 
This isn't a bug but expected behavior of Group Replication.

## How the Group Replication certifier garbage collector works

When a transaction is executed on a Group Replication member, the certifier
checks whether its write set already exists in the certification info and, if
not, stores it.

Periodically, every 60 seconds, GR members exchange their applied GTIDs with
all group members. The stable set, which is the intersection of the applied
GTIDs of all members, is then calculated.

The certification garbage collector takes the stable set and checks whether
the write sets in the certification info are contained in it; if they are,
they are removed from the certification info.

While the garbage collector runs, certification of all transactions is put on
hold; only the execution stages before and after certification proceed.
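
To make the intersection step concrete, here is an illustrative sketch (not the internal implementation) using MySQL's GTID set functions, with two made-up applied GTID sets standing in for two members:

-- Illustration only: intersection(A, B) = A - (A - B), built from GTID_SUBTRACT.
SET @member_a = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-100';
SET @member_b = 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-90';
SELECT GTID_SUBTRACT(@member_a, GTID_SUBTRACT(@member_a, @member_b)) AS stable_set;
-- Returns aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-90: only transactions applied
-- by every member can be garbage collected from the certification info.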

## Why the heavy drop in transactions per second

If the certification info starts to grow, so will the time needed to run the
garbage collector.

The growth of the certification info can be due to a high transaction load on
the group or to a small stable set relative to the transactions processed by
the members of the group.

The size of the certification info can be tracked via the
COUNT_TRANSACTIONS_ROWS_VALIDATING column of the
performance_schema.replication_group_member_stats table. This value can be
thought of as the current size of the conflict detection database against which
each transaction is certified.
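
For example, a query along these lines (using the table and column described above) can be sampled periodically; a value that keeps growing between garbage collections means the stalls will get longer:

-- Sample every few seconds and compare over time.
SELECT NOW() AS sampled_at,
       MEMBER_ID,
       COUNT_TRANSACTIONS_ROWS_VALIDATING AS certification_info_rows
  FROM performance_schema.replication_group_member_stats;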

In this case we detected that the secondaries were not able to process
transactions as fast as the primary, so the group generated a small stable set.

This was observed as increasing values in the columns
COUNT_TRANSACTIONS_IN_QUEUE (number of transactions in the queue pending
conflict detection) and COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE (number of
transactions received from the replication group which are waiting to be
applied) of the performance_schema.replication_group_member_stats table.
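
A query along these lines (note that the applier queue column only exists on more recent server versions) shows whether the secondaries are falling behind:

-- Non-zero, growing queues on secondaries indicate they cannot certify and
-- apply as fast as the primary writes, which shrinks the stable set.
SELECT MEMBER_ID,
       COUNT_TRANSACTIONS_IN_QUEUE,
       COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE
  FROM performance_schema.replication_group_member_stats;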

## Solution

Secondaries should keep only a small gap of transactions behind the primary
member; flow control should be configured to restrain the number of
transactions waiting for certification and apply via the variables
**group_replication_flow_control_certifier_threshold** and
**group_replication_flow_control_applier_threshold**.

This will allow the group to have a larger stable set, so that more write sets
can be removed from the certification info on each iteration, making the
garbage collector more efficient.
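
A minimal configuration sketch, assuming the default QUOTA flow control mode; the threshold values below are only illustrative and should be tuned to the workload:

-- Run on each member: flow control throttles writers once a member's
-- certifier or applier queue exceeds the threshold, keeping secondaries
-- close to the primary and the stable set large.
SET GLOBAL group_replication_flow_control_mode = 'QUOTA';
SET GLOBAL group_replication_flow_control_certifier_threshold = 1000;
SET GLOBAL group_replication_flow_control_applier_threshold = 1000;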