Bug #84774 Performance drop every 60 seconds
Submitted: 1 Feb 2017 14:53  Modified: 2 Feb 2017 13:18
Reporter: Rene' Cannao'  Email Updates:
Status: Verified  Impact on me: None
Category: MySQL Server: Group Replication  Severity: S5 (Performance)
Version: 5.7.17  OS: Any
Assigned to:  CPU Architecture: Any

[1 Feb 2017 14:53] Rene' Cannao'
Description:
While running a write-intensive workload on a Group Replication cluster, every 60 seconds the throughput drastically drops and then immediately returns to normal.

How to repeat:
Set up a Group Replication cluster: the size doesn't seem to be relevant; it happens with both 3 and 5 nodes.
Run a write-intensive workload against a single writer using sysbench. Example:

./sysbench --num-threads=16 --max-time=300 --max-requests=0 --test=./lua/oltp_update_index.lua --mysql-user=sbtest --mysql-password=sbtest --mysql-host=10.1.2.42 --mysql-port=5717 --oltp-table-size=10000000 --oltp-tables-count=8 --report-interval=1 --oltp-auto-inc=off run

See attached output.
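The dips can be spotted directly in the per-second report. As a minimal sketch of my own (assuming sysbench report lines that contain a "tps:" field, as in the outputs quoted later in this bug), pipe the sysbench command above into an awk filter like:

awk '/tps:/ {
       # extract the number following the "tps:" token
       for (i = 1; i <= NF; i++) if ($i == "tps:") tps = $(i+1) + 0;
       n++; sum += tps;
       # flag seconds where throughput drops below 10% of the running average
       if (n > 10 && tps < 0.1 * (sum / n)) print "possible stall:", $0
     }'

With this workload, the flagged lines should cluster at roughly 60-second intervals.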

Suggested fix:
I haven't checked what triggers the bug, therefore I don't have a lot of context to suggest a fix.
Assuming that Group Replication performs some maintenance/housekeeping at a regular interval, this maintenance should be less aggressive.
[1 Feb 2017 14:55] Rene' Cannao'
sysbench output

Attachment: bug84774.txt (text/plain), 39.84 KiB.

[2 Feb 2017 13:18] Umesh Shastry
Hello Rene,

Thank you for the report and feedback!

Thanks,
Umesh
[3 Feb 2017 11:38] Nuno Carvalho
Posted by developer:
 
Hi Rene,

Thank you for evaluating Group Replication; your feedback (and all community feedback) is important!

As you suggested, Group Replication performs maintenance at a regular interval, more precisely every 60 seconds.
Our performance results show this; see section 2.4, "Stability over time", at
http://mysqlhighavailability.com/performance-evaluation-mysql-5-7-group-replication/

Specifically, every 60 seconds each member exchanges its set of persisted transactions, and the intersection of these sets is used to garbage collect the certification info that each member maintains. On write-intensive workloads like yours, this operation can take longer than expected.
We have plans to improve this.

Best regards,
Nuno Carvalho
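This 60-second cycle can be observed from the outside. As a minimal sketch (not mentioned in the thread; it reuses the host, port, and sbtest credentials from the reporter's sysbench command purely as an example and assumes an account with SELECT on performance_schema), poll performance_schema.replication_group_member_stats and watch COUNT_TRANSACTIONS_ROWS_VALIDATING, which tracks the size of the certification info:

# poll the local member's certification stats every 5 seconds;
# COUNT_TRANSACTIONS_ROWS_VALIDATING should drop at each garbage-collection cycle,
# lining up with the throughput dips seen in sysbench
while sleep 5; do
  mysql -h 10.1.2.42 -P 5717 -u sbtest -psbtest -N -e "
    SELECT NOW(), MEMBER_ID,
           COUNT_TRANSACTIONS_IN_QUEUE,
           COUNT_TRANSACTIONS_ROWS_VALIDATING
      FROM performance_schema.replication_group_member_stats"
done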
[13 Jun 2018 17:10] Vadim Tkachenko
I see more severe stalls under a sysbench-tpcc workload.
I use Google Cloud n1-highmem-16 or 32 instances to repeat this problem.

Under sysbench-tpcc with 10 tables and scale 100, every 60 sec I see a stall of 5-6 seconds (throughput drops to 0).

The problem gets more severe when I restart the whole cluster (3 nodes). In this case the stalls progressively increase from 5 sec to 40 sec over a 10-hour run.

More details about my setup:
Terraform files to deploy Google Cloud nodes:
https://github.com/vadimtk/gce-ansible/blob/master/terraform/group-replication/main.tf

Ansible files to deploy Group Replication:
https://github.com/vadimtk/gce-ansible/tree/master/group-replication

An example of the stalls I observe can be found here:
https://raw.githubusercontent.com/Percona-Lab-results/201805-sysbench-tpcc-group-repl/mast...
[31 Jan 17:42] Vinicius Malvestio Grippa
Same on MySQL 5.7.20:

The sysbench execution:

# Prepare

sysbench --db-driver=mysql --mysql-user=root --mysql-password=msandbox \
  --mysql-socket=/tmp/mysql_sandbox49008.sock --mysql-db=test --range_size=100 \
  --table_size=10000 --tables=200 --threads=1 --events=0 --time=60 \
  --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua prepare

# Execution

sysbench --db-driver=mysql --mysql-user=root --mysql-password=msandbox \
  --mysql-socket=/tmp/mysql_sandbox49008.sock --mysql-db=test --range_size=100 \
  --table_size=10000 --tables=200 --threads=5 --events=0 --time=6000 \
  --rand-type=uniform /usr/share/sysbench/oltp_read_write.lua --report-interval=1 run

We can see GR stalling for brief moments:

[ 65s ] thds: 5 tps: 386.92 qps: 7727.45 (r/w/o: 5411.92/585.40/1730.13) lat (ms,95%): 55.82 err/s: 0.00 reconn/s: 0.00
[ 66s ] thds: 5 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 67s ] thds: 5 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 68s ] thds: 5 tps: 25.00 qps: 489.01 (r/w/o: 340.01/33.00/116.00) lat (ms,95%): 2449.36 err/s: 0.00 reconn/s: 0.00
[ 69s ] thds: 5 tps: 226.93 qps: 4528.61 (r/w/o: 3165.03/312.90/1050.68) lat (ms,95%): 130.13 err/s: 0.00 reconn/s: 0.00

[ 121s ] thds: 5 tps: 212.03 qps: 4228.55 (r/w/o: 2961.39/300.04/967.13) lat (ms,95%): 12.52 err/s: 0.00 reconn/s: 0.00
[ 122s ] thds: 5 tps: 680.12 qps: 13592.38 (r/w/o: 9522.67/1018.18/3051.53) lat (ms,95%): 9.73 err/s: 0.00 reconn/s: 0.00
[ 123s ] thds: 5 tps: 0.00 qps: 28.00 (r/w/o: 8.00/10.00/10.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 124s ] thds: 5 tps: 19.00 qps: 310.00 (r/w/o: 216.00/22.00/72.00) lat (ms,95%): 1589.90 err/s: 0.00 reconn/s: 0.00

[ 167s ] thds: 5 tps: 406.17 qps: 8087.32 (r/w/o: 5662.33/611.25/1813.75) lat (ms,95%): 5.09 err/s: 0.00 reconn/s: 0.00
[ 168s ] thds: 5 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 169s ] thds: 5 tps: 491.96 qps: 9857.14 (r/w/o: 6895.40/756.93/2204.81) lat (ms,95%): 9.73 err/s: 0.00 reconn/s: 0.00