Bug #84773 Flow control slows down throughput while a node joins the cluster
Submitted: 1 Feb 2017 14:29  Modified: 4 Oct 2017 14:14
Reporter: Rene' Cannao'  Email Updates:
Status: Closed  Impact on me: None
Category: MySQL Server: Group Replication  Severity: S5 (Performance)
Version: 5.7.17  OS: Linux
Assigned to:  CPU Architecture: Any

[1 Feb 2017 14:29] Rene' Cannao'
Description:
When a node joins a group replication cluster, the recovery phase can drastically affect the performance of the whole cluster.

This appears to be caused by flow control.

How to repeat:
Steps to reproduce the issue:
* start a cluster with 4 nodes, using a single writer
* run sysbench with a write-intensive workload, for example:
./sysbench --num-threads=16 --max-time=3600 --max-requests=0 --test=./lua/oltp_update_index.lua --report-interval=1 ... run
* after sysbench has been running for several minutes, run STOP GROUP_REPLICATION on one of the nodes (not the writer)
* wait several minutes, then run START GROUP_REPLICATION on the same node where GR was stopped
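
The stop/rejoin steps above can be sketched as SQL on the chosen secondary; the performance_schema query is just one way to watch the member catch up (host names and timings are illustrative):

```sql
-- On a secondary node (not the writer):
STOP GROUP_REPLICATION;
-- ... let the workload run for several minutes so a replication gap builds up ...
START GROUP_REPLICATION;

-- While the node catches up, its state can be observed from any member;
-- the rejoining node shows MEMBER_STATE = 'RECOVERING' until it is in sync:
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
  FROM performance_schema.replication_group_members;
```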

Result:
Within a few seconds, the throughput reported by sysbench drops drastically and stays low until the node is back in sync.

Below is an example:
$ ./sysbench --num-threads=16 --max-time=300 --max-requests=0 --test=./lua/oltp_update_index.lua --mysql-user=sbtest --mysql-password=sbtest --mysql-host=10.1.2.42 --mysql-port=5717 --oltp-table-size=10000000 --oltp-tables-count=8 --report-interval=3 --oltp-auto-inc=off run
sysbench 1.0 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 16
Report intermediate results every 3 second(s)
Initializing random number generator from current time

Initializing worker threads...

Threads started!

[   3s] threads: 16 tps: 11366.46 qps: 11366.46 (r/w/o: 0.00/11366.46/0.00) latency: 1.67ms (95%) errors/s: 0.00 reconnects/s: 0.00
[   6s] threads: 16 tps: 11391.58 qps: 11391.58 (r/w/o: 0.00/11391.58/0.00) latency: 1.70ms (95%) errors/s: 0.00 reconnects/s: 0.00
[   9s] threads: 16 tps: 10960.68 qps: 10960.68 (r/w/o: 0.00/10960.68/0.00) latency: 1.70ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  12s] threads: 16 tps: 11365.37 qps: 11365.37 (r/w/o: 0.00/11365.37/0.00) latency: 1.67ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  15s] threads: 16 tps: 11058.31 qps: 11058.31 (r/w/o: 0.00/11058.31/0.00) latency: 1.67ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  18s] threads: 16 tps: 11441.39 qps: 11441.39 (r/w/o: 0.00/11441.39/0.00) latency: 1.64ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  21s] threads: 16 tps: 11300.63 qps: 11300.63 (r/w/o: 0.00/11300.63/0.00) latency: 1.67ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  24s] threads: 16 tps: 11370.56 qps: 11370.56 (r/w/o: 0.00/11370.56/0.00) latency: 1.64ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  27s] threads: 16 tps: 11346.35 qps: 11346.35 (r/w/o: 0.00/11346.35/0.00) latency: 1.67ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  30s] threads: 16 tps: 11055.72 qps: 11055.72 (r/w/o: 0.00/11055.72/0.00) latency: 1.70ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  33s] threads: 16 tps: 11365.68 qps: 11365.68 (r/w/o: 0.00/11365.68/0.00) latency: 1.67ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  36s] threads: 16 tps: 10576.29 qps: 10576.29 (r/w/o: 0.00/10576.29/0.00) latency: 1.64ms (95%) errors/s: 0.00 reconnects/s: 0.00    <== here START GROUP_REPLICATION is executed
[  39s] threads: 16 tps: 4815.00 qps: 4815.00 (r/w/o: 0.00/4815.00/0.00) latency: 2.39ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  42s] threads: 16 tps: 10383.06 qps: 10383.06 (r/w/o: 0.00/10383.06/0.00) latency: 1.82ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  45s] threads: 16 tps: 9340.62 qps: 9340.62 (r/w/o: 0.00/9340.62/0.00) latency: 1.82ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  48s] threads: 16 tps: 6893.66 qps: 6893.66 (r/w/o: 0.00/6893.66/0.00) latency: 1.86ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  51s] threads: 16 tps: 4464.33 qps: 4464.33 (r/w/o: 0.00/4464.33/0.00) latency: 1.86ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  54s] threads: 16 tps: 2296.67 qps: 2296.67 (r/w/o: 0.00/2296.67/0.00) latency: 1.93ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  57s] threads: 16 tps: 1673.35 qps: 1673.35 (r/w/o: 0.00/1673.35/0.00) latency: 1.89ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  60s] threads: 16 tps: 1229.99 qps: 1229.99 (r/w/o: 0.00/1229.99/0.00) latency: 2.03ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  63s] threads: 16 tps: 1125.00 qps: 1125.00 (r/w/o: 0.00/1125.00/0.00) latency: 2.00ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  66s] threads: 16 tps: 1125.00 qps: 1125.00 (r/w/o: 0.00/1125.00/0.00) latency: 2.07ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  69s] threads: 16 tps: 1125.00 qps: 1125.00 (r/w/o: 0.00/1125.00/0.00) latency: 2.07ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  72s] threads: 16 tps: 1124.99 qps: 1124.99 (r/w/o: 0.00/1124.99/0.00) latency: 2.66ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  75s] threads: 16 tps: 1125.00 qps: 1125.00 (r/w/o: 0.00/1125.00/0.00) latency: 2.52ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  78s] threads: 16 tps: 1125.00 qps: 1125.00 (r/w/o: 0.00/1125.00/0.00) latency: 2.00ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  81s] threads: 16 tps: 1125.00 qps: 1125.00 (r/w/o: 0.00/1125.00/0.00) latency: 2.03ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  84s] threads: 16 tps: 1125.01 qps: 1125.01 (r/w/o: 0.00/1125.01/0.00) latency: 2.11ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  87s] threads: 16 tps: 1124.99 qps: 1124.99 (r/w/o: 0.00/1124.99/0.00) latency: 2.00ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  90s] threads: 16 tps: 1125.01 qps: 1125.01 (r/w/o: 0.00/1125.01/0.00) latency: 2.18ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  93s] threads: 16 tps: 1124.99 qps: 1124.99 (r/w/o: 0.00/1124.99/0.00) latency: 2.35ms (95%) errors/s: 0.00 reconnects/s: 0.00
[  96s] threads: 16 tps: 1125.00 qps: 1125.00 (r/w/o: 0.00/1125.00/0.00) latency: 2.11ms (95%) errors/s: 0.00 reconnects/s: 0.00

Note:
Running the following on the writer node prevents the performance drop:
SET GLOBAL group_replication_flow_control_mode=disabled;
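
A hedged sketch of how this workaround might be bracketed around the rejoin, so flow control is restored once recovery finishes (QUOTA is the default mode per the 5.7 documentation; the re-enable step is an assumption, not part of the original report):

```sql
-- On the writer, before the node rejoins:
SET GLOBAL group_replication_flow_control_mode = 'DISABLED';

-- ... wait until the rejoining member reaches MEMBER_STATE = 'ONLINE'
--     in performance_schema.replication_group_members ...

-- Restore the default flow control behaviour:
SET GLOBAL group_replication_flow_control_mode = 'QUOTA';
```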

Suggested fix:
A node that is in MEMBER_STATE=RECOVERING should not be evaluated in the flow control algorithm.
[2 Feb 2017 13:17] Umesh Shastry
Hello Rene,

Thank you for the report and feedback!

Thanks,
Umesh
[3 Feb 2017 11:25] Nuno Carvalho
Posted by developer:
 
Hi René,

Thank you for evaluating Group Replication; your feedback (and all community feedback) is important!

Indeed, when a member joins the group and there is a big gap for it to recover, flow control is triggered to allow the member to catch up. This is by design: the group is asked to slow down while the joining member fetches and applies the missing data.

When the amount of data missing on the new member is large, we recommend provisioning the new member before it joins; this will drastically shrink the recovery time.
Enabling the parallel applier will also shrink the recovery time, since the missing data can be applied in parallel.
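
The parallel applier mentioned above could be enabled roughly as follows on the joining member (a sketch using the 5.7 multi-threaded applier variables; the worker count is illustrative and workload-dependent, and the applier must not be running when slave_parallel_workers is changed):

```sql
-- On the joining member, before START GROUP_REPLICATION:
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';  -- allow intra-schema parallelism
SET GLOBAL slave_parallel_workers = 8;             -- number of applier worker threads
SET GLOBAL slave_preserve_commit_order = ON;       -- keep commits in relay-log order
```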

Suggested fix:
A node that is in MEMBER_STATE=RECOVERING should not be evaluated in the flow control algorithm.

As I said above, this is the default behaviour, so you are requesting an option to disable it, that is, to specify that a new member's recovery does not trigger flow control. We will look into it.

Best regards,
Nuno Carvalho
[21 Sep 2017 17:38] Nuno Carvalho
Posted by developer:
 
Hi Rene,

You can limit the impact of a new member with the option group_replication_flow_control_min_recovery_quota from WL#9838: "Group Replication: Flow-control fine tuning".
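
Going by the worklog name, this option appears to set a floor on the flow control quota while a member is recovering, so group throughput does not collapse as in the sysbench run above. A hedged sketch (the value is purely illustrative; consult the linked documentation for the actual range and semantics):

```sql
-- Guarantee a minimum write quota to the group even while a member
-- is in recovery (value shown is an example, not a recommendation):
SET GLOBAL group_replication_flow_control_min_recovery_quota = 1000;
```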

Best regards,
Nuno Carvalho
[4 Oct 2017 14:14] David Moss
Posted by developer:
 
Thanks for your feedback. The group_replication_flow_control_min_recovery_quota option was mentioned in the change log entry for WL#9838. You can find the documentation here:
https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replic...

Closing this bug without further change log entry.