Bug #99830 Improve the performance robustness for MGR
Submitted: 10 Jun 2020 4:33
Modified: 1 Aug 2020 12:53
Reporter: Bin Wang (OCA)
Status: No Feedback
Category: MySQL Server: Group Replication
Severity: S5 (Performance)
Version: 8.0.18
OS: Linux
Assigned to: MySQL Verification Team
CPU Architecture: x86
Tags: jitter, mgr, performance degration

[10 Jun 2020 4:33] Bin Wang
Description:
When running sysbench to test MGR stability, throughput often drops drastically.

How to repeat:
Refer to Bug #99133 or Bug #84774

Suggested fix:
We analyzed the source code of MGR and found that the problem is caused by the following code:

plugin/group_replication/src/pipeline_stats.cc:

934 int32 Flow_control_module::do_wait() {
935   DBUG_TRACE;
936   int64 quota_size = m_quota_size.load();
937   int64 quota_used = ++m_quota_used;
938 
939   if (quota_used > quota_size && quota_size != 0) {
940     struct timespec delay;
941     set_timespec(&delay, 1);
942 
943     mysql_mutex_lock(&m_flow_control_lock);
944     mysql_cond_timedwait(&m_flow_control_cond, &m_flow_control_lock, &delay);
945     mysql_mutex_unlock(&m_flow_control_lock);
946   }
947 
948   return 0;
949 }

plugin/group_replication/src/certifier.cc:
 160     struct timespec abstime;
 161     set_timespec(&abstime, 1);
 162     mysql_cond_timedwait(&broadcast_dispatcher_cond, &broadcast_dispatcher_lock,
 163                          &abstime);
 164     mysql_mutex_unlock(&broadcast_dispatcher_lock);

The timeout passed to mysql_cond_timedwait (line 944, line 162) is set in whole seconds, which is too coarse for MGR: under high load, even a one-second wait is too long.

If we change the wait from one second to 200 milliseconds (replacing "set_timespec(&delay, 1);" with "set_timespec_nsec(&delay, 1 * 200000000ULL);"), no more performance jitter could be found in my test.

We also found that throughput was 60% higher than before.

If MySQL wants to solve the performance robustness problems for MGR, a more fine-grained wait timeout should be considered.
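
For illustration only, the wait pattern above can be sketched outside the server with the timeout made a parameter. This is a minimal standalone sketch, assuming standard C++ primitives (std::mutex, std::condition_variable) in place of the MySQL wrappers. FlowControlSketch, release_quota and max_wait are hypothetical names that do not exist in the MySQL source, and the sketch is not the attached patch.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// The suggested fix passes 1 * 200000000ULL nanoseconds, which is 200 ms.
static_assert(std::chrono::nanoseconds(200000000ULL) ==
                  std::chrono::milliseconds(200),
              "200000000 ns should equal 200 ms");

// Hypothetical stand-in for Flow_control_module (not the MySQL class).
class FlowControlSketch {
 public:
  // max_wait is an assumed parameter: the upper bound on one throttled wait.
  FlowControlSketch(std::int64_t quota_size, std::chrono::nanoseconds max_wait)
      : m_quota_size(quota_size), m_quota_used(0), m_max_wait(max_wait) {
    assert(max_wait.count() > 0);
  }

  // Mirrors the shape of do_wait() quoted above: count the caller against
  // the quota and, if the quota is exceeded, block until notified or until
  // max_wait elapses. Returns true if the caller was throttled.
  // Like the quoted code, the wait has no predicate, so it may also return
  // early on a spurious wakeup.
  bool do_wait() {
    const std::int64_t quota_size = m_quota_size.load();
    const std::int64_t quota_used = ++m_quota_used;
    if (quota_used > quota_size && quota_size != 0) {
      std::unique_lock<std::mutex> lock(m_lock);
      m_cond.wait_for(lock, m_max_wait);
      return true;
    }
    return false;
  }

  // Hypothetical helper: reset the used quota and wake all waiters,
  // modelling the start of a new flow-control period.
  void release_quota() {
    m_quota_used.store(0);
    std::lock_guard<std::mutex> guard(m_lock);
    m_cond.notify_all();
  }

 private:
  std::atomic<std::int64_t> m_quota_size;
  std::atomic<std::int64_t> m_quota_used;
  const std::chrono::nanoseconds m_max_wait;
  std::mutex m_lock;
  std::condition_variable m_cond;
};
```

Constructing the sketch with std::chrono::seconds(1) corresponds to the current one-second bound, and std::chrono::milliseconds(200) corresponds to the suggested change. In both cases a throttled caller can resume earlier if another thread calls release_quota().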
[11 Jun 2020 11:44] MySQL Verification Team
Hi,

Thanks for the analysis. I'm marking this as duplicate of 99133. The additional data you provided will surely be of use to our dev team.

all best
Bogdan
[11 Jun 2020 11:45] MySQL Verification Team
Duplicate of Bug #99133
[12 Jun 2020 2:33] Bin Wang
Solve performance jitter for MGR

Attachment: mgr.patch (application/octet-stream, text), 1.59 KiB.

[16 Jun 2020 14:36] Nuno Carvalho
Hi Bin,

Thank you for your input.

What is the value of group_replication_flow_control_applier_threshold and group_replication_flow_control_certifier_threshold on your servers?

Most likely you need to increase them to suit your workload.
https://dev.mysql.com/doc/refman/8.0/en/group-replication-flow-control.html
https://mysqlhighavailability.com/zooming-in-on-group-replication-performance/

Can you please also describe your workload?
The two bugs you refer to are very distinct scenarios.

Best regards,
Nuno Carvalho
[2 Aug 2020 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".