Bug #108118 group replication reconfiguration should not require destroying the group
Submitted: 11 Aug 2022 12:22 Modified: 11 Aug 2022 22:22
Reporter: Simon Mudd (OCA)
Status: Verified
Category: MySQL Server: Group Replication Severity: S4 (Feature request)
Version: 8.0.30 OS: Any
Assigned to: CPU Architecture: Any
Tags: group_replication_communication_stack, group_replication_paxos_single_leader, group_replication_view_change_uuid, windmill

[11 Aug 2022 12:22] Simon Mudd
Description:
We may want to modify currently running GR clusters to use the following newer settings:

- group_replication_paxos_single_leader
- group_replication_view_change_uuid
- group_replication_communication_stack

For example, the documentation at https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replic... (available from 8.0.27) says:

"Operating with a single consensus leader improves performance and resilience in single-primary mode, particularly when some of the group’s secondary members are currently unreachable."

yet it also says:

"This system variable is a group-wide configuration setting. It must have the same value on all group members, cannot be changed while Group Replication is running, and requires a full reboot of the group (a bootstrap by a server with group_replication_bootstrap_group=ON) in order for the value change to take effect. For instructions to safely bootstrap a group where transactions have been executed and certified, see Section 18.5.2, “Restarting a Group”."

So we have to destroy the group in order to change the configuration. In practice that may mean going down from 3 running servers (minimum) in the group to a single server, changing the configuration and then adding back the other group members.
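To make the cost of this concrete, the full-reboot procedure can be sketched as an ordered list of SQL statements. This is a minimal Python sketch under stated assumptions, not a tested runbook: the member names and the helper reboot_plan() are hypothetical, the first listed member is assumed to hold the most complete transaction set and therefore does the bootstrap, and the statements are ordinary MySQL 8.0 Group Replication statements. The third field of each step counts the members still in the group, which exposes the window described above.

```python
"""Hypothetical sketch of the full group reboot needed to change a
group-wide GR setting. Member names and reboot_plan() are illustrative
assumptions; the SQL is standard MySQL 8.0 Group Replication syntax."""

from typing import List, Tuple

Step = Tuple[str, str, int]  # (member, SQL statement, members in group afterwards)


def reboot_plan(members: List[str], variable: str, value: str) -> List[Step]:
    """Return the ordered steps to change `variable` on every member.

    Assumption: members[0] holds the most complete transaction set, so it
    leaves the group last and bootstraps the group again.
    """
    in_group = len(members)
    steps: List[Step] = []

    # 1. Stop GR everywhere, leaving the bootstrap candidate until last.
    for m in members[1:] + members[:1]:
        in_group -= 1
        steps.append((m, "STOP GROUP_REPLICATION;", in_group))

    # 2. The setting cannot change while GR runs, so set it on every
    #    stopped member (SET PERSIST keeps it across server restarts).
    for m in members:
        steps.append((m, f"SET PERSIST {variable} = {value};", in_group))

    # 3. Bootstrap a new group from a single member.
    boot = members[0]
    steps.append((boot, "SET GLOBAL group_replication_bootstrap_group = ON;", in_group))
    in_group += 1
    steps.append((boot, "START GROUP_REPLICATION;", in_group))
    steps.append((boot, "SET GLOBAL group_replication_bootstrap_group = OFF;", in_group))

    # 4. Rejoin the remaining members one at a time.
    for m in members[1:]:
        in_group += 1
        steps.append((m, "START GROUP_REPLICATION;", in_group))

    return steps


if __name__ == "__main__":
    plan = reboot_plan(["db1", "db2", "db3"], "group_replication_paxos_single_leader", "ON")
    for member, sql, count in plan:
        print(f"{member}: {sql:<55} members in group: {count}")
```

For a three-member group the count falls to 0 after the last STOP, stays at 1 until the second member rejoins, and only returns to 3 after the final START. Throughout that stretch the group cannot tolerate a single failure.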

The whole point of having a GR cluster is to survive a server failure without losing data, and to ensure there is no chance of the split-brain scenario that is possible in traditional replication environments.

So to make a configuration change that improves stability, we effectively have to downgrade the cluster to a single "master", and for that period we lose everything we wanted to gain from using GR. Problems can therefore occur during this time window.

Simon Mudd
14:06 Hi. Sorry for not coming back to you earlier.
14:07 I'll open a public FR for GR to allow changes to these settings to be agreed (however that is done) so the cluster does not have to be destroyed.

How to repeat:
See description above.

Suggested fix:
Please make it possible for the members of a running group to agree on configuration changes, for settings such as those mentioned above, so that it IS possible to apply such a change while the GR cluster keeps running.

When designing any new GR feature that requires a cluster-wide configuration change, please ensure that the change can also be made dynamically, with all cluster members agreeing on it.

This reminds me of static global variables in MySQL, which require a server restart before a change takes effect. A number of my requests relating to such variables are still open.

On production systems it MUST BE possible to reconfigure a running GR cluster in a straightforward way without stopping it, ideally with as short a pause as possible while the members agree on the reconfiguration and then apply it.

If I find other variables which may be similar I will add them to this FR.
[11 Aug 2022 22:22] MySQL Verification Team
Hi Simon,

Thank you for the report.

kind regards