Bug #112888 View_change_log_event's in secondary cluster break cross-DC failovers
Submitted: 30 Oct 2023 21:35 Modified: 31 Oct 2023 1:42
Reporter: Marcos Albe (OCA)
Status: Closed
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version:8.0.27 OS:Any
Assigned to: CPU Architecture:Any
Tags: ClusterSet, failover, InnoDB Cluster

[30 Oct 2023 21:35] Marcos Albe
Description:
In 8.0.26 group_replication_view_change_uuid was introduced, which corrected the issue that was described in bug #103641. 

However, a problem remains: these events are also generated on the standby/secondary cluster in a ClusterSet, which creates errant transactions. If the binary logs containing these events are purged, it is no longer possible to perform a failover between clusters.

We see this is documented:
> group_replication_view_change_uuid specifies an alternative UUID to use as the UUID part of the identifier in the GTIDs for view change events generated by the group. The alternative UUID makes these internally generated transactions easy to distinguish from transactions received by the group from clients. This can be useful if your setup allows for failover between groups, and you need to identify and discard transactions that were specific to the backup group.

But it looks rather silly that administrators have to manually inject empty transactions to successfully switch roles, when this could be perfectly automated by the server by just looking at the errant transactions and discarding the ones with UUID matching group_replication_view_change_uuid.
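To illustrate the filtering step described above, here is a minimal Python sketch. It assumes the errant GTID set has already been obtained as a string (for example from GTID_SUBTRACT() applied to the standby's and the primary's gtid_executed) and that GTIDs are untagged, as in 8.0. The function name and the sample UUIDs are hypothetical, and this is not how the server or MySQL Shell implements anything.

```python
def split_errant_by_uuid(errant_gtid_set: str, view_change_uuid: str):
    """Partition a GTID set string into (view-change entries, other entries).

    Each comma-separated entry has the form 'uuid:interval[:interval...]'.
    Assumes untagged GTIDs (8.0 format).
    """
    view_change, other = [], []
    # gtid_executed is often displayed with a newline after each comma.
    for entry in errant_gtid_set.replace("\n", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        uuid = entry.split(":", 1)[0]
        if uuid.lower() == view_change_uuid.lower():
            view_change.append(entry)
        else:
            other.append(entry)
    return ",".join(view_change), ",".join(other)


# Hypothetical values, for illustration only.
VIEW_CHANGE_UUID = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
errant = (
    "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-3,\n"
    "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb:7"
)
safe_to_discard, needs_review = split_errant_by_uuid(errant, VIEW_CHANGE_UUID)
```

Entries in the second result did not come from view changes and would still need manual review before any failover.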

How to repeat:
- Set up a ClusterSet and make sure group_replication_view_change_uuid is set to a valid UUID

- On the standby cluster, have nodes leave and re-join the group so that View_change_log_events are generated.

- Purge all binary logs on the standby cluster

- Attempt failover via dba.getClusterSet().setPrimaryCluster('gr_dc2');

Suggested fix:
Have the switchover process automatically inject empty transactions on the demoted cluster, to match any errant transaction whose UUID matches group_replication_view_change_uuid on the promoted cluster.
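As a rough sketch of what such an injection amounts to, the Python helper below expands a GTID set into the usual empty-transaction statements (SET GTID_NEXT, BEGIN, COMMIT). The helper names and the sample UUID are hypothetical, the input is assumed to be an untagged 8.0-style GTID set already restricted to the view change UUID, and the script only generates SQL text; it does not connect to a server.

```python
def expand_gtids(entry: str):
    """Yield each individual GTID in one 'uuid:interval[:interval...]' entry."""
    uuid, *intervals = entry.split(":")
    for interval in intervals:
        first, _, last = interval.partition("-")
        start = int(first)
        end = int(last) if last else start  # '7' means the single GTID 7
        for seqno in range(start, end + 1):  # interval bounds are inclusive
            yield f"{uuid}:{seqno}"


def empty_transaction_sql(gtid_set: str):
    """Return the SQL statements that commit one empty transaction per GTID."""
    statements = []
    for entry in gtid_set.replace("\n", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        for gtid in expand_gtids(entry):
            statements += [f"SET GTID_NEXT='{gtid}';", "BEGIN;", "COMMIT;"]
    if statements:
        # Restore normal GTID assignment for the session afterwards.
        statements.append("SET GTID_NEXT='AUTOMATIC';")
    return statements


# Hypothetical view change UUID, for illustration only.
for line in empty_transaction_sql("aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1-2:5"):
    print(line)
```

The generated statements would be run on the cluster that is missing those GTIDs, so that its gtid_executed covers the view change events from the other cluster.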
[30 Oct 2023 21:55] Marcos Albe
Corrected the version; will double-check in the latest release.
[31 Oct 2023 1:42] Marcos Albe
Closing; Sorry for the hassle!