Bug #106937 Emergency Failover after Network Split Invalidates Primary in ClusterSet
Submitted: 6 Apr 23:43 Modified: 13 May 21:17
Reporter: Hamza Ahmed
Status: Closed
Category: Shell AdminAPI InnoDB Cluster / ReplicaSet Severity: S2 (Serious)
Version: 8.0.28 OS: CentOS
Assigned to: CPU Architecture: x86
Tags: InnoDB Cluster, InnoDB ClusterSet, mysqlrouter

[6 Apr 23:43] Hamza Ahmed
Description:
After an emergency failover on a ClusterSet during a network split, attempting to rejoin the invalidated cluster fails because of additional (errant) transactions on the `mysql_innodb_cluster_metadata`.`routers` table.

How to repeat:
Testing Emergency Failovers in a lab environment:

1. I created a test ClusterSet environment with 3 nodes [A, B, C] in the primary cluster [cls] and one node [D] in the replica cluster [cls_dr]. 
2. Created a network block by dropping all traffic from A, B, and C on D via iptables. The clusters couldn't communicate with each other.
3. Performed an emergency failover on the D node with myclusterset.forcePrimaryCluster("cls_dr"). cls_dr became the primary cluster with D as the primary, and cls was invalidated.
4. Transactions continued on cls_dr.
5. Undid the network block from step 2 to allow communication again.
6. Attempts to rejoin the cluster failed as below:
 MySQL  D:6446 ssl  JS > myclusterset.rejoinCluster("cls")
Rejoining cluster 'cls' to the clusterset
NOTE: Cluster 'cls' is invalidated
A:3306 has the following errant transactions not present at D:3306: 2c4e5499-b52a-11ec-abd0-739388f121d0:517-553
ERROR: Cluster 'cls' cannot be rejoined because it contains transactions that did not originate from the primary of the clusterset.
ClusterSet.rejoinCluster: Errant transactions detected at A:3306 (MYSQLSH 51152)
7. Decoded the binary logs to figure out where the extra transactions came from. They were all on the `mysql_innodb_cluster_metadata`.`routers` table.
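
To gauge the size of the errant set reported in step 6, the GTID interval can be expanded by hand. The following is a minimal sketch in plain JavaScript, assuming the usual `uuid:first-last` GTID set notation without tags. `countGtids` is a hypothetical helper written for this illustration; it is not part of MySQL Shell or AdminAPI.

```javascript
// Hypothetical helper: counts the transactions in a GTID set string such as
// "uuid:1-5:7,uuid2:3". Assumes well-formed input without GTID tags.
function countGtids(gtidSet) {
  let total = 0;
  for (const part of gtidSet.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    // The first field is the source UUID; the remaining fields are intervals.
    const [, ...intervals] = trimmed.split(":");
    for (const interval of intervals) {
      // A single number "n" is the interval n-n.
      const [first, last = first] = interval.split("-").map(Number);
      total += last - first + 1;
    }
  }
  return total;
}

// The errant set from the rejoinCluster() error message above.
const errant = "2c4e5499-b52a-11ec-abd0-739388f121d0:517-553";
console.log(countGtids(errant)); // 37
```

On a live instance, the built-in SQL function GTID_SUBTRACT() can compute the difference between the gtid_executed values of two servers directly; the helper above only counts what the error message already reports.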

Suggested fix:
Transactions made by the routers (or by whatever made those changes) shouldn't have been written to the binary logs, or they should be skippable when rejoining the ClusterSet.
[15 Apr 10:35] MySQL Verification Team
Hi,

Thanks for the report
[15 Apr 15:47] MySQL Verification Team
Hi,

Thanks for the report
[13 May 21:17] Philip Olson
Posted by developer:
 
Fixed as of the upcoming MySQL Router 8.0.30 release, and here's the proposed changelog entry from the documentation team:

Router no longer updates the v2_routers.last_check_in metadata field with
an active timestamp at runtime; previously it updated the field on every
10th metadata refresh. When a split-brain occurred, those updates to the
metadata schema became errant transactions, and it became impossible to
rejoin the old primary to the ClusterSet. Router instances in the isolated
partition could not detect that a failover had happened, so they continued
updating the field.

Now v2_routers.last_check_in is updated only when Router starts, so its
value represents the last time Router was launched.
This coincides with lastCheckIn from MySQL Shell AdminAPI's listRouters()
method.
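
The difference between the two behaviors described in this changelog entry can be modeled with a small counter. This is an illustrative sketch only, not Router source code: `makeRouter`, `onMetadataRefresh`, and `simulate` are hypothetical names, and the assumption that the old behavior also wrote once at startup is ours, not stated in the entry.

```javascript
// Illustrative model only; not MySQL Router code.
// "writes" counts updates to last_check_in, each of which would be a
// binlogged transaction on the cluster the Router is connected to.
function makeRouter(updateEveryNthRefresh) {
  const state = { refreshes: 0, writes: 0 };
  state.writes += 1; // check-in at startup (assumed for both behaviors)
  return {
    onMetadataRefresh() {
      state.refreshes += 1;
      // 0 disables runtime updates (the behavior proposed in the entry).
      if (updateEveryNthRefresh > 0 &&
          state.refreshes % updateEveryNthRefresh === 0) {
        state.writes += 1;
      }
    },
    writes: () => state.writes,
  };
}

// Runs a given number of metadata refreshes and returns the write count.
function simulate(router, refreshes) {
  for (let i = 0; i < refreshes; i++) router.onMetadataRefresh();
  return router.writes();
}

console.log(simulate(makeRouter(10), 100)); // 11: startup + every 10th refresh
console.log(simulate(makeRouter(0), 100));  // 1: startup only
```

In the model, every runtime write made while the Router is still attached to the invalidated cluster is a candidate errant transaction, which is why removing the periodic update removes the problem described in this report.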

Thank you for the bug report.
[16 May 9:53] Andrzej Religa
Posted by developer:
 
The decision was made not to fix this bug by disabling the
runtime updates to the last_check_in field. Instead, this will be
configurable in the metadata, and there will be worklogs for
that in both Router and Shell.