| Bug #108065 | clusterset.rejoinCluster() hangs | ||
|---|---|---|---|
| Submitted: | 3 Aug 2022 15:46 | Modified: | 2 Sep 2022 16:49 |
| Reporter: | Jay Janssen | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | Shell AdminAPI InnoDB Cluster / ReplicaSet | Severity: | S3 (Non-critical) |
| Version: | 8.0.30 | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
[3 Aug 2022 18:35]
Jay Janssen
I encountered a similar issue just trying to do a ClusterSet switchover:
MySQL 10.162.0.219:33060+ ssl JS > cs.status()
{
"clusters": {
"jay-test2-east": {
"clusterRole": "PRIMARY",
"globalStatus": "OK",
"primary": "10.162.0.219:3306"
},
"jay-test2-west": {
"clusterRole": "REPLICA",
"clusterSetReplicationStatus": "OK",
"globalStatus": "OK"
}
},
"domainName": "jay-test2",
"globalPrimaryInstance": "10.162.0.219:3306",
"primaryCluster": "jay-test2-east",
"status": "HEALTHY",
"statusText": "All Clusters available."
}
MySQL 10.162.0.219:33060+ ssl JS > cs.setPrimaryCluster("jay-test2-west")
Switching the primary cluster of the clusterset to 'jay-test2-west'
* Verifying clusterset status
** Checking cluster jay-test2-west
Cluster 'jay-test2-west' is available
** Checking cluster jay-test2-east
Cluster 'jay-test2-east' is available
* Reconciling internally generated GTIDs
[2 Sep 2022 16:49]
Edward Gilmore
Posted by developer:
Added the following note to the MySQL Shell 8.0.31 release notes:
ClusterSet commands which perform transaction set consistency
checking, such as rejoinCluster and
setPrimaryCluster, became unresponsive during
view change log event reconciliations if the write load was
high. This occurred because reconciliation included the entire
transaction backlog instead of just the view change log events.
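
To illustrate why the size of the reconciled set matters, here is a minimal sketch in plain JavaScript (Node.js). It is not MySQL Shell's implementation. It assumes that view change events can be told apart by a dedicated source UUID (as with the server's `group_replication_view_change_uuid` setting); both UUIDs and all helper names are made up for the example.

```javascript
// Illustrative sketch only; this is not MySQL Shell's actual implementation.
// Both UUIDs below are made-up placeholders.
const VIEW_CHANGE_UUID = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa";
const SERVER_UUID = "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb";

// Parse a GTID set such as "uuid1:1-5:7,uuid2:1-3" into a Map of
// source UUID -> array of interval strings (["1-5", "7"]).
function parseGtidSet(gtidSet) {
  const result = new Map();
  for (const part of gtidSet.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const [uuid, ...intervals] = trimmed.split(":");
    const key = uuid.toLowerCase();
    result.set(key, (result.get(key) || []).concat(intervals));
  }
  return result;
}

// Keep only the GTIDs generated under one source UUID.
function filterGtidSetByUuid(gtidSet, uuid) {
  const key = uuid.toLowerCase();
  const intervals = parseGtidSet(gtidSet).get(key);
  return intervals && intervals.length > 0 ? `${key}:${intervals.join(":")}` : "";
}

// Count the transactions in a GTID set ("1-5" is 5 transactions, "7" is 1).
function countGtids(gtidSet) {
  let total = 0;
  for (const intervals of parseGtidSet(gtidSet).values()) {
    for (const interval of intervals) {
      const [start, end = start] = interval.split("-").map(Number);
      total += end - start + 1;
    }
  }
  return total;
}

// A backlog dominated by application writes, plus a few view change events.
const backlog = `${SERVER_UUID}:1-500000,\n${VIEW_CHANGE_UUID}:1-3:7`;
const viewChangesOnly = filterGtidSetByUuid(backlog, VIEW_CHANGE_UUID);
console.log(viewChangesOnly);
console.log(`${countGtids(backlog)} vs ${countGtids(viewChangesOnly)}`);
```

Under a heavy write load the full backlog grows with every committed transaction, while the view change subset stays small, which is consistent with the hang only appearing when the write load was high.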

Description:
I didn't have this issue with 8.0.29. With a clusterset, I kill -9 all the mysqld nodes on the primary cluster. I then do a forcePrimaryCluster failover (it works, although sometimes it hits bug 108064). I then restart the killed nodes and execute dba.rebootClusterFromCompleteOutage(), which works. I then get a clusterset object from my new primary cluster and attempt to rejoin the recovered cluster; it hangs on "* Reconciling internally generated GTIDs".

I've waited 5-10 minutes so far before pressing Ctrl-C. Maybe I'm being impatient, but I don't see anything happening in the server logs.

How to repeat:
This seems repeatable. I did this a lot on 8.0.29 and didn't have the issue.

* 2x3 node clusters in a cluster set.
* Sysbench load on the primary cluster via router

1. kill -9 `pidof mysqld` on every node in the primary cluster (1)
2. forcePrimaryCluster failover to the other side (2)
3. Restart nodes in cluster 1, issue dba.rebootClusterFromCompleteOutage() successfully
4. Using clusterset handle from cluster 2, try to rejoin the cluster

MySQL 10.162.0.219:33060+ ssl JS > dba.rebootClusterFromCompleteOutage()
NOTE: Instance 10.170.1.106:3306 has more recent metadata than 10.162.0.219:3306 (generation 2 vs 1), which suggests jay-test2-east has been invalidated
NOTE: Cluster jay-test2-east appears to have been invalidated, reconnecting to 10.170.1.106:3306.
Restoring the cluster 'jay-test2-east' from complete outage...
The instance '10.162.0.229:3306' was part of the cluster configuration but the Cluster is invalidated. Please rejoin the instance after the Cluster is rejoined to the ClusterSet
The instance '10.162.0.248:3306' was part of the cluster configuration but the Cluster is invalidated. Please rejoin the instance after the Cluster is rejoined to the ClusterSet
Validating instance configuration at 10.162.0.219:3306...
This instance reports its own address as 10.162.0.219:3306
Instance configuration is suitable.
* Waiting for seed instance to become ONLINE...
10.162.0.219:3306 was restored.
NOTE: Instance 10.170.1.106:3306 has more recent metadata than 10.162.0.219:3306 (generation 2 vs 1), which suggests jay-test2-east has been invalidated
NOTE: Cluster jay-test2-east appears to have been invalidated, reconnecting to 10.170.1.106:3306.
The cluster was successfully rebooted.
<Cluster:jay-test2-east>
MySQL 10.162.0.219:33060+ ssl JS > cs.status()
{
"clusters": {
"jay-test2-east": {
"clusterErrors": [
"WARNING: Replication channel from the Primary Cluster is missing",
"WARNING: Cluster was invalidated and must be either removed from the ClusterSet or rejoined"
],
"clusterRole": "REPLICA",
"clusterSetReplication": {},
"clusterSetReplicationStatus": "MISSING",
"globalStatus": "INVALIDATED",
"status": "INVALIDATED",
"statusText": "Cluster was invalidated by the ClusterSet it belongs to."
},
"jay-test2-west": {
"clusterRole": "PRIMARY",
"globalStatus": "OK",
"primary": "10.170.1.106:3306"
}
},
"domainName": "jay-test2-global",
"globalPrimaryInstance": "10.170.1.106:3306",
"primaryCluster": "jay-test2-west",
"status": "AVAILABLE",
"statusText": "Primary Cluster available, there are issues with a Replica cluster."
}
MySQL 10.162.0.219:33060+ ssl JS > cs.rejoinCluster("jay-test2-east")
Rejoining cluster 'jay-test2-east' to the clusterset
NOTE: Cluster 'jay-test2-east' is invalidated
* Reconciling internally generated GTIDs
^^ Hangs here
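
The AdminAPI part of the reproduction (steps 2 to 4) can be sketched as two small JavaScript functions. This is a hedged sketch, not a tested reproduction script: the function names, the `cfg` fields, and the `admin` account in the usage comment are assumptions made for illustration. `dba` and `shell` are passed in as parameters; inside `mysqlsh --js` they would be the Shell's global objects. Killing and restarting mysqld (steps 1 and 3) happens outside the Shell and is only noted in comments.

```javascript
// Step 2: connect to a member of the surviving cluster and force it to
// become the primary cluster of the ClusterSet.
function forceFailover(dba, shell, cfg) {
  shell.connect(cfg.survivorUri);
  dba.getClusterSet().forcePrimaryCluster(cfg.survivorCluster);
}

// Steps 3 and 4: run only after the killed mysqld processes on the old
// primary cluster have been restarted.
function rebootAndRejoin(dba, shell, cfg) {
  // Step 3: bring the old primary cluster back from complete outage.
  shell.connect(cfg.recoveredUri);
  dba.rebootClusterFromCompleteOutage();

  // Step 4: take a ClusterSet handle from the new primary cluster and
  // rejoin the recovered cluster. On 8.0.30 under write load, this is the
  // call reported to hang at "* Reconciling internally generated GTIDs".
  shell.connect(cfg.survivorUri);
  const cs = dba.getClusterSet();
  cs.rejoinCluster(cfg.recoveredCluster);
  return cs.status();
}

// Example usage inside mysqlsh (addresses and cluster names taken from the
// report; the "admin" account is an assumption):
//
//   const cfg = {
//     survivorUri: "admin@10.170.1.106:3306",
//     recoveredUri: "admin@10.162.0.219:3306",
//     survivorCluster: "jay-test2-west",
//     recoveredCluster: "jay-test2-east",
//   };
//   forceFailover(dba, shell, cfg);
//   // ...restart mysqld on every node of jay-test2-east...
//   rebootAndRejoin(dba, shell, cfg);
```

Passing `dba` and `shell` as arguments keeps the flow checkable with stub objects outside a live ClusterSet.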