| Bug #108064 | forcePrimaryCluster failing, no override | | |
|---|---|---|---|
| Submitted: | 3 Aug 2022 15:06 | Modified: | 22 Aug 2022 18:23 |
| Reporter: | Jay Janssen | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | Shell AdminAPI InnoDB Cluster / ReplicaSet | Severity: | S1 (Critical) |
| Version: | 8.0.30 | OS: | Any |
| Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
[3 Aug 2022 15:22]
Jay Janssen
In hindsight, this is S1. I'm attempting to reproduce it.
[3 Aug 2022 16:05]
Miguel Araujo
Hi Jay,

Can you please reproduce the issue with the logging set to debug level and share the relevant log entries?

Either start the shell with:

    $ ./bin/mysqlsh --log-level=8 --dba-log-sql=2

or do the following when the shell is already running:

    shell.options["dba.logSql"]=2
    shell.options["logLevel"]=8

Thanks.
[4 Aug 2022 17:44]
Alfredo Kojima
I was able to reproduce by ensuring the applier is still applying a backlog of transactions at the time of the failover. A workaround is waiting for the applier queue to empty before failover.
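For reference, here is a minimal sketch of how the backlog check could be done by hand before attempting the failover, run on the primary member of the cluster being promoted. It assumes the ClusterSet replication channel is named 'clusterset_replication'; adjust if yours differs.

    -- Compare what the clusterset channel has received with what has been applied.
    SELECT GTID_SUBTRACT(RECEIVED_TRANSACTION_SET, @@GLOBAL.GTID_EXECUTED) AS backlog
      FROM performance_schema.replication_connection_status
     WHERE CHANNEL_NAME = 'clusterset_replication';
    -- An empty 'backlog' means the applier has caught up; only then run
    -- cs.forcePrimaryCluster("jay-test2-west").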
[22 Aug 2022 18:23]
Edward Gilmore
Posted by developer:

Added the following note to the MySQL Shell 8.0.31 release notes:

Cluster failover could fail under high load because the cluster being promoted was compared with itself in the checks that confirm the promoted cluster is the most up-to-date. This comparison failed because the applier was still catching up, so the GTID_EXECUTED comparison resulted in two different values. As of this release, the check for the most up-to-date cluster does not include the promoted cluster.

Thanks to Jay Janssen for reporting this issue.
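As an illustration of the failure mode described in the note (a sketch only, not the Shell's actual implementation): while the applier on the promoted cluster is still catching up, two reads of GTID_EXECUTED taken moments apart on the same member differ, so a check that includes the candidate itself can report "missing" GTIDs even though no other reachable cluster is ahead.

    -- Hypothetical illustration, run on the candidate while its applier is busy.
    SET @first_read = @@GLOBAL.GTID_EXECUTED;
    -- ... the clusterset applier applies more of its backlog here ...
    SELECT GTID_SUBTRACT(@@GLOBAL.GTID_EXECUTED, @first_read) AS applied_in_between;
    -- A non-empty result shows the executed set moved between the two reads,
    -- which is what made the candidate look out of date against itself.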

Description:
I have a clusterset where I've simulated a failure of the primary cluster by killing the mysqld processes.

    MySQL 10.170.1.106:33060+ ssl JS > cs.status()
    {
        "clusters": {
            "jay-test2-east": {
                "clusterErrors": [
                    "ERROR: Could not connect to any ONLINE members but there are unreachable instances that could still be ONLINE."
                ],
                "clusterRole": "PRIMARY",
                "clusterSetReplicationStatus": "UNKNOWN",
                "globalStatus": "UNKNOWN",
                "primary": null,
                "status": "UNREACHABLE",
                "statusText": "Could not connect to any ONLINE members"
            },
            "jay-test2-west": {
                "clusterErrors": [
                    "WARNING: Replication from the Primary Cluster not in expected state"
                ],
                "clusterRole": "REPLICA",
                "clusterSetReplicationStatus": "ERROR",
                "globalStatus": "NOT_OK",
                "status": "OK",
                "statusText": "Cluster is ONLINE and can tolerate up to ONE failure."
            }
        },
        "domainName": "jay-test2-global",
        "globalPrimaryInstance": null,
        "primaryCluster": "jay-test2-east",
        "status": "UNAVAILABLE",
        "statusText": "Primary Cluster is not reachable from the Shell, assuming it to be unavailable."
    }

Now when I try to forcePrimaryCluster to the remaining cluster, I get this error:

    MySQL 10.170.1.106:33060+ ssl JS > cs.forcePrimaryCluster("jay-test2-west")
    Failing-over primary cluster of the clusterset to 'jay-test2-west'
    * Verifying primary cluster status
    None of the instances of the PRIMARY cluster 'jay-test2-east' could be reached.
    * Verifying clusterset status
    ** Checking cluster jay-test2-west
    Cluster 'jay-test2-west' is available
    ** Checking whether target cluster has the most recent GTID set
    NOTE: Cluster jay-test2-west has a more up-to-date GTID set
    The following GTIDs are missing from the target cluster:
    ERROR: The selected target cluster is not the most up-to-date cluster available for failover.
    ClusterSet.forcePrimaryCluster: Target cluster is behind other candidates (MYSQLSH 51311)

In essence: you can't force jay-test2-west to primary because jay-test2-west has more transactions. There is no force option with forcePrimaryCluster, so now I am stuck.

How to repeat:
I haven't confirmed it's repeatable.