| Bug #90793 | Recovering GR cluster from complete outage fails when SSL is disabled | | |
|---|---|---|---|
| Submitted: | 8 May 2018 13:05 | Modified: | 28 Jun 2018 14:39 |
| Reporter: | Andrew Pryde | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | Shell AdminAPI InnoDB Cluster / ReplicaSet | Severity: | S2 (Serious) |
| Version: | 8.0.11 | OS: | Any (Official Docker image) |
| Assigned to: | | CPU Architecture: | Any |
| Tags: | mysqlsh | | |
[8 May 2018 13:20]
Andrew Pryde
Update title
[8 May 2018 14:22]
Miguel Araujo
Posted by developer:
Issue
-----
The issue is caused by dba.rebootClusterFromCompleteOutage() not honouring the *PERSISTED* values of the Group Replication SSL settings and not passing them to the internal addInstance() call made for the seed instance.
Fix
---
The fix is to read the persisted values of group_replication_recovery_use_ssl and group_replication_ssl_mode on the target instance and use them in the internal addInstance() call.
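For reference, a minimal sketch of how the persisted settings can be checked on the seed instance, assuming a classic-protocol session in the shell's JavaScript mode; this is only an illustrative query against performance_schema.persisted_variables (MySQL 8.0), not part of the fix itself:
// Illustrative check only: list the persisted GR SSL settings that
// rebootClusterFromCompleteOutage() should honour.
var res = session.runSql(
    "SELECT variable_name, variable_value " +
    "FROM performance_schema.persisted_variables " +
    "WHERE variable_name IN ('group_replication_ssl_mode', " +
    "'group_replication_recovery_use_ssl')");
var row;
while ((row = res.fetchOne())) {
    print(row[0] + " = " + row[1] + "\n");
}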
Workaround
----------
1) Connect to the seed instance (in this example 'localhost:3330') and execute the following:
RESET PERSIST group_replication_bootstrap_group;
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
... The cluster should now be back online. Double-checking:
MySQL localhost:3330 SQL \js
Switching to JavaScript mode...
MySQL localhost:3330 JS var c = dba.getCluster()
MySQL localhost:3330 JS c.status()
{
"clusterName": "fooBar",
"defaultReplicaSet": {
"name": "default",
"primary": "localhost:3330",
"ssl": "REQUIRED",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures. 2 members are not active",
"topology": {
"localhost:3310": {
"address": "localhost:3310",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "(MISSING)"
},
"localhost:3320": {
"address": "localhost:3320",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "(MISSING)"
},
"localhost:3330": {
"address": "localhost:3330",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
}
}
},
"groupInformationSourceMember": "mysql://root@localhost:3330"
}
2) Rejoin the remaining instances
MySQL localhost:3330 JS c.rejoinInstance("root@localhost:3310")
Rejoining the instance to the InnoDB cluster. Depending on the original
problem that made the instance unavailable, the rejoin operation might not be
successful and further manual steps will be needed to fix the underlying
problem.
Please monitor the output of the rejoin operation and take necessary action if
the instance cannot rejoin.
Please provide the password for 'root@localhost:3310': ***
Rejoining instance to the cluster ...
The instance 'localhost:3310' was successfully rejoined on the cluster.
MySQL localhost:3330 JS c.rejoinInstance("root@localhost:3320")
Rejoining the instance to the InnoDB cluster. Depending on the original
problem that made the instance unavailable, the rejoin operation might not be
successful and further manual steps will be needed to fix the underlying
problem.
Please monitor the output of the rejoin operation and take necessary action if
the instance cannot rejoin.
Please provide the password for 'root@localhost:3320': ***
Rejoining instance to the cluster ...
The instance 'localhost:3320' was successfully rejoined on the cluster.
MySQL localhost:3330 JS c.status()
{
"clusterName": "fooBar",
"defaultReplicaSet": {
"name": "default",
"primary": "localhost:3330",
"ssl": "REQUIRED",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"localhost:3310": {
"address": "localhost:3310",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
},
"localhost:3320": {
"address": "localhost:3320",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
},
"localhost:3330": {
"address": "localhost:3330",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
}
}
},
"groupInformationSourceMember": "mysql://root@localhost:3330"
}
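One follow-up not shown in the output above, included here only as a hedged suggestion: group_replication_bootstrap_group is meant to stay ON only while the seed bootstraps the group, so once all members are back ONLINE it is safer to switch it off again on the seed, for example from the same JS session:
// Not part of the original workaround: turn the bootstrap flag back off so a
// later START GROUP_REPLICATION cannot accidentally bootstrap a new group.
session.runSql("SET GLOBAL group_replication_bootstrap_group=OFF");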
[28 Jun 2018 14:39]
David Moss
Posted by developer:
Thank you for your feedback. This has been fixed in upcoming versions, and the following was added to the 8.0.12 changelog:
In the event of a whole cluster stopping unexpectedly, upon reboot the memberSslMode was not preserved. In a cluster where SSL had been disabled, upon issuing dba.rebootClusterFromCompleteOutage() this could prevent instances from rejoining the cluster.

Description:
Calling dba.reboot_cluster_from_complete_outage("MySQLCluster") in mysql-shell when a cluster has been created with dba.create_cluster("MySQLCluster", {"sslMemberMode": "DISABLED"}) fails to recover the cluster, as mysql-server restarts with SSL enabled.

How to repeat:
1. dba.create_cluster("MySQLCluster", {"sslMemberMode": "DISABLED"})
2. Terminate all instances in the cluster
3. dba.reboot_cluster_from_complete_outage("MySQLCluster")
4. Note that the cluster does not come back online due to SSL being re-enabled.
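For completeness, a rough JavaScript-mode sketch of the reported flow, assuming three local sandbox instances on placeholder ports 3310/3320/3330; note the AdminAPI option is documented as memberSslMode:
// Hypothetical sandbox layout; only the flow matters, not the exact ports.
shell.connect("root@localhost:3310");
var cluster = dba.createCluster("MySQLCluster", {memberSslMode: "DISABLED"});
cluster.addInstance("root@localhost:3320");
cluster.addInstance("root@localhost:3330");
// Stop every member to simulate the complete outage, then reconnect:
shell.connect("root@localhost:3310");
dba.rebootClusterFromCompleteOutage("MySQLCluster");
// Before the fix, the seed came back with SSL re-enabled even though the
// cluster was created with SSL disabled, so the other members could not rejoin.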