Bug #117050 Mysqlshell version 8.4.0 is throwing LogicError while removing an instance
Submitted: 26 Dec 2024 15:01 Modified: 5 Jan 17:05
Reporter: Pravata Dash
Status: Open
Category: Shell AdminAPI InnoDB Cluster / ReplicaSet Severity: S2 (Serious)
Version: 8.4.0 OS: Any
Assigned to: MySQL Verification Team CPU Architecture:Any
Tags: mysqlshell

[26 Dec 2024 15:01] Pravata Dash
Description:
When attempting to remove a secondary instance left in an inconsistent state using the removeInstance() method (remove_instance() in Python mode) in MySQL Shell version 8.4.0 (also seen with 8.4.1 and 8.4.3) on an InnoDB Cluster using Group Replication, a LogicError is thrown. The same operation succeeds in MySQL Shell versions 8.0.3/4.

Error Details:
dba.getCluster("innodbcluster-xxx-1").removeInstance("innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306")
Cluster.removeInstance: Unexpected instance_type: READ_REPLICA (LogicError)
 

Also, a notable gap exists between MySQL Shell versions 8.0.3/4 and 8.4.0 regarding the rescan() operation. In versions 8.0.3/4, inconsistent instances are removed during the rescan(), while in version 8.4.0, the rescan() provides normal output without prompting users to remove anything or take any action.

I noticed that LogicError bugs were fixed in previous versions, so I'm unsure why they have reappeared in the new version 8.4.0.
LogicError Bug in mysqlsh 8-0-16: 29304183, 27677227
LogicError Bug in mysqlsh 8-0-19: 30657204

How to repeat:
Step 1:  
Set up a MySQL InnoDB cluster with 3 nodes in group replication (1 primary, 2 secondary) in single primary mode on a Kubernetes environment, with an asynchronous replica replicating from the primary.

Step 2:  
Delete the PVC for the asynchronous replica.

Step 3:  
Delete the POD for the asynchronous replica.

Step 4:  
The asynchronous replica pod terminates, and its replication from the primary is lost.

Step 5:  
Using MySQL Shell (version 8.4.0), perform a cluster status check, followed by a rescan and removal of the instance:  
```
dba.getCluster('innodbcluster-xxx-1').status()  
dba.getCluster('innodbcluster-xxx-1').rescan()  
dba.getCluster("innodbcluster-xxx-1").removeInstance("innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306")
```

Step 6:  
During the rescan, there is no prompt to remove the inconsistent asynchronous replica, and the removeInstance() method raises a LogicError, preventing the removal.

Error:
JS > dba.getCluster("innodbcluster-xxx-1").removeInstance("innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306")
Cluster.removeInstance: Unexpected instance_type: READ_REPLICA (LogicError)
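
For anyone scripting the reproduction, the failure above can be caught instead of aborting the session. This is a minimal sketch, not part of the original report: `tryRemoveInstance` is a hypothetical helper name, and it assumes the thrown error exposes a `message` string containing the text shown above.

```javascript
// Hypothetical helper (not an AdminAPI function): calls removeInstance() on
// the given cluster object and reports the outcome instead of throwing.
// Assumption: the error carries a `message` such as
// "Cluster.removeInstance: Unexpected instance_type: READ_REPLICA (LogicError)".
function tryRemoveInstance(cluster, address, options = {}) {
  try {
    cluster.removeInstance(address, options);
    return { removed: true, logicError: false, message: "" };
  } catch (err) {
    const message = String(err && err.message ? err.message : err);
    return {
      removed: false,
      // Flag the specific failure class described in this report.
      logicError: message.includes("LogicError"),
      message: message,
    };
  }
}

// In MySQL Shell (JS mode) this could be called as:
//   tryRemoveInstance(dba.getCluster("innodbcluster-xxx-1"), "<host>:3306")
```

The returned `message` can then be attached to a report or log as-is.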

Suggested fix:
Like MySQL Shell versions 8.0.3/4, version 8.4.0 should prompt the user during the rescan() call to remove the inconsistent async replica, and the removeInstance() call should succeed instead of throwing a LogicError.

Below is the output of the same calls succeeding in the older versions.

For rescan():
```
dba.getCluster('innodbcluster-xxx-1').rescan()
Rescanning the cluster...

Result of the rescanning operation for the 'innodbcluster-xxx-1' cluster:
{
    "name": "innodbcluster-xxx-1",
    "newTopologyMode": null,
    "newlyDiscoveredInstances": [],
    "unavailableInstances": [
        {
            "host": "innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306",
            "label": "innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306",
            "member_id": "xxxxxxxxxxxxxxxx"
        }
    ],
    "updatedInstances": []
}

The instance 'innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306' is no longer part of the cluster.
The instance is either offline or left the HA group. You can try to add it to the cluster again with the cluster.rejoinInstance('innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y/n]:
```
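
To compare shell versions, it may help to extract the unavailable hosts from the JSON block that rescan() prints. The sketch below assumes that block has been copied from the output and parsed into an object. It does not assume rescan() returns it, and `unavailableHosts` is a hypothetical helper name.

```javascript
// Hypothetical helper: lists the "host" value of every entry under
// "unavailableInstances" in a parsed rescan() result block.
function unavailableHosts(rescanResult) {
  const entries = (rescanResult && rescanResult.unavailableInstances) || [];
  return entries.map((entry) => entry.host);
}

// Sample object shaped like the older-version output shown above.
const sample = {
  name: "innodbcluster-xxx-1",
  newTopologyMode: null,
  newlyDiscoveredInstances: [],
  unavailableInstances: [
    {
      host: "innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306",
      label: "innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306",
      member_id: "xxxxxxxxxxxxxxxx",
    },
  ],
  updatedInstances: [],
};

console.log(unavailableHosts(sample));
```

An empty list from a result captured on one version, against a non-empty list from another, would make the rescan() difference described in this report easy to show side by side.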

For removeInstance():
```
dba.getCluster("innodbcluster-xxx-1").removeInstance("innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306")
ERROR: innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306 is reachable but has state OFFLINE
To safely remove it from the cluster, it must be brought back ONLINE. If not possible, use the 'force' option to remove it anyway.

Do you want to continue anyway (only the instance metadata will be removed)? [y/N]: y
```
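
The message above mentions a 'force' option. A minimal sketch of passing it explicitly follows. `removeMetadataOnly` is a hypothetical wrapper name, and this report does not establish whether `force` avoids the LogicError for a READ_REPLICA in 8.4.0.

```javascript
// Hypothetical wrapper: requests removal with the `force` option that the
// older shell's message refers to. `cluster` is expected to be the object
// returned by dba.getCluster(...) in MySQL Shell JS mode.
function removeMetadataOnly(cluster, address) {
  // Untested assumption: `force: true` lets removal proceed for an
  // unreachable or OFFLINE instance on the shell version in use.
  return cluster.removeInstance(address, { force: true });
}

// In MySQL Shell (JS mode) this could be called as:
//   removeMetadataOnly(dba.getCluster("innodbcluster-xxx-1"), "<host>:3306")
```
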
[26 Dec 2024 15:25] Pravata Dash
MySQL version used: 8.0.35