Bug #117050 MySQL Shell 8.4.0 throws LogicError while removing an instance
Submitted: 26 Dec 2024 15:01 Modified: 14 May 14:35
Reporter: Pravata Dash
Status: No Feedback
Category: Shell AdminAPI InnoDB Cluster / ReplicaSet
Severity: S2 (Serious)
Version: 8.4.0
OS: Any
CPU Architecture: Any
Tags: mysqlshell

[26 Dec 2024 15:01] Pravata Dash
Description:
A LogicError occurs when I try to remove an asynchronous replica instance that was left in an inconsistent state, using the removeInstance() method (remove_instance() in Python mode) in MySQL Shell 8.4.0 (and also 8.4.1 and 8.4.3) on an InnoDB Cluster with Group Replication. The same operation succeeds in MySQL Shell versions 8.0.3/4.

Error Details:
```
dba.getCluster("innodbcluster-xxx-1").removeInstance("innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306")
Cluster.removeInstance: Unexpected instance_type: READ_REPLICA (LogicError)
```

There is also a notable difference between MySQL Shell 8.0.3/4 and 8.4.0 in the rescan() operation. In 8.0.3/4, rescan() reports the inconsistent instance as unavailable and offers to remove it from the cluster metadata, whereas in 8.4.0, rescan() produces normal output and does not prompt the user to remove anything or take any action.

Similar LogicError bugs were fixed in earlier versions, so I'm unsure why this error has reappeared in 8.4.0:
LogicError bugs in mysqlsh 8.0.16: 29304183, 27677227
LogicError bug in mysqlsh 8.0.19: 30657204

How to repeat:
Step 1:  
Set up a MySQL InnoDB Cluster with 3 nodes in Group Replication (1 primary, 2 secondaries) in single-primary mode in a Kubernetes environment, with an asynchronous replica replicating from the primary.

Step 2:  
Delete the PVC for the asynchronous replica.

Step 3:  
Delete the POD for the asynchronous replica.

Step 4:  
The asynchronous replica POD terminates, and replication from the primary is lost.

Step 5:  
Using MySQL Shell (version 8.4.0), perform a cluster status check, followed by a rescan and removal of the instance:  
```
dba.getCluster('innodbcluster-xxx-1').status()  
dba.getCluster('innodbcluster-xxx-1').rescan()  
dba.getCluster("innodbcluster-xxx-1").removeInstance("innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306")
```

Step 6:  
During the rescan, there is no prompt to remove the inconsistent asynchronous replica, and the removeInstance() method raises a LogicError, preventing the removal.

Error:
```
JS > dba.getCluster("innodbcluster-xxx-1").removeInstance("innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306")
Cluster.removeInstance: Unexpected instance_type: READ_REPLICA (LogicError)
```

Suggested fix:
Like MySQL Shell versions 8.0.3/4, version 8.4.0 should prompt the user during the rescan() call to remove the inconsistent async replica, and the removeInstance() call should succeed instead of throwing a LogicError.

The following output shows the same calls succeeding in the older versions.

For rescan():
```
dba.getCluster('innodbcluster-xxx-1').rescan()
Rescanning the cluster...

Result of the rescanning operation for the 'innodbcluster-xxx-1' cluster:
{
    "name": "innodbcluster-xxx-1",
    "newTopologyMode": null,
    "newlyDiscoveredInstances": [],
    "unavailableInstances": [
        {
            "host": "innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306",
            "label": "innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306",
            "member_id": "xxxxxxxxxxxxxxxx"
        }
    ],
    "updatedInstances": []
}

The instance 'innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306' is no longer part of the cluster.
The instance is either offline or left the HA group. You can try to add it to the cluster again with the cluster.rejoinInstance('innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y/n]:
```
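As a hedged illustration of the `unavailableInstances` structure in the older rescan() report, the sketch below assumes that the printed JSON has been captured and parsed into a plain object (rescan() prints this report; it is not assumed to return it). The helper name `unavailableHosts` is hypothetical and is not part of the AdminAPI.

```javascript
// Hypothetical helper, not part of the MySQL Shell AdminAPI: given the
// rescan report shown above, parsed into a plain object, return the
// addresses listed under "unavailableInstances".
function unavailableHosts(rescanResult) {
  // Treat a missing report or a missing list as "nothing to clean up".
  const entries = (rescanResult && rescanResult.unavailableInstances) || [];
  return entries.map((entry) => entry.host);
}

// Example input copied from the 8.0.3/4 rescan() report above.
const report = {
  name: "innodbcluster-xxx-1",
  newTopologyMode: null,
  newlyDiscoveredInstances: [],
  unavailableInstances: [
    {
      host: "innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306",
      label: "innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306",
      member_id: "xxxxxxxxxxxxxxxx",
    },
  ],
  updatedInstances: [],
};

console.log(unavailableHosts(report));
```

Given the 8.4.0 behaviour described in this report, where rescan() does not flag the removed replica, a check like this would presumably find nothing to remove.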

For removeInstance():
```
dba.getCluster("innodbcluster-xxx-1").removeInstance("innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306")
ERROR: innodbcluster-xxx-1-asyncrr-mysql-rr-0.analytics-mysql-rr.default.svc.cluster.local:3306 is reachable but has state OFFLINE
To safely remove it from the cluster, it must be brought back ONLINE. If not possible, use the 'force' option to remove it anyway.

Do you want to continue anyway (only the instance metadata will be removed)? [y/N]: y
```
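In the 8.0.3/4 flow above, the user confirms the removal interactively after the Shell points to the 'force' option. The following is a minimal sketch of the non-interactive equivalent, assuming the AdminAPI `removeInstance()` option `force`. The wrapper name `forceRemoveInstance` is hypothetical, and this report does not establish whether it avoids the LogicError in 8.4.0.

```javascript
// Hypothetical wrapper: the non-interactive counterpart of answering "y"
// to the prompt above, using the 'force' option named in that message.
// In mysqlsh, pass the global `dba`; it is a parameter only so the
// function can be exercised without a live cluster.
function forceRemoveInstance(dbaObj, clusterName, address) {
  const cluster = dbaObj.getCluster(clusterName);
  // force: true asks the AdminAPI to remove the instance even though it
  // is not ONLINE; per the prompt above, only its metadata is removed.
  return cluster.removeInstance(address, { force: true });
}
```

In a mysqlsh JS session, this could be called as `forceRemoveInstance(dba, "innodbcluster-xxx-1", "<instance address>")`.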
[26 Dec 2024 15:25] Pravata Dash
MySQL Server version used: 8.0.35
[10 Jan 6:55] Pravata Dash
Additionally, we have tested the same flow with both MySQL Shell and MySQL Server 8.4.3 and get the same error.
[27 Jan 13:02] MySQL Verification Team
Hi, please use the latest 8.4 releases of both MySQL Server and MySQL Shell; MySQL 8.0.35 is old. I cannot reproduce this with the proper versions.

Thanks
[28 Jan 12:19] Pravata Dash
As mentioned above, we have already tested the same flow with both MySQL Shell and MySQL Server 8.4.3 and get the same error, so moving to the suggested 8.4 release won't help here.
[14 Apr 14:32] Miguel Araujo
Hi Pravata,

When you say "with an asynchronous replica replicating from the primary," does this mean it's a Read-Replica added to the Cluster using addReplicaInstance()? I assume that’s the case, but I got a bit confused when you mentioned that it worked in 8.0.x, since Read-Replica support was only introduced in 8.1.0.

Could you also share the output of cluster.status({extended:2})?

Additionally, can you provide a dump of the mysql_innodb_cluster_metadata.instances table?
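A minimal sketch of gathering both requested diagnostics in one step from a MySQL Shell JS session follows. It assumes the AdminAPI `Cluster.status()` `extended` option and the Shell session's `runSql()`/`fetchAll()` methods. The helper name `collectDiagnostics` is hypothetical.

```javascript
// Hypothetical helper for gathering the diagnostics requested above.
// In mysqlsh, pass the global `dba` and `session` objects; they are
// parameters only so the function can be exercised without a live cluster.
function collectDiagnostics(dbaObj, sessionObj, clusterName) {
  const cluster = dbaObj.getCluster(clusterName);
  return {
    // Extended status at level 2, as requested in the comment above.
    status: cluster.status({ extended: 2 }),
    // All rows of the InnoDB Cluster metadata table of instances.
    instances: sessionObj
      .runSql("SELECT * FROM mysql_innodb_cluster_metadata.instances")
      .fetchAll(),
  };
}
```

In mysqlsh, this could be run as `var diag = collectDiagnostics(dba, session, "innodbcluster-xxx-1")`, followed by `print(diag.status)` and `print(diag.instances)`.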
[15 May 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".