MySQL Bugs: #108426: Adding instance to a new replica cluster under load results in errors

Bug #108426	Adding instance to a new replica cluster under load results in errors
Submitted:	8 Sep 2022 12:36	Modified:	6 Dec 2022 11:06
Reporter:	Jay Janssen
Status:	Closed
Category:	Shell AdminAPI InnoDB Cluster / ReplicaSet	Severity:	S3 (Non-critical)
Version:	8.0.30	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
I have a ClusterSet whose primary cluster is under reasonably heavy write load, and I am trying to add a new replica cluster to the ClusterSet.

I have 3 nodes for the new cluster. I am running 'create_replica_cluster' on the first node, which works fine:

```
create_opts = {
    "recoveryMethod": "clone",
    "interactive": False,
    "timeout": 172800,  # wait for the new instance to catch up
}
seed_clusterset.create_replica_cluster(args.standalone, args.name, create_opts)
```

This proceeds normally:

```
Creating InnoDB Cluster 'jaytest-staging-002-usw2' on '10.170.254.87:3306'...

Adding Seed Instance...
Cluster successfully created. Use Cluster.add_instance() to add MySQL instances.
At least 3 instances are needed for the cluster to be able to withstand up to
one server failure.

* Configuring ClusterSet managed replication channel...
** Changing replication source of 10.170.254.87:3306 to 10.162.254.200:3306

* Waiting for instance '10.170.254.87:3306' to synchronize with PRIMARY Cluster...
** Transactions replicated ############################################################ 100%

* Updating topology

Replica Cluster 'jaytest-staging-002-usw2' successfully created on ClusterSet 'jaytest-staging-002'.
```

I then immediately try to add another instance to this new cluster using the normal cluster.add_instance, but I get this:

```
Adding instance to the cluster...

ERROR: Unable to enable clone on the instance '10.170.254.87:3306': MySQL Error 1290 (HY000): 10.170.254.87:3306: The MySQL server is running with the --super-read-only option so it cannot execute this statement

ERROR: Unable to create the Group Replication recovery account: 10.170.254.87:3306: The MySQL server is running with the --super-read-only option so it cannot execute this statement
```

When I try to add the node again later, it seems to work fine. I suspect the issue might have been that the new cluster was behind in replication from the primary cluster. As I stated before, I have write load on the primary cluster and I haven't seen this issue without the load.

As you can see, I added the 'timeout' option to the 'create_replica_cluster' call with the intent that it would not return until the new cluster was caught up in replication, but perhaps that doesn't work like I expect.
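Since retrying add_instance later succeeded, one possible stopgap on affected versions is to retry only when the error mentions --super-read-only. The following is a minimal sketch, not part of the original report: `add_instance_with_retry`, its parameters, and the zero-argument `add_instance` callable are hypothetical names, and the attempt count and delay are arbitrary assumptions.

```python
import time

# Substring of MySQL Error 1290 as quoted in the error output above.
SUPER_READ_ONLY_MARKER = "--super-read-only"


def add_instance_with_retry(add_instance, attempts=5, delay_seconds=30.0,
                            sleep=time.sleep):
    """Call add_instance() until it succeeds or the attempts run out.

    add_instance is a hypothetical zero-argument callable, for example
    lambda: cluster.add_instance(address, {"recoveryMethod": "clone"}).
    Only errors whose message mentions --super-read-only are retried;
    any other error is re-raised immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return add_instance()
        except Exception as exc:
            retryable = SUPER_READ_ONLY_MARKER in str(exc)
            if not retryable or attempt == attempts:
                raise
            # Assumption: a fixed pause is enough for the replica cluster
            # to catch up; the value is not taken from the report.
            sleep(delay_seconds)
```

The `sleep` parameter exists only so the pause can be replaced in tests; in a real script the default `time.sleep` would be used.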

How to repeat:
1) Set up cluster 1 under sysbench load.
2) Set up a ClusterSet.
3) Set up a new replica cluster with a seed instance.
4) Add an additional node as soon as the replica cluster is up.

Output log of the script setting up the new replica cluster:

Attachment: replica-cluster-setup.log (application/octet-stream, text), 6.39 KiB.

Hi,

I think this is not a bug. You had to wait for the first node you added to finish creating before you added a new one (and you could after first one finished), but I will check with the GR dev team whether there is something else we might report instead of that error.

thanks for your interest in MySQL

> I think this is not a bug. You had to wait for the first node you added to finish creating before you added a new one (and you could after first one finished)

The create_replica_cluster command completed successfully before I called add_instance, so I feel like I did wait.

I am also setting the 'timeout' option on the create_replica_cluster command, which your documentation describes as follows:

"timeout: maximum number of seconds to wait for the instance to sync up with the PRIMARY Cluster. Default is 0 and it means no timeout."

Given that I am setting that to a high value, isn't it reasonable to expect that, once the create_replica_cluster command returns, the new replica cluster is synced up with the primary?

Hi,

I discussed this with the GR dev team; things should not behave like this. There might be a bug in AdminAPI (or there was one, and it may already be fixed). I'll try to reproduce this and move forward with it.

Thanks

The log you uploaded has some truncated output for cluster.status() at the end. Would you happen to still have the full output in your terminal history?

I was able to reproduce by using a fresh cluster handle from dba.getCluster() called after the replica cluster is created and before adding the secondaries.
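The reproduction described above might look roughly like the sketch below in MySQL Shell's Python mode. This is an assumption-laden illustration, not the developer's actual test: `reproduce` and its parameters are hypothetical names, the options are copied from the reporter's snippet, and it assumes the shell's active session is connected to the new replica cluster's seed instance so that `dba.get_cluster()` returns that cluster.

```python
def reproduce(dba, clusterset, seed_address, cluster_name, secondary_address):
    """Create a replica cluster, fetch a fresh handle, then add a secondary.

    dba and clusterset are the MySQL Shell AdminAPI objects; they are passed
    in as parameters (a choice made for this sketch) so the sequence can be
    read and exercised outside the shell.
    """
    # Step 1: create the replica cluster while the primary is under load.
    clusterset.create_replica_cluster(
        seed_address,
        cluster_name,
        {"recoveryMethod": "clone", "interactive": False, "timeout": 172800},
    )

    # Step 2: take a fresh cluster handle instead of reusing the object
    # returned by create_replica_cluster().
    cluster = dba.get_cluster()

    # Step 3: add a secondary right away; on affected versions this is where
    # the super_read_only errors were reported.
    cluster.add_instance(secondary_address, {"recoveryMethod": "clone"})
    return cluster
```

Inside mysqlsh the global `dba` object and an existing ClusterSet handle would be passed in directly.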

Posted by developer:
 
Added the following note to the MySQL Shell 8.0.32 release notes:

Attempting to add an instance to a newly-created ReplicaCluster, if the Primary Cluster was under high load,
failed with several errors related to super_read_only.
This issue was caused by an out-of-date topology view, leading to the newly created ReplicaCluster being
considered a standalone cluster.
As of this release, createReplicaCluster() synchronizes the metadata update transactions, thereby ensuring it has
the correct topology view.