Bug #108426 Adding instance to a new replica cluster under load results in errors
Submitted: 8 Sep 2022 12:36 Modified: 6 Dec 2022 11:06
Reporter: Jay Janssen
Status: Closed
Category:Shell AdminAPI InnoDB Cluster / ReplicaSet Severity:S3 (Non-critical)
Version:8.0.30 OS:Any
Assigned to: CPU Architecture:Any

[8 Sep 2022 12:36] Jay Janssen
Description:
I have a clusterset where the primary cluster is under reasonably heavy write load and I am trying to add a new replica cluster to the cluster set.

I have 3 nodes for the new cluster.  I am running 'create_replica_cluster' on the first node, which works fine:

```
create_opts={
    "recoveryMethod": "clone",
    "interactive": False,
    "timeout": 172800, # wait for the new instance to catch up
}
seed_clusterset.create_replica_cluster(args.standalone, args.name, create_opts)
```

This proceeds normally:

```
Creating InnoDB Cluster 'jaytest-staging-002-usw2' on '10.170.254.87:3306'...

Adding Seed Instance...
Cluster successfully created. Use Cluster.add_instance() to add MySQL instances.
At least 3 instances are needed for the cluster to be able to withstand up to
one server failure.

* Configuring ClusterSet managed replication channel...
** Changing replication source of 10.170.254.87:3306 to 10.162.254.200:3306

* Waiting for instance '10.170.254.87:3306' to synchronize with PRIMARY Cluster...
** Transactions replicated  ############################################################  100%

* Updating topology

Replica Cluster 'jaytest-staging-002-usw2' successfully created on ClusterSet 'jaytest-staging-002'.
```

I then immediately try to add another instance to this new cluster using the normal cluster.add_instance, but I get this:

```
Adding instance to the cluster...

ERROR: Unable to enable clone on the instance '10.170.254.87:3306': MySQL Error 1290 (HY000): 10.170.254.87:3306: The MySQL server is running with the --super-read-only option so it cannot execute this statement

ERROR: Unable to create the Group Replication recovery account: 10.170.254.87:3306: The MySQL server is running with the --super-read-only option so it cannot execute this statement
```

When I try to add the node again later, it seems to work fine.  I suspect the issue might have been that the new cluster was behind in replication from the primary cluster.  As I stated before, I have write load on the primary cluster and I haven't seen this issue without the load.  

As you can see, I added the 'timeout' option to the 'create_replica_cluster' call with the intent that it would not return until the new cluster was caught up in replication, but perhaps that doesn't work as I expect.

How to repeat:
1) Set up cluster 1 under sysbench load
2) Set up a clusterset
3) Set up a new replica cluster with a seed instance
4) Add an additional node as soon as the replica cluster is up.
[8 Sep 2022 12:37] Jay Janssen
Output log of the script setting up the new replica cluster

Attachment: replica-cluster-setup.log (application/octet-stream, text), 6.39 KiB.

[9 Sep 2022 10:30] MySQL Verification Team
Hi,

I think this is not a bug. You had to wait for the first node you added to finish being created before you added a new one (and you could add it after the first one finished), but I will check with the GR dev team whether there is something else we might report instead of that error.

thanks for your interest in MySQL
[9 Sep 2022 11:15] Jay Janssen
> I think this is not a bug. You had to wait for the first node you added to finish creating before you added a new one (and you could after first one finished)

The create_replica_cluster command completed successfully before I called add_instance, so I feel like I did wait.

I am also setting the 'timeout' option on the create_replica_cluster command, which your documentation describes as follows:

"timeout: maximum number of seconds to wait for the instance to sync up with the PRIMARY Cluster. Default is 0 and it means no timeout."

Given that I am setting that to a high value, isn't it reasonable to expect that, after the create_replica_cluster command returns, the new replica cluster would be synced up with the primary?
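
One way to check this expectation independently of the Shell is to compare GTID sets directly. The following is only a minimal sketch, not something taken from this report: it assumes two already-open MySQL Shell classic sessions (the hypothetical parameters `primary_session` and `replica_session`), and the helper name `wait_until_caught_up` is made up for illustration. It relies on the server function `WAIT_FOR_EXECUTED_GTID_SET()`, which returns 0 once the given set has been applied and 1 on timeout.

```python
def wait_until_caught_up(primary_session, replica_session, timeout_s=60):
    """Return True once the replica has applied every transaction the
    primary had executed at the moment this function was called.

    Both arguments are assumed to be MySQL Shell classic sessions
    exposing run_sql(); the function name itself is hypothetical.
    """
    # Snapshot the primary's executed GTID set. Under write load the
    # primary keeps moving, so this is a point-in-time target only.
    gtid_set = primary_session.run_sql(
        "SELECT @@GLOBAL.gtid_executed"
    ).fetch_one()[0]

    # WAIT_FOR_EXECUTED_GTID_SET returns 0 when the set has been applied
    # and 1 if timeout_s seconds pass first.
    row = replica_session.run_sql(
        "SELECT WAIT_FOR_EXECUTED_GTID_SET(?, ?)", [gtid_set, timeout_s]
    ).fetch_one()
    return row[0] == 0
```

Calling this between create_replica_cluster and add_instance would at least show whether the replica cluster's seed had applied what the primary had executed at that point.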
[9 Sep 2022 11:28] MySQL Verification Team
Hi,

I discussed this with GR dev team, things should not behave like this. There might be (or was, might be already fixed) a bug in AdminAPI. I'll try to reproduce this and move forward with it.

Thanks
[9 Sep 2022 15:27] Alfredo Kojima
The log you uploaded has some truncated output for cluster.status() at the end. Would you happen to still have the full output in your terminal history?
[9 Sep 2022 18:34] Alfredo Kojima
I was able to reproduce by using a fresh cluster handle from dba.getCluster() called after the replica cluster is created and before adding the secondaries.
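
The sequence described in this comment can be sketched roughly as follows, in the Python mode of MySQL Shell. This is an illustration under assumptions, not the developer's actual test: the function name `reproduce` and its parameters are hypothetical, the primary cluster is assumed to be under write load, and the Shell's global session is assumed to be connected to the new replica cluster's seed so that `dba.get_cluster()` returns that cluster.

```python
def reproduce(dba, clusterset, seed_uri, replica_name, secondary_uri):
    """Create a replica cluster, then add a secondary through a fresh handle.

    `dba` is the MySQL Shell global object and `clusterset` a ClusterSet
    handle; all other names are placeholders chosen for this sketch.
    """
    # Step 1: create the replica cluster on the seed instance.
    clusterset.create_replica_cluster(
        seed_uri, replica_name, {"recoveryMethod": "clone"}
    )

    # Step 2: obtain a *fresh* cluster handle after the replica cluster
    # exists. This is the detail the comment above singles out.
    cluster = dba.get_cluster()

    # Step 3: add a secondary straight away. On affected versions this is
    # where the super_read_only errors were reported.
    cluster.add_instance(secondary_uri, {"recoveryMethod": "clone"})
    return cluster
```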
[6 Dec 2022 11:06] Edward Gilmore
Posted by developer:
 
Added the following note to the MySQL Shell 8.0.32 release notes:

Attempting to add an instance to a newly-created ReplicaCluster, if the Primary Cluster was under high load,
failed with several errors related to super_read_only.
This issue was caused by an out-of-date topology view, leading to the newly created ReplicaCluster being
considered a standalone cluster.
As of this release, createReplicaCluster() synchronizes the metadata update transactions, thereby ensuring it has
the correct topology view.
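
For Shell versions earlier than 8.0.32, the reporter observed that adding the node again later worked. A retry loop along the following lines is one possible way to automate that observation. It is a hedged sketch only: the helper name `add_instance_with_retry`, the attempt count, and the delay are all assumptions, and it assumes that the failure surfaces as a Python exception whose message contains the "super-read-only" text seen in the log above.

```python
import time


def add_instance_with_retry(cluster, instance, options=None,
                            attempts=5, delay_s=30, sleep=time.sleep):
    """Call cluster.add_instance(), retrying on super-read-only errors.

    Hypothetical helper: `cluster` is an AdminAPI Cluster handle, and the
    error-matching rule below is an assumption based on the logged message.
    """
    for attempt in range(1, attempts + 1):
        try:
            return cluster.add_instance(instance, options or {})
        except Exception as exc:
            # Re-raise anything unrelated, or the final failed attempt.
            if "super-read-only" not in str(exc) or attempt == attempts:
                raise
            # Give the replica cluster time to apply pending transactions.
            sleep(delay_s)
```

The `sleep` parameter exists only so the delay can be replaced in tests; in a real script the default `time.sleep` would be used.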