Bug #108475 | addInstance exits with "Not connected" after clone completes | ||
---|---|---|---|
Submitted: | 13 Sep 2022 15:12 | Modified: | 23 Feb 2023 19:03 |
Reporter: | Jay Janssen | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | Shell AdminAPI InnoDB Cluster / ReplicaSet | Severity: | S2 (Serious) |
Version: | 8.0.30 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[13 Sep 2022 15:12]
Jay Janssen
[13 Sep 2022 15:57]
Jay Janssen
Checking further, this *might* be a manifestation of "dba.restartWaitTimeout" not being set properly on the shell, so maybe a false alarm for this bug. I was able to reproduce the same basic behavior, though this time the log was slightly different: ``` Clone based state recovery is now in progress. NOTE: A server restart is expected to happen as part of the clone process. If the server does not support the RESTART command or does not come back after a while, you may need to manually start it back. * Waiting for clone to finish... NOTE: 10.162.254.54:3306 is being cloned from 10.162.255.161:3306 ** Stage DROP DATA: Completed ** Stage FILE COPY: Completed ** Stage PAGE COPY: Completed ** Stage REDO COPY: Completed NOTE: 10.162.254.54:3306 is shutting down... * Waiting for server restart... timeout WARNING: Clone process appears to have finished and tried to restart the MySQL server, but it has not yet started back up. Please make sure the MySQL server at '10.162.254.54:3306' is restarted and call <Cluster>.rescan() to complete the process. To increase the timeout, change shell.options["dba.restartWaitTimeout"]. ERROR: Timeout waiting for server to restart ``` Is it possible shell will give either error for the same core issue?
[14 Sep 2022 13:46]
Jay Janssen
update: I corrected my issue with dba.restartWaitTimeout not being properly set, but I'm still seeing the "ERROR: Not Connected" * Waiting for server restart... ready * 10.162.255.177:3306 has restarted, waiting for clone to finish... ** Stage FILE SYNC: Completed ** Stage RESTART: Completed * Clone process has finished: 297.46 GB transferred in 26 min 14 sec (188.98 MB/s) Incremental state recovery is now in progress. * Waiting for distributed recovery to finish... NOTE: '10.162.255.177:3306' is being recovered from '10.162.254.180:3306' * Distributed recovery has finished ERROR: Not connected
[14 Sep 2022 13:57]
Jay Janssen
this cluster is under a decent amount of load when this happens. I am not seeing this issue in a smaller cluster with proportionally less load and smaller clone time.
[14 Sep 2022 14:01]
Jay Janssen
upgrading severity for me as this is disrupting my instance automation. I can't recover from this on the instance itself easily, because I have to rescan the cluster from another instance to get the instance metadata updated.
[21 Sep 2022 14:22]
Jay Janssen
Just had an instance of what I think is the same bug, with a slightly different error message: * Clone process has finished: 61.45 GB transferred in 11 min 22 sec (90.11 MB/s) ESC[1mIncremental state recovery is now in progress.ESC[0m * Waiting for distributed recovery to finish... ESC[36mNOTE: ESC[0m'10.170.254.76:3306' is being recovered from '10.170.255.37:3306' * Distributed recovery has finished ESC[31mERROR: ESC[0mLost connection to MySQL server during query doing a cluster rescan and adding the instance recovered the problem.
[23 Sep 2022 14:45]
Jay Janssen
Another complaint here -- despite add-instance throwing an error message, the command line integration (i.e., `mysqlsh cluster add-instance ...`) returns 0 and not an error code (1 or higher). If any command hits any kind of error/exception, the command line integration (and the equivalent python and js functions) should return proper error codes and/or throw exceptions.
[29 Sep 2022 16:28]
MySQL Verification Team
Hi Jay, Thanks for the report. I'm not reproducing this issue as it looks like it requires some specific load to happen but I'll see what the shell team can do with it.
[23 Feb 2023 19:03]
Miguel Araujo
Posted by developer: Unable to reproduce with MySQL Shell 8.0.32.