Description:
When an incomplete InnoDBCluster exists, the on_startup process of mysql-operator fails, and the mysql-operator pod can't reach the Ready state (1/1).
An example of an incomplete InnoDBCluster is one whose related secret resources are missing.
The existence of a single incomplete InnoDBCluster blocks the startup of mysql-operator and so affects the normal operation of all other InnoDBClusters.
How to repeat:
1. Prepare namespaces test-1 and test-2, along with mysql-operator.
2. Deploy mysql-operator using Helm in the mysql-operator namespace.
$ helm install mysql-operator mysql-operator/mysql-operator \
--namespace mysql-operator \
--create-namespace \
--set envs.k8sClusterDomain='cluster.local'
3. Deploy InnoDB clusters to test-1 and test-2 using Helm. The command for test-1 is shown; repeat it with test-2 as the release name and namespace.
$ helm install test-1 mysql-operator/mysql-innodbcluster \
--namespace test-1 \
--create-namespace \
--set credentials.root.user='root' \
--set credentials.root.password='supersecret' \
--set credentials.root.host='%' \
--set serverInstances=3 \
--set routerInstances=1 \
--set tls.useSelfSigned='true'
4. Delete the test-1-privsecrets secret in the test-1 namespace (test-1-privsecrets is installed by Helm).
$ kubectl delete secret test-1-privsecrets -n test-1
5. Delete the mysql-operator pod.
$ kubectl delete pods --all -n mysql-operator
6. Increase the serverInstances count in test-2. The expected scale-up does not happen. Creating a new cluster also fails.
After the mysql-operator pod is deleted in step 5, the log of the restarted pod shows the following stack trace:
====
[2024-03-12 04:18:56,995] kopf.activities.star [INFO ] DEFAULT_IMAGE_REPOSITORY =container-registry.oracle.com/mysql
[2024-03-12 04:18:57,059] kopf.activities.star [ERROR ] Activity 'on_startup' failed with an exception. Will retry.
Traceback (most recent call last):
File "/usr/lib/mysqlsh/python-packages/kopf/_core/actions/execution.py", line 279, in execute_handler_once
result = await invoke_handler(
File "/usr/lib/mysqlsh/python-packages/kopf/_core/actions/execution.py", line 374, in invoke_handler
result = await invocation.invoke(
File "/usr/lib/mysqlsh/python-packages/kopf/_core/actions/invocation.py", line 139, in invoke
await asyncio.shield(future) # slightly expensive: creates tasks
File "/usr/lib64/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/lib/mysqlsh/python-packages/mysqloperator/controller/operator.py", line 44, in on_startup
operator_cluster.monitor_existing_clusters(clusters, logger)
File "/usr/lib/mysqlsh/python-packages/mysqloperator/controller/innodbcluster/operator_cluster.py", line 45, in monitor_existing_clusters
g_group_monitor.monitor_cluster(
File "/usr/lib/mysqlsh/python-packages/mysqloperator/controller/group_monitor.py", line 202, in monitor_cluster
account = RetryLoop(logger).call(cluster.get_admin_account)
File "/usr/lib/mysqlsh/python-packages/mysqloperator/controller/shellutils.py", line 93, in call
return f(*args)
File "/usr/lib/mysqlsh/python-packages/mysqloperator/controller/innodbcluster/cluster_api.py", line 1767, in get_admin_account
secrets = self.get_private_secrets()
File "/usr/lib/mysqlsh/python-packages/mysqloperator/controller/innodbcluster/cluster_api.py", line 1669, in get_private_secrets
api_core.read_namespaced_secret(f"{self.name}-privsecrets", self.namespace))
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/api/core_v1_api.py", line 24803, in read_namespaced_secret
return self.read_namespaced_secret_with_http_info(name, namespace, **kwargs) # noqa: E501
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/api/core_v1_api.py", line 24890, in read_namespaced_secret_with_http_info
return self.api_client.call_api(
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/api_client.py", line 348, in call_api
return self.__call_api(resource_path, method,
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/api_client.py", line 180, in __call_api
response_data = self.request(
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/api_client.py", line 373, in request
return self.rest_client.GET(url,
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/rest.py", line 240, in GET
return self.request("GET", url,
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/rest.py", line 234, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': '3e9c0518-2a23-4c9a-8d9b-22cbe23ce84a', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'cd1dbcc3-3aee-42f9-8851-bba899743652', 'X-Kubernetes-Pf-Prioritylevel-Uid': '9f6d091b-f5df-4ac9-bd1b-d35afa3b0397', 'Date': 'Tue, 12 Mar 2024 04:18:57 GMT', 'Content-Length': '210'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"secrets \"test-1-privsecrets\" not found","reason":"NotFound","details":{"name":"test-1-privsecrets","kind":"secrets"},"code":404}
====
Suggested fix:
Appropriately handle exceptions in the section checking InnoDBClusters during on_startup, ensuring that on_startup doesn't fail if one check fails.
https://github.com/mysql/mysql-operator/blob/4a80d27486c36a1ba1262d79cddbb99be21e52ba/mysq...
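
A minimal sketch of the suggested handling, not the operator's actual code: each cluster is checked inside its own try/except, so one failing cluster is logged and skipped instead of aborting the whole startup. The names monitor_existing_clusters_safely, monitor_one, and fake_monitor are hypothetical and exist only for this illustration; in the operator, the per-cluster call would presumably be the monitoring call seen in the traceback above, and the caught exception would be the 404 ApiException.

```python
import logging
from typing import Callable, Iterable, List, Tuple


def monitor_existing_clusters_safely(
    clusters: Iterable[str],
    monitor_one: Callable[[str], None],
    logger: logging.Logger,
) -> Tuple[List[str], List[str]]:
    """Try to monitor every cluster; collect failures instead of raising.

    Hypothetical helper: `monitor_one` stands in for whatever per-cluster
    setup the operator performs during on_startup.
    """
    monitored: List[str] = []
    skipped: List[str] = []
    for cluster in clusters:
        try:
            monitor_one(cluster)
            monitored.append(cluster)
        except Exception as exc:  # e.g. a 404 for a missing secret
            # Log and continue so the remaining clusters are still handled.
            logger.warning("Skipping cluster %s during startup: %s", cluster, exc)
            skipped.append(cluster)
    return monitored, skipped


def fake_monitor(cluster: str) -> None:
    # Hypothetical stand-in that simulates the missing test-1-privsecrets secret.
    if cluster == "test-1":
        raise LookupError('secrets "test-1-privsecrets" not found')


monitored, skipped = monitor_existing_clusters_safely(
    ["test-1", "test-2"], fake_monitor, logging.getLogger("startup-sketch")
)
```

With this structure, the failure in test-1 no longer prevents test-2 from being monitored. Whether skipped clusters should be retried later is a separate design decision that this sketch leaves open.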