Description:
When an incomplete InnoDBCluster exists, the on_startup process of mysql-operator fails, and the mysql-operator pod cannot reach the Ready state (1/1).
An example of an incomplete InnoDBCluster is one whose related secret resources are missing.
The existence of a single incomplete InnoDBCluster blocks the startup of mysql-operator and affects the normal operation of the other InnoDBClusters.
How to repeat:
1. Prepare namespaces test-1 and test-2, along with mysql-operator.
2. Deploy mysql-operator using Helm in the mysql-operator namespace.
$ helm install mysql-operator mysql-operator/mysql-operator \
--namespace mysql-operator \
--create-namespace \
--set envs.k8sClusterDomain='cluster.local'
3. Deploy InnoDB clusters to test-1 and test-2 using Helm (the command for test-1 is shown; test-2 is deployed the same way with the names changed).
$ helm install test-1 mysql-operator/mysql-innodbcluster \
--namespace test-1 \
--create-namespace \
--set credentials.root.user='root' \
--set credentials.root.password='supersecret' \
--set credentials.root.host='%' \
--set serverInstances=3 \
--set routerInstances=1 \
--set tls.useSelfSigned='true'
4. Delete the test-1-privsecrets secret in the test-1 namespace. (test-1-privsecrets is installed by helm)
$ kubectl delete secret test-1-privsecrets -n test-1
5. Delete the mysql-operator pod.
$ kubectl delete pods --all -n mysql-operator
6. Modify the serverInstances count in test-2, expecting the number of instances to increase; the change does not take effect. Creating a new cluster also fails.
Upon deleting the mysql-operator pod in step 5, check the log for a stack trace:
====
[2024-03-12 04:18:56,995] kopf.activities.star [INFO ] DEFAULT_IMAGE_REPOSITORY =container-registry.oracle.com/mysql
[2024-03-12 04:18:57,059] kopf.activities.star [ERROR ] Activity 'on_startup' failed with an exception. Will retry.
Traceback (most recent call last):
File "/usr/lib/mysqlsh/python-packages/kopf/_core/actions/execution.py", line 279, in execute_handler_once
result = await invoke_handler(
File "/usr/lib/mysqlsh/python-packages/kopf/_core/actions/execution.py", line 374, in invoke_handler
result = await invocation.invoke(
File "/usr/lib/mysqlsh/python-packages/kopf/_core/actions/invocation.py", line 139, in invoke
await asyncio.shield(future) # slightly expensive: creates tasks
File "/usr/lib64/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/lib/mysqlsh/python-packages/mysqloperator/controller/operator.py", line 44, in on_startup
operator_cluster.monitor_existing_clusters(clusters, logger)
File "/usr/lib/mysqlsh/python-packages/mysqloperator/controller/innodbcluster/operator_cluster.py", line 45, in monitor_existing_clusters
g_group_monitor.monitor_cluster(
File "/usr/lib/mysqlsh/python-packages/mysqloperator/controller/group_monitor.py", line 202, in monitor_cluster
account = RetryLoop(logger).call(cluster.get_admin_account)
File "/usr/lib/mysqlsh/python-packages/mysqloperator/controller/shellutils.py", line 93, in call
return f(*args)
File "/usr/lib/mysqlsh/python-packages/mysqloperator/controller/innodbcluster/cluster_api.py", line 1767, in get_admin_account
secrets = self.get_private_secrets()
File "/usr/lib/mysqlsh/python-packages/mysqloperator/controller/innodbcluster/cluster_api.py", line 1669, in get_private_secrets
api_core.read_namespaced_secret(f"{self.name}-privsecrets", self.namespace))
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/api/core_v1_api.py", line 24803, in read_namespaced_secret
return self.read_namespaced_secret_with_http_info(name, namespace, **kwargs) # noqa: E501
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/api/core_v1_api.py", line 24890, in read_namespaced_secret_with_http_info
return self.api_client.call_api(
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/api_client.py", line 348, in call_api
return self.__call_api(resource_path, method,
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/api_client.py", line 180, in __call_api
response_data = self.request(
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/api_client.py", line 373, in request
return self.rest_client.GET(url,
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/rest.py", line 240, in GET
return self.request("GET", url,
File "/usr/lib/mysqlsh/python-packages/kubernetes/client/rest.py", line 234, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': '3e9c0518-2a23-4c9a-8d9b-22cbe23ce84a', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'cd1dbcc3-3aee-42f9-8851-bba899743652', 'X-Kubernetes-Pf-Prioritylevel-Uid': '9f6d091b-f5df-4ac9-bd1b-d35afa3b0397', 'Date': 'Tue, 12 Mar 2024 04:18:57 GMT', 'Content-Length': '210'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"secrets \"test-1-privsecrets\" not found","reason":"NotFound","details":{"name":"test-1-privsecrets","kind":"secrets"},"code":404}
====
Suggested fix:
Appropriately handle exceptions in the section checking InnoDBClusters during on_startup, ensuring that on_startup doesn't fail if one check fails.
https://github.com/mysql/mysql-operator/blob/4a80d27486c36a1ba1262d79cddbb99be21e52ba/mysq...
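The suggested fix could look roughly like the sketch below: each cluster is checked inside its own try/except, so one broken cluster is logged and skipped instead of aborting the whole startup. This is not the operator's actual code. The function name monitor_existing_clusters_safely, the monitor_one callback, and the use of LookupError in place of the Kubernetes ApiException are all assumptions made so that the example runs standalone.

```python
import logging
from typing import Callable, Iterable, List, Tuple


def monitor_existing_clusters_safely(
    clusters: Iterable[str],
    monitor_one: Callable[[str], None],
    logger: logging.Logger,
) -> List[Tuple[str, Exception]]:
    """Try to start monitoring every cluster; collect failures instead of raising.

    Hypothetical helper: `monitor_one` stands in for whatever the operator
    calls per cluster (g_group_monitor.monitor_cluster in the traceback).
    """
    failures: List[Tuple[str, Exception]] = []
    for cluster in clusters:
        try:
            monitor_one(cluster)
        except Exception as exc:
            # A missing secret (404) on one cluster must not block the others.
            logger.warning("Skipping cluster %s during startup: %s", cluster, exc)
            failures.append((cluster, exc))
    return failures


if __name__ == "__main__":
    monitored: List[str] = []

    def fake_monitor(name: str) -> None:
        # Simulates the 404 raised when test-1-privsecrets is missing.
        if name == "test-1":
            raise LookupError('secrets "test-1-privsecrets" not found')
        monitored.append(name)

    failed = monitor_existing_clusters_safely(
        ["test-1", "test-2"], fake_monitor, logging.getLogger("startup-sketch")
    )
    print("monitored:", monitored)
    print("failed:", [name for name, _ in failed])
```

Returning the list of failures, rather than silently swallowing them, leaves room for the operator to retry the broken clusters later or report them in status, while on_startup itself completes.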