Bug #110486 MySQL operator not able to connect to k8s API server
Submitted: 24 Mar 2023 5:25 Modified: 4 Apr 2023 0:51
Reporter: Hemant Kumar Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Operator Severity:S1 (Critical)
Version:mysql/mysql-operator:8.0.32-2.0.8 OS:Linux
Assigned to: CPU Architecture:Any

[24 Mar 2023 5:25] Hemant Kumar
Description:
Hi,
While deploying MySQL operator on k8s we are facing the below issue.

[2023-03-24 04:43:56,384] kopf._core.engines.p [ERROR   ] Request attempt #1/9 failed; will retry: PATCH https://10.0.0.1:443/apis/zalando.org/v1/clusterkopfpeerings/mysql-operator -> ClientConnectorError(ConnectionKey(host='10.0.0.1', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=-5399757025357688748), ConnectionAbortedError('SSL handshake is taking longer than 60.0 seconds: aborting the connection'))

And the operator gets into the deadlock situation. 

How to repeat:
Deploy the MySQL operator using helm chart. Wait for sometime and the k8s API connection error will get printed out.
[29 Mar 2023 2:40] MySQL Verification Team
Hi,

I did not manage to reproduce this using my test setup. Can you share exact steps (and helm file) you used to get this problem

Thanks
[30 Mar 2023 13:38] Hemant Kumar
I have used this chart to deploy operator + MySQL cluster - https://github.com/mysql/mysql-operator/tree/trunk/helm
And below are the values:

credentials:
  root:
    user: root
    password: xxxxxxxxxx
    host: "%"
tls:
  useSelfSigned: true
serverConfig:
  mycnf: |
    [mysqld]
    max_allowed_packet=256M
datadirVolumeClaimTemplate:
  storageClassName: ssd-retain
  accessModes: ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
[30 Mar 2023 13:51] Hemant Kumar
Some more logs form operator
[2023-03-30 00:53:18,294] kopf._core.engines.p [ERROR   ] Request attempt #1/9 failed; will retry: PATCH https://10.0.0.1:443/apis/zalando.org/v1/clusterkopfpeerings/mysql-operator -> ClientConnectorError(ConnectionKey(host='10.0.0.1', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=-5250298061144997007), ConnectionAbortedError('SSL handshake is taking longer than 60.0 seconds: aborting the connection'))
[2023-03-30 01:02:18,138] kopf._core.engines.p [ERROR   ] Request attempt #1/9 failed; will retry: PATCH https://10.0.0.1:443/apis/zalando.org/v1/clusterkopfpeerings/mysql-operator -> ClientConnectorError(ConnectionKey(host='10.0.0.1', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=-5250298061144997007), ConnectionAbortedError('SSL handshake is taking longer than 60.0 seconds: aborting the connection'))
[2023-03-30 01:03:19,202] kopf._core.engines.p [ERROR   ] Request attempt #2/9 failed; will retry: PATCH https://10.0.0.1:443/apis/zalando.org/v1/clusterkopfpeerings/mysql-operator -> ClientConnectorError(ConnectionKey(host='10.0.0.1', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=-5250298061144997007), ConnectionAbortedError('SSL handshake is taking longer than 60.0 seconds: aborting the connection'))
[2023-03-30 03:02:53,818] kopf._core.engines.p [ERROR   ] Request attempt #1/9 failed; will retry: PATCH https://10.0.0.1:443/apis/zalando.org/v1/clusterkopfpeerings/mysql-operator -> ClientConnectorError(ConnectionKey(host='10.0.0.1', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=-5250298061144997007), ConnectionAbortedError('SSL handshake is taking longer than 60.0 seconds: aborting the connection'))
[2023-03-30 03:35:30,134] kopf._core.engines.p [ERROR   ] Request attempt #1/9 failed; will retry: PATCH https://10.0.0.1:443/apis/zalando.org/v1/clusterkopfpeerings/mysql-operator -> ClientConnectorError(ConnectionKey(host='10.0.0.1', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=-5250298061144997007), ConnectionAbortedError('SSL handshake is taking longer than 60.0 seconds: aborting the connection'))
[2023-03-30 04:01:56,585] kopf._core.engines.p [ERROR   ] Request attempt #1/9 failed; will retry: PATCH https://10.0.0.1:443/apis/zalando.org/v1/clusterkopfpeerings/mysql-operator -> ClientConnectorError(ConnectionKey(host='10.0.0.1', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=-5250298061144997007), ConnectionAbortedError('SSL handshake is taking longer than 60.0 seconds: aborting the connection'))
[2023-03-30 07:29:19,547] kopf._core.engines.p [ERROR   ] Request attempt #1/9 failed; will retry: PATCH https://10.0.0.1:443/apis/zalando.org/v1/clusterkopfpeerings/mysql-operator -> ClientConnectorError(ConnectionKey(host='10.0.0.1', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=-5250298061144997007), ConnectionAbortedError('SSL handshake is taking longer than 60.0 seconds: aborting the connection'))
[2023-03-30 07:55:45,947] kopf._core.engines.p [ERROR   ] Request attempt #1/9 failed; will retry: PATCH https://10.0.0.1:443/apis/zalando.org/v1/clusterkopfpeerings/mysql-operator -> ClientConnectorError(ConnectionKey(host='10.0.0.1', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=-5250298061144997007), ConnectionAbortedError('SSL handshake is taking longer than 60.0 seconds: aborting the connection'))
2023-03-30 08:07:04: Info: Session.run_sql: tid=3894937: MySQL Error 2006: MySQL server has gone away, SQL: select 1
GroupMonitor: Error fetching notice: dest=mysql-1.mysql-instances.qa.svc.cluster.local:33060 error=MySQL Error (2006): Session.run_sql: MySQL server has gone away
GroupMonitor: Trying to connect to a member of cluster qa/mysql
2023-03-30 08:07:04: Info: mysqlx.get_session: tid=4085390: CONNECTED: mysql-1.mysql-instances.qa.svc.cluster.local:33060
GroupMonitor: Connect member of qa/mysql OK <Session:mysqladmin@mysql-1.mysql-instances.qa.svc.cluster.local:33060>
[30 Mar 2023 13:52] Hemant Kumar
Our infrastructure is on Azure (AKS)
[30 Mar 2023 13:53] Hemant Kumar
Once I restart the pod it works fine again
Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
2023-03-30 13:49:49: Info: mysqlsh   Ver 8.0.32 for Linux on x86_64 - for MySQL 8.0.32 (MySQL Community Server (GPL)) - build 9612040 - commit_id 92dff1789f3b984a031f695328afb91ffc160300
2023-03-30 13:49:49: Info: Using credential store helper: /usr/bin/mysql-secret-store-login-path
2023-03-30 13:49:49: Info: Loading startup files...
2023-03-30 13:49:49: Info: Loading plugins...
[2023-03-30 13:49:49,987] kopf.activities.star [INFO    ] MySQL Operator/operator.py=2.0.8 timestamp=2023-01-17T06:10:27 kopf=1.35.4 uid=2
[2023-03-30 13:49:50,016] kopf.activities.star [INFO    ] KUBERNETES_VERSION =1.25
[2023-03-30 13:49:50,016] kopf.activities.star [INFO    ] OPERATOR_VERSION   =2.0.8
[2023-03-30 13:49:50,016] kopf.activities.star [INFO    ] OPERATOR_EDITION   =community
[2023-03-30 13:49:50,016] kopf.activities.star [INFO    ] OPERATOR_EDITIONS  =['community', 'enterprise']
[2023-03-30 13:49:50,016] kopf.activities.star [INFO    ] SHELL_VERSION      =8.0.32
[2023-03-30 13:49:50,016] kopf.activities.star [INFO    ] DEFAULT_VERSION_TAG=8.0.32
[2023-03-30 13:49:50,016] kopf.activities.star [INFO    ] SIDECAR_VERSION_TAG=8.0.32-2.0.8
[2023-03-30 13:49:50,016] kopf.activities.star [INFO    ] DEFAULT_IMAGE_REPOSITORY   =mysql
[2023-03-30 13:49:50,016] kopf.activities.star [INFO    ] DEFAULT_IMAGE_REPOSITORY_EE=container-registry.oracle.com/mysql
Added monitor for qa/mysql
GroupMonitor: Trying to connect to a member of cluster qa/mysql
[2023-03-30 13:49:50,076] kopf.activities.star [INFO    ] Activity 'on_startup' succeeded.
[2023-03-30 13:49:50,077] kopf._core.engines.a [INFO    ] Initial authentication has been initiated.
[2023-03-30 13:49:50,078] kopf.activities.auth [INFO    ] Activity 'login_via_client' succeeded.
[2023-03-30 13:49:50,078] kopf._core.engines.a [INFO    ] Initial authentication has finished.
2023-03-30 13:49:50: Info: mysqlx.get_session: tid=4130675: CONNECTED: mysql-1.mysql-instances.qa.svc.cluster.local:33060
GroupMonitor: Connect member of qa/mysql OK <Session:mysqladmin@mysql-1.mysql-instances.qa.svc.cluster.local:33060>
[2023-03-30 13:49:50,299] kopf._core.engines.p [INFO    ] Resuming operations after the pause. Conflicting operators with the same priority are gone.
2023-03-30 13:49:50: Info: Connecting to MySQL at: mysql://mysqladmin@mysql-0.mysql-instances.qa.svc.cluster.local:3306
2023-03-30 13:49:50: Info: mysqlsh.connect_dba: tid=125117: CONNECTED: mysql-0.mysql-instances.qa.svc.cluster.local:3306
2023-03-30 13:49:50: Info: Connecting to MySQL at: mysql://mysqladmin@mysql-0.mysql-instances.qa.svc.cluster.local:3306?connect-timeout=5000
[31 Mar 2023 0:23] MySQL Verification Team
Hi,
I'm having issues reproducing this. I'm using open shift to test and it works ok. Do you have always the same problem or is it intermittent? 

thanks
[31 Mar 2023 7:11] Hemant Kumar
Hi,
Yes this problem occurs in all AKS clusters.
[20 Jun 21:37] David Caputo
We are experiencing this exact same issue, also running on Azure (AKS):

kopf._core.engines.p [ERROR   ] Request attempt #1/9 failed; will retry: PATCH https://10.0.0.1:443/apis/zalando.org/v1/clusterkopfpeerings/mysql-operator -> ClientConnectorError(ConnectionKey(host='10.0.0.1', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=182225097003713860), ConnectionAbortedError('SSL handshake is taking longer than 60.0 seconds: aborting the connection'))

The problem is resolved by restarting the mysql-operator pod, but it recurs shortly thereafter.