Bug #110273 NDB Cluster hangs on restarting management node 2 and shows InitialSystemRestart
Submitted: 6 Mar 2023 3:12 Modified: 17 Mar 2023 14:41
Reporter: Yen Jung Peng
Status: Not a Bug
Category: MySQL Cluster NDB Operator Severity: S1 (Critical)
Version: 1.0.1 OS: CentOS (7.9)
Assigned to: MySQL Verification Team CPU Architecture:x86 (64)
Tags: kubernetes

[6 Mar 2023 3:12] Yen Jung Peng
Description:
Hi team,

I'm running an on-premise Kubernetes cluster, installed with kubeadm on local VMs (CentOS 7.9).

I then installed NDB Operator using Helm:

```
NAME 	NAMESPACE   	REVISION	UPDATED                                	STATUS  	CHART             	APP VERSION
ndbop	ndb-operator	1       	2023-02-24 17:47:45.803941352 +0800 CST	deployed	ndb-operator-1.0.1	8.0.32-1.0.1
```

But when I deploy an NDB Cluster with NDB Operator, it hangs while restarting management node 2.

> kubectl apply -f docs/examples/example-ndb.yaml
ndbcluster.mysql.oracle.com/example-ndb created

> kubectl get all

```
NAME                     READY   STATUS             RESTARTS         AGE
pod/example-ndb-mgmd-0   1/1     Running            0                2d17h
pod/example-ndb-mgmd-1   0/1     CrashLoopBackOff   759 (113s ago)   2d17h

NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/example-ndb-mgmd   ClusterIP   10.108.176.248   <none>        1186/TCP   2d17h
service/kubernetes         ClusterIP   10.96.0.1        <none>        443/TCP    164d

NAME                                READY   AGE
statefulset.apps/example-ndb-mgmd   1/2     2d17h

NAME                                      REPLICA   MANAGEMENT NODES   DATA NODES   MYSQL SERVERS   AGE     UP-TO-DATE
ndbcluster.mysql.oracle.com/example-ndb   2         Ready:1/2          Ready:0/2    Ready:0/2       2d17h   False
```

> kubectl describe ndbcluster.mysql.oracle.com/example-ndb

```
    Message:               pod "example-ndb-mgmd-1" struck in waiting state : CrashLoopBackOff : back-off 5m0s restarting failed container=mgmd-container pod=example-ndb-mgmd-1_default(c98d2d43-2d53-444e-8eb0-f2a8afc1136d)
    Reason:                SyncError
```

> kubectl logs pod/example-ndb-mgmd-1

```
Defaulted container "mgmd-container" out of: mgmd-container, ndb-pod-init-container (init)
++ cat /var/lib/ndb/run/nodeId.val
+ /usr/sbin/ndb_mgmd -f /var/lib/ndb/config/config.ini --initial --nodaemon --config-cache=0 --ndb-nodeid=2
WARNING: --ndb-connectstring is ignored when mgmd is started with -f or config-file.
MySQL Cluster Management Server mysql-8.0.32 ndb-8.0.32
2023-03-06 03:00:07 [MgmtSrvr] INFO     -- Skipping check of config directory since config cache is disabled.
2023-03-06 03:00:08 [MgmtSrvr] INFO     -- Warning: Could not resolve hostname [node 3]: example-ndb-ndbmtd-0.example-ndb-ndbmtd.default.svc.cluster.local
2023-03-06 03:00:08 [MgmtSrvr] INFO     -- Warning: Could not resolve hostname [node 4]: example-ndb-ndbmtd-1.example-ndb-ndbmtd.default.svc.cluster.local
2023-03-06 03:00:08 [MgmtSrvr] INFO     -- Cluster configuration has multiple Management nodes. Please start the other mgmd nodes if not started yet.
2023-03-06 03:00:08 [MgmtSrvr] INFO     -- Got initial configuration from '/var/lib/ndb/config/config.ini', will try to set it when all ndb_mgmd(s) started
2023-03-06 03:00:09 [MgmtSrvr] INFO     -- Node 2: Node 2 Connected
2023-03-06 03:00:09 [MgmtSrvr] INFO     -- Id: 2, Command port: *:1186
2023-03-06 03:00:09 [MgmtSrvr] INFO     -- MySQL Cluster Management Server mysql-8.0.32 ndb-8.0.32 started
== ConfigManager disabled -- manager thread will exit ==
2023-03-06 03:00:09 [MgmtSrvr] INFO     -- Node 2: Node 1 Connected
2023-03-06 03:02:41 [MgmtSrvr] INFO     -- Received SIGTERM. Performing stop.
```

How to repeat:
I followed the steps at https://dev.mysql.com/doc/ndb-operator/en/deployment-creation.html
[15 Mar 2023 17:56] MySQL Verification Team
Hi,

What environment are you using to test this? I tried using "Red Hat OpenShift Local" and could not reproduce the problem.
[16 Mar 2023 10:39] Yen Jung Peng
Hi, MySQL Verification Team,

I have three virtual machines running a Kubernetes cluster (one control plane and two worker nodes), all on CentOS 7.9.

These are the Kubernetes versions:

```
# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.11", GitCommit:"0f75679e3346160939924550fd3591462a4afec6", GitTreeState:"clean", BuildDate:"2023-02-22T13:37:53Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}

# kubectl version --output=yaml
clientVersion:
  buildDate: "2023-02-22T13:39:33Z"
  compiler: gc
  gitCommit: 0f75679e3346160939924550fd3591462a4afec6
  gitTreeState: clean
  gitVersion: v1.24.11
  goVersion: go1.19.6
  major: "1"
  minor: "24"
  platform: linux/amd64
kustomizeVersion: v4.5.4
serverVersion:
  buildDate: "2023-02-22T13:32:00Z"
  compiler: gc
  gitCommit: 0f75679e3346160939924550fd3591462a4afec6
  gitTreeState: clean
  gitVersion: v1.24.11
  goVersion: go1.19.6
  major: "1"
  minor: "24"
  platform: linux/amd64
```

Please let me know what other information I should provide.
I still can't create all the resources described on the website.
[17 Mar 2023 8:14] Yen Jung Peng
Hi Team,

NDB Cluster is finally running.
The cause was the firewall, but we still don't know why, and no error messages indicated a permission problem or anything related.
After we turned off the firewalld service, all the pods started running successfully.
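
Since no log pointed at the firewall, a direct TCP probe between nodes may help confirm this kind of failure. The following is a minimal sketch in Python and was not part of the original report: `can_connect` is a hypothetical helper, and it assumes it is run from a node or pod that is on the cluster network.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers refused connections, timeouts (a firewall that
        # silently drops packets typically shows up as a timeout), and
        # DNS resolution failures.
        return False
```

For example, `can_connect("10.108.176.248", 1186)` probes the `example-ndb-mgmd` service address and port shown in the `kubectl get all` output above. A result that differs with firewalld running and stopped would suggest that the firewall blocks that path, although it would not say which rule is responsible.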

This issue can be closed.
[17 Mar 2023 14:41] MySQL Verification Team
Thanks for the update.