| Bug #110273 | NDB Cluster hangs on restarting management node 2 and shows InitialSystemRestart | ||
|---|---|---|---|
| Submitted: | 6 Mar 2023 3:12 | Modified: | 17 Mar 2023 14:41 |
| Reporter: | Yen Jung Peng | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Cluster NDB Operator | Severity: | S1 (Critical) |
| Version: | 1.0.1 | OS: | CentOS (7.9) |
| Assigned to: | MySQL Verification Team | CPU Architecture: | x86 (64) |
| Tags: | kubernetes | ||
[15 Mar 2023 17:56]
MySQL Verification Team
Hi, what environment are you using to test this? I tried "Red Hat OpenShift Local" and could not reproduce the problem.
[16 Mar 2023 10:39]
Yen Jung Peng
Hi, MySQL Verification Team,
I have three virtual machines running a Kubernetes cluster (one control plane and two worker nodes), and all of them run CentOS 7.9.
These are the Kubernetes versions:
```
# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.11", GitCommit:"0f75679e3346160939924550fd3591462a4afec6", GitTreeState:"clean", BuildDate:"2023-02-22T13:37:53Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
# kubectl version --output=yaml
clientVersion:
  buildDate: "2023-02-22T13:39:33Z"
  compiler: gc
  gitCommit: 0f75679e3346160939924550fd3591462a4afec6
  gitTreeState: clean
  gitVersion: v1.24.11
  goVersion: go1.19.6
  major: "1"
  minor: "24"
  platform: linux/amd64
kustomizeVersion: v4.5.4
serverVersion:
  buildDate: "2023-02-22T13:32:00Z"
  compiler: gc
  gitCommit: 0f75679e3346160939924550fd3591462a4afec6
  gitTreeState: clean
  gitVersion: v1.24.11
  goVersion: go1.19.6
  major: "1"
  minor: "24"
  platform: linux/amd64
```
Please let me know what other information I should provide.
I still can't create all the resources as described on the website.
[17 Mar 2023 8:14]
Yen Jung Peng
Hi Team, the NDB cluster is finally running. The cause was the firewall, but we still don't know why, and no error message mentioned permissions or anything related. After we turned off the firewalld service, all the pods started running successfully. This issue can be closed.
[17 Mar 2023 14:41]
MySQL Verification Team
Thanks for the update
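
The thread ends with firewalld identified as the cause, but not which traffic it was blocking. One way to narrow that down before disabling the firewall entirely is to probe TCP reachability between pods or nodes. The Python sketch below is an illustration only and is not part of NDB Operator or the original report: `can_connect` and `probe` are hypothetical helper names, and the only target taken from the report is the management service `example-ndb-mgmd` on port 1186.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Any failure (refused, timed out, name not resolved) is reported as False,
    so a False result does not by itself prove that a firewall is responsible.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, socket.timeout and socket.gaierror.
        return False


def probe(targets, timeout: float = 3.0) -> dict:
    """Probe each (host, port) pair and map it to its reachability."""
    return {(host, port): can_connect(host, port, timeout) for host, port in targets}
```

Assuming it is run from a pod in the same namespace so that the service name resolves, `probe([("example-ndb-mgmd", 1186)])` reports whether the management port is reachable. Comparing the result with firewalld running and stopped would indicate whether that port is among the blocked traffic.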

Description:
Hi team, I'm using an on-premise Kubernetes cluster, which means I installed a k8s cluster with kubeadm locally on VMs (CentOS 7.9). Then I installed NDB Operator using helm:
```
NAME   NAMESPACE     REVISION  UPDATED                                  STATUS    CHART               APP VERSION
ndbop  ndb-operator  1         2023-02-24 17:47:45.803941352 +0800 CST  deployed  ndb-operator-1.0.1  8.0.32-1.0.1
```
But when I try to deploy an NDB cluster with NDB Operator, it hangs on restarting management node 2.

> kubectl apply -f docs/examples/example-ndb.yaml
ndbcluster.mysql.oracle.com/example-ndb created

> kubectl get all
```
NAME                     READY   STATUS             RESTARTS         AGE
pod/example-ndb-mgmd-0   1/1     Running            0                2d17h
pod/example-ndb-mgmd-1   0/1     CrashLoopBackOff   759 (113s ago)   2d17h

NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/example-ndb-mgmd   ClusterIP   10.108.176.248   <none>        1186/TCP   2d17h
service/kubernetes         ClusterIP   10.96.0.1        <none>        443/TCP    164d

NAME                                READY   AGE
statefulset.apps/example-ndb-mgmd   1/2     2d17h

NAME                                      REPLICA   MANAGEMENT NODES   DATA NODES   MYSQL SERVERS   AGE     UP-TO-DATE
ndbcluster.mysql.oracle.com/example-ndb   2         Ready:1/2          Ready:0/2    Ready:0/2       2d17h   False
```

> kubectl describe ndbcluster.mysql.oracle.com/example-ndb
```
Message: pod "example-ndb-mgmd-1" struck in waiting state : CrashLoopBackOff : back-off 5m0s restarting failed container=mgmd-container pod=example-ndb-mgmd-1_default(c98d2d43-2d53-444e-8eb0-f2a8afc1136d)
Reason: SyncError
```

`kubectl logs pod/example-ndb-mgmd-1`
```
Defaulted container "mgmd-container" out of: mgmd-container, ndb-pod-init-container (init)
++ cat /var/lib/ndb/run/nodeId.val
+ /usr/sbin/ndb_mgmd -f /var/lib/ndb/config/config.ini --initial --nodaemon --config-cache=0 --ndb-nodeid=2
WARNING: --ndb-connectstring is ignored when mgmd is started with -f or config-file.
MySQL Cluster Management Server mysql-8.0.32 ndb-8.0.32
2023-03-06 03:00:07 [MgmtSrvr] INFO -- Skipping check of config directory since config cache is disabled.
2023-03-06 03:00:08 [MgmtSrvr] INFO -- Warning: Could not resolve hostname [node 3]: example-ndb-ndbmtd-0.example-ndb-ndbmtd.default.svc.cluster.local
2023-03-06 03:00:08 [MgmtSrvr] INFO -- Warning: Could not resolve hostname [node 4]: example-ndb-ndbmtd-1.example-ndb-ndbmtd.default.svc.cluster.local
2023-03-06 03:00:08 [MgmtSrvr] INFO -- Cluster configuration has multiple Management nodes. Please start the other mgmd nodes if not started yet.
2023-03-06 03:00:08 [MgmtSrvr] INFO -- Got initial configuration from '/var/lib/ndb/config/config.ini', will try to set it when all ndb_mgmd(s) started
2023-03-06 03:00:09 [MgmtSrvr] INFO -- Node 2: Node 2 Connected
2023-03-06 03:00:09 [MgmtSrvr] INFO -- Id: 2, Command port: *:1186
2023-03-06 03:00:09 [MgmtSrvr] INFO -- MySQL Cluster Management Server mysql-8.0.32 ndb-8.0.32 started
== ConfigManager disabled -- manager thread will exit ==
2023-03-06 03:00:09 [MgmtSrvr] INFO -- Node 2: Node 1 Connected
2023-03-06 03:02:41 [MgmtSrvr] INFO -- Received SIGTERM. Performing stop.
```

How to repeat:
I followed the steps at https://dev.mysql.com/doc/ndb-operator/en/deployment-creation.html