Description:
Environment: one management node (ndb-mgmd) and four combined data + SQL nodes (ndb1..ndb4), all running in Docker.
Each MySQL node is configured to use NDB and explicitly sets a fixed API nodeid and connectstring:
ndb1: /etc/mysql/conf.d/zz-ndb.cnf
default-storage-engine=ndbcluster
ndbcluster
ndb-nodeid=6
ndb-connectstring=ndb-mgmd:1186
ndb2: same file, with ndb-nodeid=7 and ndb-connectstring=ndb-mgmd:1186
ndb3: same file, with ndb-nodeid=8 and ndb-connectstring=ndb-mgmd:1186
ndb4: same file, with ndb-nodeid=9 and ndb-connectstring=ndb-mgmd:1186
However, the cluster config.ini visible inside the nodes shows well-formed [ndb_mgmd] and [ndbd] sections (NodeId 1–5), while NodeId 6–9 appear as bare lines with no preceding [mysqld] or [api] section header (based on the collected output below). This suggests that the API (SQL) node definitions may be missing, mis-parsed, or ignored, which would create a mismatch between the ndb-nodeid settings on the MySQL side and the node configuration held by mgmd.
Collected evidence (grep excerpts from each ndb* container):
/etc/ndb/config.ini includes:
[ndb_mgmd] NodeId=1 HostName=ndb-mgmd DataDir=/var/lib/ndb-mgmd
[ndbd] NodeId=2 HostName=ndb1 DataDir=/var/lib/ndb-data
[ndbd] NodeId=3 HostName=ndb2 DataDir=/var/lib/ndb-data
[ndbd] NodeId=4 HostName=ndb3 DataDir=/var/lib/ndb-data
[ndbd] NodeId=5 HostName=ndb4 DataDir=/var/lib/ndb-data
followed by these lines:
NodeId=6 HostName=ndb1
NodeId=7 HostName=ndb2
NodeId=8 HostName=ndb3
NodeId=9 HostName=ndb4
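(No [mysqld]/[api] header observed in the provided excerpt. Note that the grep pattern used to collect it does not match the strings "[mysqld]" or "[api]", so the full file should be inspected before concluding that the headers are absent.)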
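To check the whole file rather than an excerpt, each NodeId can be mapped to the section header it falls under. The following is only a minimal sketch, not part of the original investigation: the function names are hypothetical, and it assumes one key=value per line, "#" comments, and case-insensitive section names.

```python
import re

# Assumed config.ini conventions: "[section]" headers, "NodeId=<n>" keys.
SECTION_RE = re.compile(r"^\[\s*([^\]]+?)\s*\]")
NODEID_RE = re.compile(r"^NodeId\s*[=:]\s*(\d+)", re.IGNORECASE)
API_SECTIONS = {"mysqld", "api"}


def node_sections(config_text):
    """Return (node_id, section) pairs in file order.

    section is None when a NodeId line precedes every section header.
    """
    current = None
    pairs = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        header = SECTION_RE.match(line)
        if header:
            current = header.group(1).lower()
            continue
        node = NODEID_RE.match(line)
        if node:
            pairs.append((int(node.group(1)), current))
    return pairs


def missing_or_misplaced_api_ids(config_text, expected_api_ids):
    """Return expected API node IDs that are absent or not under [mysqld]/[api]."""
    found = dict(node_sections(config_text))
    return sorted(i for i in expected_api_ids
                  if found.get(i) not in API_SECTIONS)
```

Calling missing_or_misplaced_api_ids(text, [6, 7, 8, 9]) on the full contents of /etc/ndb/config.ini would list any of the ndb-nodeid values from zz-ndb.cnf that mgmd is unlikely to treat as API nodes.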
The data nodes do connect to mgmd and are allocated node IDs 2–5 (from /var/lib/ndb-data/ndbd.out):
2026-01-22 09:49:47 [ndbd] ... Angel connected to 'ndb-mgmd:1186' (ndb1)
2026-01-22 09:49:48 [ndbd] ... Angel allocated nodeid: 2 (ndb1)
2026-01-22 09:49:49 ... allocated nodeid: 3 (ndb2)
2026-01-22 09:49:50 ... allocated nodeid: 4 (ndb3)
2026-01-22 09:49:51 ... allocated nodeid: 5 (ndb4)
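At the same time, large ndb_*_out.log files and many trace files exist under /var/lib/ndb-data/ (notably on ndb3/ndb4). This suggests heavy retry or diagnostic output and, since trace files are normally written when a data node process fails, possibly repeated data node failures: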
ndb1: /var/lib/ndb-data/ndb_2_out.log ~20M (2026-01-24 08:57)
ndb2: /var/lib/ndb-data/ndb_3_out.log ~18M (2026-01-24 09:12)
ndb3: /var/lib/ndb-data/ndb_4_out.log ~23M (2026-01-24 08:53) with ndb_4_trace.log.1...25
ndb4: /var/lib/ndb-data/ndb_5_out.log ~21M (2026-01-24 09:16) with ndb_5_trace.log.1...25
Management node log exists and rotates as expected:
ndb-mgmd: /var/lib/ndb-mgmd/ndb_1_cluster.log and ndb_1_cluster.log.1...5
No container OOM/restart was observed:
docker inspect shows OOMKilled=false, RestartCount=0 for ndb-mgmd and ndb1..ndb4.
A search of the kernel log around the incident window did not show OOM or hung-task messages.
How to repeat:
Start the cluster containers: ndb-mgmd, ndb1, ndb2, ndb3, ndb4.
Verify MySQL node configuration (inside each ndb1..ndb4):
/etc/mysql/conf.d/zz-ndb.cnf contains ndb-nodeid=6/7/8/9 and ndb-connectstring=ndb-mgmd:1186.
Check /etc/ndb/config.ini inside any node for the node definitions:
Confirm [ndb_mgmd] and [ndbd] blocks exist for NodeId 1–5.
Observe NodeId 6–9 appear without an explicit [mysqld]/[api] header in the same excerpt.
Check data node join evidence and logs:
/var/lib/ndb-data/ndbd.out shows “Angel connected” and “Angel allocated nodeid: 2/3/4/5” at 2026-01-22 09:49:47..51 UTC.
Inspect /var/lib/ndb-data/ndb_*_out.log and ndb_*_trace.log.* growth.
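The log inspection in the last step can be made repeatable with a small script. This is a sketch under assumptions: the function name is hypothetical, and it relies only on the file naming seen in this report (ndb_<id>_out.log and ndb_<id>_trace.log.<n>).

```python
from pathlib import Path


def summarize_ndb_logs(data_dir):
    """Return {node_id: (out_log_bytes, trace_file_count)} for one data directory."""
    data_dir = Path(data_dir)
    summary = {}
    for out_log in sorted(data_dir.glob("ndb_*_out.log")):
        node_id = out_log.name.split("_")[1]  # "ndb_4_out.log" -> "4"
        if not node_id.isdigit():
            continue
        # Count only numbered trace segments such as ndb_4_trace.log.1
        traces = [p for p in data_dir.glob(f"ndb_{node_id}_trace.log.*")
                  if p.name.rsplit(".", 1)[1].isdigit()]
        summary[int(node_id)] = (out_log.stat().st_size, len(traces))
    return summary
```

Running summarize_ndb_logs("/var/lib/ndb-data") inside each of ndb1..ndb4 at two points in time would show whether the out log or the number of trace files is still growing.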
Key collected commands/outputs used in this report (already executed):
grep -RInE "ndb|ndbcluster|ndb_mgmd|NodeId|connectstring" /etc (inside ndb1..ndb4)
ls -lah /var/lib/ndb-mgmd (inside ndb-mgmd)
ls -lah /var/lib/ndb-data (inside ndb1..ndb4)
tail -n 200 /var/lib/ndb-data/ndbd.out (inside ndb1..ndb4)
docker inspect -f 'OOMKilled=... RestartCount=...' <container> (host)
Suggested fix:
Harden mgmd config parsing and validation for API (SQL) nodes:
If NodeId/HostName lines appear outside a valid section, mgmd should reject the config with a clear error instead of silently accepting it.
Require API nodes to be declared under explicit [mysqld] or [api] blocks, and surface warnings or errors when such blocks are missing.
Improve the diagnostic messages in the mgmd log for the case where a MySQL node attempts to join with a fixed ndb-nodeid that is not present (or was not parsed) in the cluster configuration.
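The kind of strict validation suggested above can be illustrated with a short standalone check. It is a sketch only and does not reflect how mgmd actually parses its configuration. The function name and the KNOWN_SECTIONS set are assumptions, and the set is not claimed to be exhaustive. A duplicate key inside one section is treated as a hint that a section header was lost.

```python
import re

# Assumed, non-exhaustive set of section names; "<name> default" is also accepted.
KNOWN_SECTIONS = {"ndb_mgmd", "mgm", "ndbd", "db", "mysqld", "api",
                  "computer", "tcp", "shm", "system"}
HEADER_RE = re.compile(r"^\[(.+)\]$")


def validate_config(config_text):
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    section = None
    seen_keys = set()
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("["):
            header = HEADER_RE.match(line)
            if not header:
                problems.append(f"line {lineno}: malformed section header {line!r}")
                continue
            section = header.group(1).strip().lower()
            base = section.removesuffix(" default").strip()
            if base not in KNOWN_SECTIONS:
                problems.append(f"line {lineno}: unknown section [{section}]")
            seen_keys = set()  # keys are tracked per section
            continue
        key = re.split(r"[=:]", line, maxsplit=1)[0].strip().lower()
        if section is None:
            problems.append(f"line {lineno}: {key!r} appears before any section header")
        elif key in seen_keys:
            problems.append(f"line {lineno}: duplicate {key!r} in [{section}]; "
                            "a section header may be missing")
        seen_keys.add(key)
    return problems
```

Applied to a file laid out like the excerpt in this report, with NodeId=6 following the [ndbd] block for NodeId=5 and no header in between, the check reports the second NodeId and HostName as duplicates instead of accepting them silently. The sketch uses str.removesuffix, so it needs Python 3.9 or later.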