Description:
During a data backup, data nodes began disconnecting with errnum 104 (ECONNRESET, "Connection reset by peer", if the hosts run Linux), and the cluster became stuck in a loop in which it could not allocate a node ID to the API node.
The main symptom is frequent node disconnection: the message "Node 5 disconnected in recv with errnum: 104 in state: 0" occurs many times, with corresponding alerts such as "ALERT -- Node 4: Node 5 Disconnected".
The management server also repeatedly fails to allocate a node ID to the API node: "Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'" occurs many times.
How to repeat:
The specific log messages are as follows:
```
2024-09-23 13:34:40 [MgmtSrvr] INFO -- Node 5: Communication to Node 6 opened
2024-09-23 13:34:40 [MgmtSrvr] INFO -- Node 5: Communication to Node 7 opened
2024-09-23 13:34:40 [MgmtSrvr] INFO -- Node 5: Communication to Node 8 opened
2024-09-23 13:34:40 [MgmtSrvr] INFO -- Node 5: Communication to Node 9 opened
2024-09-23 13:34:41 [MgmtSrvr] INFO -- Node 5: Node 6 Connected
2024-09-23 13:34:41 [MgmtSrvr] INFO -- Node 5: Node 7 Connected
2024-09-23 13:34:41 [MgmtSrvr] INFO -- Node 5: Node 9 Connected
2024-09-23 13:34:41 [MgmtSrvr] INFO -- Node 5: Node 7: API mysql-8.0.35 ndb-8.0.35
2024-09-23 13:34:41 [MgmtSrvr] INFO -- Node 5: Node 9: API mysql-8.0.35 ndb-8.0.35
2024-09-23 13:34:41 [MgmtSrvr] INFO -- Node 5: Node 6: API mysql-8.0.35 ndb-8.0.35
2024-09-23 13:34:41 [MgmtSrvr] INFO -- Node 2: DICT: index 89 stats auto-update starting
2024-09-23 13:34:41 [MgmtSrvr] INFO -- Node 5: Node 8 Connected
2024-09-23 13:34:41 [MgmtSrvr] INFO -- Node 5: Node 8: API mysql-8.0.35 ndb-8.0.35
2024-09-23 13:34:41 [MgmtSrvr] INFO -- Node 5: Suma: initiate handover for startup with nodes 0000000000000000000000000000000000000008 GCI: 5731503
2024-09-23 13:34:41 [MgmtSrvr] INFO -- Node 5: Suma: handover from node 3 gci: 5731503 buckets: 00000002 (2)
2024-09-23 13:34:42 [MgmtSrvr] INFO -- Node 2: DICT: index 89 stats auto-update starting
2024-09-23 13:34:45 [MgmtSrvr] INFO -- Node 2: DICT: index 89 stats auto-update starting - Repeated 2 times
2024-09-23 13:34:46 [MgmtSrvr] INFO -- Node 2: DICT: index 89 stats auto-update starting
2024-09-23 13:34:46 [MgmtSrvr] INFO -- Node 5: Start phase 101 completed (node restart)
2024-09-23 13:34:46 [MgmtSrvr] INFO -- Node 2: NR Status: node=5,OLD=Wait handover of subscriptions,NEW=Restart completed
2024-09-23 13:34:46 [MgmtSrvr] INFO -- Node 5: Started (mysql-8.0.35 ndb-8.0.35)
2024-09-23 13:34:46 [MgmtSrvr] INFO -- Node 2: DICT: unlocked by node 5 for NodeRestart
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 5: index 14 stats version 1: scan frag: created 0 samples
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 2: index 15 stats version 1: scan frag: created 0 samples
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 2: index-build table 13 index: 16 processed 0 rows
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 5: index-build table 13 index: 16 processed 0 rows
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 3: index-build table 13 index: 16 processed 0 rows
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 4: index-build table 13 index: 16 processed 0 rows
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 2: index 17 stats version 1: scan frag: created 0 samples
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 2: DICT: index 89 stats auto-update starting
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 2: index-build table 13 index: 18 processed 0 rows
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 3: index-build table 13 index: 18 processed 0 rows
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 4: index-build table 13 index: 18 processed 0 rows
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 5: index-build table 13 index: 18 processed 0 rows
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 2: index 19 stats version 1: scan frag: created 0 samples
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 2: index-build table 13 index: 20 processed 0 rows
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 3: index-build table 13 index: 20 processed 0 rows
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 4: index-build table 13 index: 20 processed 0 rows
2024-09-23 13:34:47 [MgmtSrvr] INFO -- Node 5: index-build table 13 index: 20 processed 0 rows
2024-09-23 13:34:48 [MgmtSrvr] INFO -- Node 2: DICT: index 89 stats auto-update starting
2024-09-23 13:34:48 [MgmtSrvr] INFO -- Node 2: index 89 stats version 3: scan frag: created 0 samples
2024-09-23 13:34:48 [MgmtSrvr] INFO -- Node 2: DICT: index 89 stats auto-update done
2024-09-23 13:34:49 [MgmtSrvr] INFO -- Node 2: DICT: index 97 stats auto-update starting
2024-09-23 13:34:49 [MgmtSrvr] INFO -- Node 5: index 97 stats version 2: scan frag: created 19 samples
2024-09-23 13:34:49 [MgmtSrvr] INFO -- Node 2: DICT: index 97 stats auto-update done
2024-09-23 13:34:50 [MgmtSrvr] INFO -- Node 2: DICT: index 99 stats auto-update starting
2024-09-23 13:34:50 [MgmtSrvr] INFO -- Node 5: index 99 stats version 2: scan frag: created 5 samples
2024-09-23 13:34:50 [MgmtSrvr] INFO -- Node 2: DICT: index 99 stats auto-update done
2024-09-23 13:34:51 [MgmtSrvr] INFO -- Node 2: Cluster shutdown initiated
2024-09-23 13:34:51 [MgmtSrvr] INFO -- Node 3: Cluster shutdown initiated
2024-09-23 13:34:51 [MgmtSrvr] INFO -- Node 4: Cluster shutdown initiated
2024-09-23 13:34:51 [MgmtSrvr] INFO -- Node 5: Cluster shutdown initiated
2024-09-23 13:34:57 [MgmtSrvr] INFO -- Node 3 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:34:57 [MgmtSrvr] INFO -- Node 2 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:34:58 [MgmtSrvr] INFO -- Node 2: Node shutdown completed.
2024-09-23 13:34:58 [MgmtSrvr] ALERT -- Node 1: Node 2 Disconnected
2024-09-23 13:34:58 [MgmtSrvr] ALERT -- Node 1: Node 3 Disconnected
2024-09-23 13:34:58 [MgmtSrvr] INFO -- Node 4 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:34:58 [MgmtSrvr] INFO -- Node 5 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:34:58 [MgmtSrvr] INFO -- Node 3: Node shutdown completed.
2024-09-23 13:34:58 [MgmtSrvr] INFO -- Node 4: Node shutdown completed.
2024-09-23 13:34:58 [MgmtSrvr] ALERT -- Node 1: Node 4 Disconnected
2024-09-23 13:34:58 [MgmtSrvr] ALERT -- Node 1: Node 5 Disconnected
2024-09-23 13:34:58 [MgmtSrvr] INFO -- Node 5: Node shutdown completed.
2024-09-23 13:34:58 [MgmtSrvr] INFO -- Nodeid 2 allocated for NDB at 192.172.10.9
2024-09-23 13:34:58 [MgmtSrvr] INFO -- Node 1: Node 2 Connected
2024-09-23 13:34:58 [MgmtSrvr] INFO -- Nodeid 3 allocated for NDB at 192.172.10.10
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 2: Buffering maximum epochs 100
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 1: Node 3 Connected
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 2: Start phase 0 completed (system restart)
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 2: Communication to Node 3 opened
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 2: Communication to Node 4 opened
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 2: Communication to Node 5 opened
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 2: Waiting 30 sec for nodes 3, 4 and 5 to connect, nodes [ all: 2, 3, 4 and 5 connected: 2 no-wait: ]
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Alloc node id 4 rejected, no new president yet
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Nodeid 4 allocated for NDB at 192.172.10.11
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 1: Node 4 Connected
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 3: Start phase 0 completed (system restart)
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 3: Communication to Node 2 opened
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 3: Communication to Node 4 opened
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 3: Communication to Node 5 opened
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 3: Waiting 30 sec for nodes 2, 4 and 5 to connect, nodes [ all: 2, 3, 4 and 5 connected: 3 no-wait: ]
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Alloc node id 5 rejected, no new president yet
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Nodeid 5 allocated for NDB at 192.172.10.12
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 3: Node 2 Connected
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 2: Node 3 Connected
2024-09-23 13:34:59 [MgmtSrvr] INFO -- Node 2: Waiting 29 sec for nodes 4 and 5 to connect, nodes [ all: 2, 3, 4 and 5 connected: 2 and 3 no-wait: ]
2024-09-23 13:35:00 [MgmtSrvr] INFO -- Node 1: Node 5 Connected
2024-09-23 13:35:00 [MgmtSrvr] INFO -- Node 5 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:35:00 [MgmtSrvr] INFO -- Node 4 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:35:00 [MgmtSrvr] INFO -- Node 5: Node shutdown completed.
2024-09-23 13:35:00 [MgmtSrvr] INFO -- Node 4: Node shutdown completed.
2024-09-23 13:35:00 [MgmtSrvr] INFO -- Node 3 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:35:00 [MgmtSrvr] INFO -- Node 2 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:35:00 [MgmtSrvr] INFO -- Node 2: Node shutdown completed.
2024-09-23 13:35:00 [MgmtSrvr] INFO -- Node 3: Node shutdown completed.
2024-09-23 13:35:00 [MgmtSrvr] ALERT -- Node 1: Node 2 Disconnected
2024-09-23 13:35:00 [MgmtSrvr] ALERT -- Node 1: Node 3 Disconnected
2024-09-23 13:35:00 [MgmtSrvr] ALERT -- Node 1: Node 4 Disconnected
2024-09-23 13:35:00 [MgmtSrvr] ALERT -- Node 1: Node 5 Disconnected
2024-09-23 13:35:01 [MgmtSrvr] INFO -- Nodeid 2 allocated for NDB at 192.172.10.9
2024-09-23 13:35:01 [MgmtSrvr] INFO -- Node 1: Node 2 Connected
2024-09-23 13:35:01 [MgmtSrvr] INFO -- Nodeid 3 allocated for NDB at 192.172.10.10
2024-09-23 13:35:01 [MgmtSrvr] INFO -- Node 2: Buffering maximum epochs 100
2024-09-23 13:35:01 [MgmtSrvr] INFO -- Node 1: Node 3 Connected
2024-09-23 13:35:01 [MgmtSrvr] INFO -- Node 2: Start phase 0 completed (system restart)
2024-09-23 13:35:01 [MgmtSrvr] INFO -- Node 2: Communication to Node 3 opened
2024-09-23 13:35:01 [MgmtSrvr] INFO -- Node 2: Communication to Node 4 opened
2024-09-23 13:35:01 [MgmtSrvr] INFO -- Node 2: Communication to Node 5 opened
2024-09-23 13:35:01 [MgmtSrvr] INFO -- Node 2: Waiting 30 sec for nodes 3, 4 and 5 to connect, nodes [ all: 2, 3, 4 and 5 connected: 2 no-wait: ]
2024-09-23 13:35:01 [MgmtSrvr] INFO -- Alloc node id 4 rejected, no new president yet
2024-09-23 13:35:01 [MgmtSrvr] INFO -- Nodeid 4 allocated for NDB at 192.172.10.11
2024-09-23 13:35:02 [MgmtSrvr] INFO -- Node 1: Node 4 Connected
2024-09-23 13:35:02 [MgmtSrvr] INFO -- Node 3: Start phase 0 completed (system restart)
2024-09-23 13:35:02 [MgmtSrvr] INFO -- Node 3: Communication to Node 2 opened
2024-09-23 13:35:02 [MgmtSrvr] INFO -- Node 3: Communication to Node 4 opened
2024-09-23 13:35:02 [MgmtSrvr] INFO -- Node 3: Communication to Node 5 opened
2024-09-23 13:35:02 [MgmtSrvr] INFO -- Node 3: Waiting 30 sec for nodes 2, 4 and 5 to connect, nodes [ all: 2, 3, 4 and 5 connected: 3 no-wait: ]
2024-09-23 13:35:02 [MgmtSrvr] INFO -- Node 3 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:35:02 [MgmtSrvr] INFO -- Node 2 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:35:02 [MgmtSrvr] INFO -- Node 3: Node shutdown completed.
2024-09-23 13:35:02 [MgmtSrvr] INFO -- Node 2: Node shutdown completed.
2024-09-23 13:35:02 [MgmtSrvr] INFO -- Node 4 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:35:02 [MgmtSrvr] ALERT -- Node 1: Node 2 Disconnected
2024-09-23 13:35:02 [MgmtSrvr] ALERT -- Node 1: Node 3 Disconnected
2024-09-23 13:35:02 [MgmtSrvr] INFO -- Node 4: Node shutdown completed.
2024-09-23 13:35:02 [MgmtSrvr] ALERT -- Node 1: Node 4 Disconnected
2024-09-23 13:35:03 [MgmtSrvr] INFO -- Nodeid 2 allocated for NDB at 192.172.10.9
2024-09-23 13:35:03 [MgmtSrvr] INFO -- Node 1: Node 2 Connected
2024-09-23 13:35:03 [MgmtSrvr] INFO -- Nodeid 3 allocated for NDB at 192.172.10.10
2024-09-23 13:35:03 [MgmtSrvr] INFO -- Node 1: Node 3 Connected
2024-09-23 13:35:03 [MgmtSrvr] INFO -- Node 2: Start phase 0 completed (system restart)
2024-09-23 13:35:03 [MgmtSrvr] INFO -- Node 2: Communication to Node 3 opened
2024-09-23 13:35:03 [MgmtSrvr] INFO -- Node 2: Communication to Node 4 opened
2024-09-23 13:35:03 [MgmtSrvr] INFO -- Node 2: Communication to Node 5 opened
2024-09-23 13:35:03 [MgmtSrvr] INFO -- Node 2: Waiting 30 sec for nodes 3, 4 and 5 to connect, nodes [ all: 2, 3, 4 and 5 connected: 2 no-wait: ]
2024-09-23 13:35:04 [MgmtSrvr] INFO -- Node 2 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:35:04 [MgmtSrvr] INFO -- Node 3 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:35:04 [MgmtSrvr] INFO -- Node 3: Node shutdown completed.
2024-09-23 13:35:04 [MgmtSrvr] INFO -- Node 2: Node shutdown completed.
2024-09-23 13:35:04 [MgmtSrvr] ALERT -- Node 1: Node 2 Disconnected
2024-09-23 13:35:04 [MgmtSrvr] ALERT -- Node 1: Node 3 Disconnected
2024-09-23 13:35:04 [MgmtSrvr] INFO -- Nodeid 2 allocated for NDB at 192.172.10.9
2024-09-23 13:35:05 [MgmtSrvr] INFO -- Node 1: Node 2 Connected
2024-09-23 13:35:05 [MgmtSrvr] INFO -- Node 2: Buffering maximum epochs 100
2024-09-23 13:35:05 [MgmtSrvr] INFO -- Node 2: Start phase 0 completed (system restart)
2024-09-23 13:35:05 [MgmtSrvr] INFO -- Node 2: Communication to Node 3 opened
2024-09-23 13:35:05 [MgmtSrvr] INFO -- Node 2: Communication to Node 4 opened
2024-09-23 13:35:05 [MgmtSrvr] INFO -- Node 2: Communication to Node 5 opened
2024-09-23 13:35:05 [MgmtSrvr] INFO -- Node 2: Waiting 30 sec for nodes 3, 4 and 5 to connect, nodes [ all: 2, 3, 4 and 5 connected: 2 no-wait: ]
2024-09-23 13:35:06 [MgmtSrvr] INFO -- Node 2 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:35:06 [MgmtSrvr] ALERT -- Node 1: Node 2 Disconnected
2024-09-23 13:35:06 [MgmtSrvr] INFO -- Node 2: Node shutdown completed.
2024-09-23 13:35:06 [MgmtSrvr] INFO -- Nodeid 2 allocated for NDB at 192.172.10.9
2024-09-23 13:35:07 [MgmtSrvr] INFO -- Node 1: Node 2 Connected
2024-09-23 13:35:07 [MgmtSrvr] INFO -- Node 2: Buffering maximum epochs 100
2024-09-23 13:35:07 [MgmtSrvr] INFO -- Node 2: Start phase 0 completed (system restart)
2024-09-23 13:35:07 [MgmtSrvr] INFO -- Node 2: Communication to Node 3 opened
2024-09-23 13:35:07 [MgmtSrvr] INFO -- Node 2: Communication to Node 4 opened
2024-09-23 13:35:07 [MgmtSrvr] INFO -- Node 2: Communication to Node 5 opened
2024-09-23 13:35:07 [MgmtSrvr] INFO -- Node 2: Waiting 30 sec for nodes 3, 4 and 5 to connect, nodes [ all: 2, 3, 4 and 5 connected: 2 no-wait: ]
2024-09-23 13:35:08 [MgmtSrvr] INFO -- Node 2 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:35:08 [MgmtSrvr] ALERT -- Node 1: Node 2 Disconnected
2024-09-23 13:35:08 [MgmtSrvr] INFO -- Node 2: Node shutdown completed.
2024-09-23 13:35:08 [MgmtSrvr] INFO -- Nodeid 2 allocated for NDB at 192.172.10.9
2024-09-23 13:35:09 [MgmtSrvr] INFO -- Node 1: Node 2 Connected
2024-09-23 13:35:09 [MgmtSrvr] INFO -- Node 2 disconnected in recv with errnum: 104 in state: 0
2024-09-23 13:35:09 [MgmtSrvr] ALERT -- Node 1: Node 2 Disconnected
2024-09-23 13:35:09 [MgmtSrvr] INFO -- Node 2: Node shutdown completed.
2024-09-23 13:36:16 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:36:19 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:36:23 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:36:26 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:36:29 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:36:32 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:36:35 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:36:38 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:36:42 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:36:45 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:36:48 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:36:51 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:36:54 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:36:57 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:37:00 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
2024-09-23 13:37:04 [MgmtSrvr] WARNING -- Unable to allocate nodeid for API at 192.172.10.9. Returned error: 'Cluster not ready for nodeid allocation.'
```
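To quantify the two symptoms, the log above can be summarized with a short script. This is only a sketch: the regular expressions are written against the message formats shown in this excerpt, and the function name `summarize` is hypothetical, not part of any NDB tool.

```python
import re
from collections import Counter

# Patterns written against the message formats seen in the log excerpt above.
DISCONNECT_RE = re.compile(r"Node (\d+) disconnected in recv with errnum: (\d+)")
ALLOC_FAIL_RE = re.compile(r"Unable to allocate nodeid for API at (\S+)\. Returned error")


def summarize(lines):
    """Count disconnects per (node id, errnum) and failed API nodeid allocations per host."""
    disconnects = Counter()
    alloc_failures = Counter()
    for line in lines:
        m = DISCONNECT_RE.search(line)
        if m:
            disconnects[(int(m.group(1)), int(m.group(2)))] += 1
            continue
        m = ALLOC_FAIL_RE.search(line)
        if m:
            alloc_failures[m.group(1)] += 1
    return disconnects, alloc_failures
```

Passing an open cluster log file to `summarize` (for example `summarize(open(path))`, where `path` is wherever the management server writes its cluster log) returns two counters that show which nodes disconnect most often and which hosts are refused a node ID.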
Suggested fix:
During the restart, node 5 may fail to load all the services or configuration it needs, so its connections fail. The repeated disconnections keep the cluster from reaching a stable state, and in that state the management server cannot allocate node IDs to new nodes. Other possible causes are that the node ID pool is full, or that an error prevents node IDs from being reclaimed and reassigned. In either case, there may be a bug in how the management server handles node ID allocation, especially while the cluster is unstable.
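One way to check the instability hypothesis is to measure how long each data node stays connected after its node ID is allocated. The sketch below pairs each "Nodeid N allocated for NDB" line with the next "Node N disconnected in recv" line for the same node. The function name `short_lived_starts` and the 5-second threshold are assumptions chosen for illustration.

```python
import re
from datetime import datetime

# Both patterns follow the line layout of the log excerpt in this report.
ALLOC_RE = re.compile(r"^(\S+ \S+) \[MgmtSrvr\] INFO -- Nodeid (\d+) allocated for NDB")
DISC_RE = re.compile(r"^(\S+ \S+) \[MgmtSrvr\] INFO -- Node (\d+) disconnected in recv")


def parse_ts(text):
    """Parse the 'YYYY-MM-DD HH:MM:SS' timestamp at the start of a log line."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def short_lived_starts(lines, max_seconds=5):
    """Return (node_id, seconds_alive) for starts that disconnected within max_seconds.

    max_seconds is a hypothetical threshold, not an NDB setting.
    """
    last_alloc = {}  # node id -> time of its most recent nodeid allocation
    result = []
    for line in lines:
        m = ALLOC_RE.match(line)
        if m:
            last_alloc[int(m.group(2))] = parse_ts(m.group(1))
            continue
        m = DISC_RE.match(line)
        if m:
            node = int(m.group(2))
            started = last_alloc.pop(node, None)
            if started is not None:
                alive = (parse_ts(m.group(1)) - started).total_seconds()
                if alive <= max_seconds:
                    result.append((node, alive))
    return result
```

A long run of short-lived starts for the same node IDs would be consistent with a restart loop in which the data nodes never stay up long enough for API node ID allocation to succeed. It would not by itself show whether the cause is in the data nodes or in the management server.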