Bug #116350: Metadata lock failure during sync in MySQL NDB Cluster
Submitted: 14 Oct 2024 0:36    Modified: 14 Oct 2024 5:25
Reporter: CunDi Fang
Status: Duplicate
Category: MySQL Cluster: Cluster/J    Severity: S3 (Non-critical)
Version: 8.0.35-cluster MySQL Cluster Community S    OS: Any
Assigned to: MySQL Verification Team    CPU Architecture: Any

[14 Oct 2024 0:36] CunDi Fang
Description:
I was performing operations on tables with some fairly complex dependencies when the metadata synchronization process failed. This caused the two data nodes to repeatedly disconnect and reconnect, and ultimately the entire cluster crashed.

How to repeat:
On a cluster of 4 data nodes I loaded some fairly complex data, i.e. containing inter-table dependencies and column constraints. The problem then occurred while performing transactional operations.

ndb_2_out.log:
```
2024-10-08 09:55:42 [ndbd] INFO     -- DROP_TAB_REQ: tab: 120, tabLcpStatus: 3
2024-10-08 09:55:43 [ndbd] INFO     -- DROP_TAB_REQ: tab: 135, tabLcpStatus: 3
2024-10-08 09:55:43 [ndbd] INFO     -- DROP_TAB_REQ: tab: 136, tabLcpStatus: 3
2024-10-08 09:55:43 [ndbd] INFO     -- DROP_TAB_REQ: tab: 134, tabLcpStatus: 3
2024-10-08 09:55:43 [ndbd] INFO     -- DROP_TAB_REQ: tab: 137, tabLcpStatus: 3
2024-10-08 09:55:43 [ndbd] INFO     -- DROP_TAB_REQ: tab: 126, tabLcpStatus: 3
2024-10-08 09:56:27 [ndbd] INFO     -- Node 5 disconnected in recv with errnum: 104 in state: 0
2024-10-08 09:56:27 [ndbd] INFO     -- findNeighbours from: 6085 old (left: 5 right: 3) new (4 3)
2024-10-08 09:56:27 [ndbd] ALERT    -- Arbitration check won - node group majority
2024-10-08 09:56:27 [ndbd] INFO     -- President restarts arbitration thread [state=6]
2024-10-08 09:56:27 [ndbd] INFO     -- NR Status: node=5,OLD=Initial state,NEW=Node failed, fail handling ongoing
2024-10-08 09:56:27 [ndbd] INFO     -- DBTC 0: Started failure handling for node 5
2024-10-08 09:56:27 [ndbd] INFO     -- DBTC 0: Starting take over of node 5
2024-10-08 09:56:27 [ndbd] INFO     -- DBTC 0: Step NF_BLOCK_HANDLE completed, failure handling for node 5 waiting for NF_TAKEOVER, NF_CHECK_SCAN, NF_CHECK_TRANSACTION.
2024-10-08 09:56:27 [ndbd] INFO     -- DBTC 0: GCP completion 5782060/10 waiting for node failure handling (1) to complete. Seizing record for GCP.
2024-10-08 09:56:27 [ndbd] INFO     -- DBTC 0: Step NF_CHECK_SCAN completed, failure handling for node 5 waiting for NF_TAKEOVER, NF_CHECK_TRANSACTION.
2024-10-08 09:56:27 [ndbd] INFO     -- DBTC 0: Step NF_CHECK_TRANSACTION completed, failure handling for node 5 waiting for NF_TAKEOVER.
2024-10-08 09:56:27 [ndbd] INFO     -- DBTC 0: Completed take over of failed node 5
2024-10-08 09:56:27 [ndbd] INFO     -- DBTC 0: Step NF_TAKEOVER completed, failure handling for node 5 complete.
2024-10-08 09:56:27 [ndbd] INFO     -- DBTC 0: Completing GCP 5782060/10 on node failure takeover completion.
2024-10-08 09:56:28 [ndbd] INFO     -- Adjusting disk write speed bounds due to : Node restart ongoing
2024-10-08 09:56:28 [ndbd] INFO     -- NR Status: node=5,OLD=Node failed, fail handling ongoing,NEW=Node failure handling complete
2024-10-08 09:56:28 [ndbd] INFO     -- Node 5 has completed node fail handling
2024-10-08 09:56:32 [ndbd] INFO     -- Alloc node id for node 5 succeeded
2024-10-08 09:56:32 [ndbd] INFO     -- NR Status: node=5,OLD=Node failure handling complete,NEW=Allocated node id
2024-10-08 09:56:33 [ndbd] INFO     -- findNeighbours from: 2921 old (left: 4 right: 3) new (5 3)
2024-10-08 09:56:33 [ndbd] INFO     -- NR Status: node=5,OLD=Allocated node id,NEW=Included in heartbeat protocol
2024-10-08 09:56:33 [ndbd] INFO     -- Start node: 5 using node restart
2024-10-08 09:56:33 [ndbd] INFO     -- NR Status: node=5,OLD=Included in heartbeat protocol,NEW=NDBCNTR master permitted us
2024-10-08 09:56:33 [ndbd] INFO     -- NR Status: node=5,OLD=NDBCNTR master permitted us,NEW=All nodes permitted us
2024-10-08 09:56:33 [ndbd] INFO     -- NR Status: node=5,OLD=All nodes permitted us,NEW=Wait for LCP complete to copy meta data
2024-10-08 09:56:33 [ndbd] INFO     -- NR Status: node=5,OLD=Wait for LCP complete to copy meta data,NEW=Copy meta data to start node
2024-10-08 09:56:37 [ndbd] INFO     -- NR Status: node=5,OLD=Copy meta data to start node,NEW=Include node in LCP/GCP protocols
2024-10-08 09:56:38 [ndbd] INFO     -- NR Status: node=5,OLD=Include node in LCP/GCP protocols,NEW=Restore fragments ongoing
2024-10-08 09:56:38 [ndbd] INFO     -- NR Status: node=5,OLD=Restore fragments ongoing,NEW=Undo Disk data ongoing
2024-10-08 09:56:38 [ndbd] INFO     -- NR Status: node=5,OLD=Undo Disk data ongoing,NEW=Execute REDO logs ongoing
2024-10-08 09:56:38 [ndbd] INFO     -- NR Status: node=5,OLD=Execute REDO logs ongoing,NEW=Build indexes ongoing
2024-10-08 09:56:38 [ndbd] INFO     -- NR Status: node=5,OLD=Build indexes ongoing,NEW=Synchronize start node with live nodes
2024-10-08 09:56:40 [ndbd] INFO     -- NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2024-10-08 09:56:41 [ndbd] INFO     -- prepare to takeover bucket: 1
2024-10-08 09:56:46 [ndbd] INFO     -- 5782070/0 (5782069/4294967295) switchover complete bucket 1 state: 200
2024-10-08 09:56:46 [ndbd] INFO     -- shutdown handover takeover
2024-10-08 09:56:47 [ndbd] INFO     -- Node 4 disconnected in recv with errnum: 104 in state: 0
2024-10-08 09:56:47 [ndbd] INFO     -- findNeighbours from: 6085 old (left: 5 right: 3) new (5 3)
2024-10-08 09:56:48 [ndbd] INFO     -- Node 5 disconnected in recv with errnum: 104 in state: 0
2024-10-08 09:56:48 [ndbd] INFO     -- findNeighbours from: 6085 old (left: 5 right: 3) new (3 3)
2024-10-08 09:56:48 [ndbd] ALERT    -- Arbitration check won - all node groups and more than 1/2 nodes left
2024-10-08 09:56:48 [ndbd] INFO     -- President restarts arbitration thread [state=6]
2024-10-08 09:56:48 [ndbd] INFO     -- NR Status: node=4,OLD=Initial state,NEW=Node failed, fail handling ongoing
2024-10-08 09:56:48 [ndbd] INFO     -- NR Status: node=5,OLD=Wait LCP to ensure durability,NEW=Node failed, fail handling ongoing
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Started failure handling for node 4
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Starting take over of node 4
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Step NF_BLOCK_HANDLE completed, failure handling for node 4 waiting for NF_TAKEOVER, NF_CHECK_SCAN, NF_CHECK_TRANSACTION.
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Started failure handling for node 5
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Inserting failed node 5 into takeover queue, length 1
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Step NF_BLOCK_HANDLE completed, failure handling for node 5 waiting for NF_TAKEOVER, NF_CHECK_SCAN, NF_CHECK_TRANSACTION.
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: GCP completion 5782070/10 waiting for node failure handling (2) to complete. Seizing record for GCP.
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Step NF_CHECK_SCAN completed, failure handling for node 4 waiting for NF_TAKEOVER, NF_CHECK_TRANSACTION.
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Step NF_CHECK_SCAN completed, failure handling for node 5 waiting for NF_TAKEOVER, NF_CHECK_TRANSACTION.
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Step NF_CHECK_TRANSACTION completed, failure handling for node 4 waiting for NF_TAKEOVER.
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Step NF_CHECK_TRANSACTION completed, failure handling for node 5 waiting for NF_TAKEOVER.
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Completed take over of failed node 4
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Starting next DBTC node take over for failed node 5, 0 failed nodes remaining in takeover queue
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Step NF_TAKEOVER completed, failure handling for node 4 complete.
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Completed take over of failed node 5
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Step NF_TAKEOVER completed, failure handling for node 5 complete.
2024-10-08 09:56:48 [ndbd] INFO     -- DBTC 0: Completing GCP 5782070/10 on node failure takeover completion.
2024-10-08 09:56:48 [ndbd] INFO     -- Adjusting disk write speed bounds due to : Node restart finished
2024-10-08 09:56:50 [ndbd] INFO     -- NR Status: node=4,OLD=Node failed, fail handling ongoing,NEW=Node failure handling complete
2024-10-08 09:56:50 [ndbd] INFO     -- Node 4 has completed node fail handling
2024-10-08 09:56:50 [ndbd] INFO     -- NR Status: node=5,OLD=Node failed, fail handling ongoing,NEW=Node failure handling complete
2024-10-08 09:56:50 [ndbd] INFO     -- Node 5 has completed node fail handling
2024-10-08 09:56:51 [ndbd] INFO     -- Alloc node id for node 4 succeeded
2024-10-08 09:56:51 [ndbd] INFO     -- NR Status: node=4,OLD=Node failure handling complete,NEW=Allocated node id
2024-10-08 09:56:52 [ndbd] INFO     -- findNeighbours from: 2921 old (left: 3 right: 3) new (4 3)
2024-10-08 09:56:52 [ndbd] INFO     -- NR Status: node=4,OLD=Allocated node id,NEW=Included in heartbeat protocol
2024-10-08 09:56:52 [ndbd] INFO     -- Start node: 4 using node restart
2024-10-08 09:56:52 [ndbd] INFO     -- NR Status: node=4,OLD=Included in heartbeat protocol,NEW=NDBCNTR master permitted us
2024-10-08 09:56:52 [ndbd] INFO     -- NR Status: node=4,OLD=NDBCNTR master permitted us,NEW=All nodes permitted us
2024-10-08 09:56:52 [ndbd] INFO     -- NR Status: node=4,OLD=All nodes permitted us,NEW=Wait for LCP complete to copy meta data
2024-10-08 09:56:52 [ndbd] INFO     -- NR Status: node=4,OLD=Wait for LCP complete to copy meta data,NEW=Copy meta data to start node
2024-10-08 09:56:53 [ndbd] INFO     -- Adjusting disk write speed bounds due to : Node restart ongoing
2024-10-08 09:56:56 [ndbd] INFO     -- NR Status: node=4,OLD=Copy meta data to start node,NEW=Include node in LCP/GCP protocols
2024-10-08 09:56:56 [ndbd] INFO     -- NR Status: node=4,OLD=Include node in LCP/GCP protocols,NEW=Restore fragments ongoing
2024-10-08 09:56:56 [ndbd] INFO     -- NR Status: node=4,OLD=Restore fragments ongoing,NEW=Undo Disk data ongoing
2024-10-08 09:56:56 [ndbd] INFO     -- NR Status: node=4,OLD=Undo Disk data ongoing,NEW=Execute REDO logs ongoing
2024-10-08 09:56:56 [ndbd] INFO     -- NR Status: node=4,OLD=Execute REDO logs ongoing,NEW=Build indexes ongoing
2024-10-08 09:56:56 [ndbd] INFO     -- NR Status: node=4,OLD=Build indexes ongoing,NEW=Synchronize start node with live nodes
2024-10-08 09:56:58 [ndbd] INFO     -- NR Status: node=4,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2024-10-08 09:57:06 [ndbd] INFO     -- Alloc node id for node 5 succeeded
2024-10-08 09:57:06 [ndbd] INFO     -- NR Status: node=5,OLD=Node failure handling complete,NEW=Allocated node id
2024-10-08 09:57:07 [ndbd] INFO     -- findNeighbours from: 2921 old (left: 4 right: 3) new (5 3)
2024-10-08 09:57:07 [ndbd] INFO     -- NR Status: node=5,OLD=Allocated node id,NEW=Included in heartbeat protocol
2024-10-08 09:57:07 [ndbd] INFO     -- Start node: 5 using node restart (node not restorable on its own)
2024-10-08 09:57:07 [ndbd] INFO     -- NR Status: node=5,OLD=Included in heartbeat protocol,NEW=NDBCNTR master permitted us
2024-10-08 09:57:07 [ndbd] INFO     -- NR Status: node=5,OLD=NDBCNTR master permitted us,NEW=All nodes permitted us
2024-10-08 09:57:07 [ndbd] INFO     -- NR Status: node=5,OLD=All nodes permitted us,NEW=Wait for LCP complete to copy meta data
2024-10-08 09:57:07 [ndbd] INFO     -- NR Status: node=5,OLD=Wait for LCP complete to copy meta data,NEW=Copy meta data to start node
2024-10-08 09:57:08 [ndbd] INFO     -- NR Status: node=4,OLD=Wait LCP to ensure durability,NEW=Wait handover of subscriptions
2024-10-08 09:57:08 [ndbd] INFO     -- granting SumaStartMe dict lock to 4
2024-10-08 09:57:08 [ndbd] INFO     -- clearing SumaStartMe dict lock for 4
2024-10-08 09:57:10 [ndbd] INFO     -- NR Status: node=5,OLD=Copy meta data to start node,NEW=Include node in LCP/GCP protocols
2024-10-08 09:57:10 [ndbd] INFO     -- NR Status: node=5,OLD=Include node in LCP/GCP protocols,NEW=Restore fragments ongoing
2024-10-08 09:57:11 [ndbd] INFO     -- NR Status: node=5,OLD=Restore fragments ongoing,NEW=Undo Disk data ongoing
2024-10-08 09:57:11 [ndbd] INFO     -- NR Status: node=5,OLD=Undo Disk data ongoing,NEW=Execute REDO logs ongoing
2024-10-08 09:57:11 [ndbd] INFO     -- NR Status: node=5,OLD=Execute REDO logs ongoing,NEW=Build indexes ongoing
2024-10-08 09:57:11 [ndbd] INFO     -- NR Status: node=5,OLD=Build indexes ongoing,NEW=Synchronize start node with live nodes
2024-10-08 09:57:12 [ndbd] INFO     -- granting SumaHandover dict lock to 4
2024-10-08 09:57:12 [ndbd] INFO     -- prepare to handover bucket: 1
2024-10-08 09:57:13 [ndbd] INFO     -- NR Status: node=5,OLD=Synchronize start node with live nodes,NEW=Wait LCP to ensure durability
2024-10-08 09:57:17 [ndbd] INFO     -- 5782085/0 (5782084/4294967295) switchover complete bucket 1 state: 2
2024-10-08 09:57:17 [ndbd] INFO     -- handover
2024-10-08 09:57:17 [ndbd] INFO     -- clearing SumaHandover dict lock for 4
2024-10-08 09:57:17 [ndbd] INFO     -- NR Status: node=4,OLD=Wait handover of subscriptions,NEW=Restart completed
2024-10-08 09:57:17 [ndbd] INFO     -- Node 4 has completed its restart
2024-10-08 09:57:21 [ndbd] INFO     -- NR Status: node=5,OLD=Wait LCP to ensure durability,NEW=Wait handover of subscriptions
2024-10-08 09:57:21 [ndbd] INFO     -- granting SumaStartMe dict lock to 5
2024-10-08 09:57:21 [ndbd] INFO     -- clearing SumaStartMe dict lock for 5
2024-10-08 09:57:23 [ndbd] INFO     -- granting SumaHandover dict lock to 5
2024-10-08 09:57:29 [ndbd] INFO     -- clearing SumaHandover dict lock for 5
2024-10-08 09:57:29 [ndbd] INFO     -- NR Status: node=5,OLD=Wait handover of subscriptions,NEW=Restart completed
2024-10-08 09:57:29 [ndbd] INFO     -- Node 5 has completed its restart
2024-10-08 09:57:30 [ndbd] INFO     -- Adjusting disk write speed bounds due to : Node restart finished
2024-10-08 09:57:35 [ndbd] INFO     -- Adjusting disk write speed bounds due to : Backup started
2024-10-08 09:57:35 [ndbd] INFO     -- Backup log buffer report : size 16777216 bytes, hwm 52 bytes (0 pct)
2024-10-08 09:57:36 [ndbd] INFO     -- Backup log buffer report : size 16777216 bytes, hwm 52 bytes (0 pct)
2024-10-08 09:57:36 [ndbd] INFO     -- Adjusting disk write speed bounds due to : Backup completed
```

I found that this may be the root cause: the metadata copy to restarting node 5 begins while node 4 is still waiting for the subscription handover:
```
2024-10-08 09:57:07 [ndbd] INFO     -- NR Status: node=5,OLD=Included in heartbeat protocol,NEW=NDBCNTR master permitted us
2024-10-08 09:57:07 [ndbd] INFO     -- NR Status: node=5,OLD=NDBCNTR master permitted us,NEW=All nodes permitted us
2024-10-08 09:57:07 [ndbd] INFO     -- NR Status: node=5,OLD=All nodes permitted us,NEW=Wait for LCP complete to copy meta data
2024-10-08 09:57:07 [ndbd] INFO     -- NR Status: node=5,OLD=Wait for LCP complete to copy meta data,NEW=Copy meta data to start node
2024-10-08 09:57:08 [ndbd] INFO     -- NR Status: node=4,OLD=Wait LCP to ensure durability,NEW=Wait handover of subscriptions
```
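To make the interleaving of the two nodes' restart phases easier to see, the `NR Status` transitions can be extracted from the out-logs with a small script. This is only a sketch for reading the logs above; the regex is my assumption about the line format, not an official NDB tool.

```python
import re

# Match lines like:
#   2024-10-08 09:57:07 [ndbd] INFO  -- NR Status: node=5,OLD=...,NEW=...
# OLD is matched non-greedily so state names containing commas
# (e.g. "Node failed, fail handling ongoing") are handled.
NR_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*?"
    r"NR Status: node=(?P<node>\d+),OLD=(?P<old>.+?),NEW=(?P<new>.+)$"
)

def nr_transitions(log_lines):
    """Return (timestamp, node, old_state, new_state) tuples in log order."""
    out = []
    for line in log_lines:
        m = NR_RE.match(line)
        if m:
            out.append((m["ts"], int(m["node"]), m["old"], m["new"]))
    return out

# Two lines copied from the report, showing the overlap in question.
sample = [
    "2024-10-08 09:57:07 [ndbd] INFO     -- NR Status: node=5,OLD=Wait for LCP complete to copy meta data,NEW=Copy meta data to start node",
    "2024-10-08 09:57:08 [ndbd] INFO     -- NR Status: node=4,OLD=Wait LCP to ensure durability,NEW=Wait handover of subscriptions",
]

for ts, node, old, new in nr_transitions(sample):
    print(f"{ts}  node {node}: {old} -> {new}")
```

Running this over the full ndb_2_out.log shows node 5 entering "Copy meta data to start node" one second before node 4 moves to "Wait handover of subscriptions".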

At the same time, in ndb4_out.log:
```
2024-10-08 09:56:57 [ndbd] INFO     -- Bring Database On-line Completed
2024-10-08 09:56:57 [ndbd] INFO     -- Starting REDO logging
2024-10-08 09:56:58 [ndbd] INFO     -- Make On-line Database recoverable by waiting for LCP Starting, all parallel threads have now ceased their activity and we have a single wait state here
2024-10-08 09:57:07 [ndbd] INFO     -- findNeighbours from: 2921 old (left: 3 right: 2) new (3 5)
2024-10-08 09:57:08 [ndbd] INFO     -- Make On-line Database recoverable by waiting for LCP Completed, LCP id = 398
2024-10-08 09:57:08 [ndbd] INFO     -- NDB start phase 5 completed
2024-10-08 09:57:08 [ndbd] INFO     -- Start NDB start phase 6
2024-10-08 09:57:08 [ndbd] INFO     -- NDB start phase 6 completed
2024-10-08 09:57:08 [ndbd] INFO     -- Start phase 5 completed
2024-10-08 09:57:08 [ndbd] INFO     -- Phase 5 restored local fragments in its first NDB phase, then copied metadata to our node, and then actual data was copied over to our node, and finally we waited for a local checkpoint to complete
2024-10-08 09:57:08 [ndbd] INFO     -- Start phase 6 completed
2024-10-08 09:57:08 [ndbd] INFO     -- Phase 6 updated blocks about that we've now reached the started state.
2024-10-08 09:57:08 [ndbd] INFO     -- Start phase 7 completed
2024-10-08 09:57:08 [ndbd] INFO     -- Phase 7 mainly activated the asynchronous change events process, and some other background processes
2024-10-08 09:57:08 [ndbd] INFO     -- Start NDB start phase 7
2024-10-08 09:57:08 [ndbd] INFO     -- Foreign Key enabling Starting
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Checking FK trigger ids
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 111 child trigger id to 84 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 77 child trigger id to 94 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 62 child trigger id to 92 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 32 parent trigger id to 59 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 62 parent trigger id to 91 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 91 parent trigger id to 97 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 111 parent trigger id to 83 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 133 parent trigger id to 114 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 82 parent trigger id to 95 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 104 child trigger id to 75 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 77 parent trigger id to 93 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 133 child trigger id to 115 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 125 child trigger id to 118 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 55 child trigger id to 90 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 101 child trigger id to 72 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 32 child trigger id to 62 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 82 child trigger id to 96 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 101 parent trigger id to 71 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 125 parent trigger id to 117 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 55 parent trigger id to 89 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 104 parent trigger id to 74 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Setting FK 91 child trigger id to 98 (4294967040)
2024-10-08 09:57:08 [ndbd] INFO     -- DICT : Checking FK trigger ids complete
2024-10-08 09:57:08 [ndbd] INFO     -- Allocate event buffering page chunk in SUMA, 16 pages, first page ref = 7154
2024-10-08 09:57:09 [ndbd] INFO     -- Foreign key enabling Completed
2024-10-08 09:57:09 [ndbd] INFO     -- NDB start phase 7 completed
2024-10-08 09:57:09 [ndbd] INFO     -- Start phase 8 completed
2024-10-08 09:57:11 [ndbd] INFO     -- Restart complete, updated local sysfile
2024-10-08 09:57:11 [ndbd] INFO     -- Start phase 9 completed
2024-10-08 09:57:11 [ndbd] INFO     -- Phase 9 enabled APIs to start connecting
2024-10-08 09:57:11 [ndbd] INFO     -- Start phase 50 completed
2024-10-08 09:57:11 [ndbd] INFO     -- Suma: handover waiting up to 120000ms for all subscribers to connect
2024-10-08 09:57:11 [ndbd] INFO     -- Start phase 101 waiting 120s for absent subscribers to connect : 6,7,8,9
2024-10-08 09:57:12 [ndbd] INFO     -- Suma: handover from node 2 gci: 5782085 buckets: 00000002 (2)
```

Also at the same time, from ndb4_out.log:
```
2024-10-08 09:57:06 [ndbd] INFO     -- LCPFragWatchdog : WarnElapsed : 60000(ms) MaxElapsed 180000(ms) : MaxGcpWaitLimit 0(ms) period millis : 1000
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 7, name = DBTUX
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 8, name = DBDICT
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 9, name = DBDIH
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 10, name = NDBCNTR
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 11, name = QMGR
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 12, name = TRIX
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 13, name = BACKUP
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 14, name = DBUTIL
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 15, name = SUMA
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 16, name = TSMAN
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 17, name = LGMAN
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 18, name = PGMAN
2024-10-08 09:57:06 [ndbd] INFO     -- pgman: page buffer entries = 10
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 19, name = RESTORE
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 20, name = DBSPJ
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 21, name = THRMAN
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 22, name = DBQLQH
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 23, name = DBQACC
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 24, name = DBQTUP
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 25, name = QBACKUP
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 26, name = DBQTUX
2024-10-08 09:57:06 [ndbd] INFO     -- Sending READ_CONFIG_REQ to index = 27, name = QRESTORE
2024-10-08 09:57:06 [ndbd] INFO     -- READ_CONFIG_REQ phase completed, this phase is used to read configuration and to calculate various sizes and allocate almost all memory needed by the data node in its lifetime
2024-10-08 09:57:06 [ndbd] INFO     -- Not initial start
2024-10-08 09:57:06 [ndbd] INFO     -- Local sysfile: Node not restorable on its own, gci: 5782069, version: 70603
2024-10-08 09:57:06 [ndbd] INFO     -- Start phase 0 completed
2024-10-08 09:57:06 [ndbd] INFO     -- Phase 0 has made some file system initialisations
2024-10-08 09:57:06 [ndbd] INFO     -- We are running with 0 LDM workers and 4 REDO log parts. This means that we can avoid using a mutex to access REDO log parts
2024-10-08 09:57:06 [ndbd] INFO     -- Watchdog KillSwitch off.
2024-10-08 09:57:06 [ndbd] INFO     -- Starting QMGR phase 1
2024-10-08 09:57:06 [ndbd] INFO     -- Starting with m_restart_seq set to 30
2024-10-08 09:57:06 [ndbd] INFO     -- Node not restorable on its own, now starting the Node Inclusion Protocol
2024-10-08 09:57:07 [ndbd] INFO     -- findNeighbours from: 3004 old (left: 65535 right: 65535) new (4 2)
2024-10-08 09:57:07 [ndbd] INFO     -- Include node protocol completed, phase 1 in QMGR completed
2024-10-08 09:57:07 [ndbd] INFO     -- Start phase 1 completed
2024-10-08 09:57:07 [ndbd] INFO     -- Phase 1 initialised some variables and included node in cluster, locked memory if configured to do so
2024-10-08 09:57:07 [ndbd] INFO     -- Starting with m_restart_seq set to 30
2024-10-08 09:57:07 [ndbd] INFO     -- Asking master node to accept our start (nodeId = 2 is master), GCI = 5782069
2024-10-08 09:57:07 [ndbd] INFO     -- NDBCNTR master accepted us into cluster, start NDB start phase 1
2024-10-08 09:57:07 [ndbd] INFO     -- We are performing a node restart
2024-10-08 09:57:07 [ndbd] INFO     -- LDM(0): Started LDM restart phase 1 (read REDO log page headers to init REDO log data)
2024-10-08 09:57:07 [ndbd] INFO     -- Schema file initialisation Starting
2024-10-08 09:57:07 [ndbd] INFO     -- Schema file initialisation Completed
2024-10-08 09:57:07 [ndbd] INFO     -- NDB start phase 1 completed
2024-10-08 09:57:07 [ndbd] INFO     -- Start phase 2 completed
2024-10-08 09:57:07 [ndbd] INFO     -- Phase 2 did more initialisations, master accepted our start, we started REDO log initialisations
2024-10-08 09:57:07 [ndbd] INFO     -- Start NDB start phase 2
2024-10-08 09:57:07 [ndbd] INFO     -- Request permission to start our node from master Starting
2024-10-08 09:57:07 [ndbd] INFO     -- Request permission to start our node from master Completed
2024-10-08 09:57:07 [ndbd] INFO     -- NDB start phase 2 completed
2024-10-08 09:57:07 [ndbd] INFO     -- Start phase 3 completed
2024-10-08 09:57:07 [ndbd] INFO     -- Phase 3 locked the data dictionary, performed local connection setups, we  asked for permission to start our node
2024-10-08 09:57:07 [ndbd] INFO     -- Start NDB start phase 3
2024-10-08 09:57:07 [ndbd] INFO     -- NDB start phase 3 completed
2024-10-08 09:57:07 [ndbd] INFO     -- Start phase 4 completed
2024-10-08 09:57:07 [ndbd] INFO     -- Phase 4 continued preparations of the REDO log
2024-10-08 09:57:07 [ndbd] INFO     -- Start NDB start phase 4
2024-10-08 09:57:07 [ndbd] INFO     -- Request copying of distribution and dictionary information from master Starting
2024-10-08 09:57:07 [ndbd] INFO     -- Completed setting up multiple transporters to nodes in the same node group
2024-10-08 09:57:07 [ndbd] INFO     -- c_lcpState.m_participatingLQH bitmap= 000000000000000000000000000000000000001c
2024-10-08 09:57:07 [ndbd] INFO     -- c_lcpState.m_participatingDIH bitmap= 000000000000000000000000000000000000001c
2024-10-08 09:57:07 [ndbd] INFO     -- LDM(0):Ready to start execute REDO log phase, prepare REDO log phase completed
2024-10-08 09:57:07 [ndbd] INFO     -- Receive arbitrator node 1 [ticket=74870001340c1384]
2024-10-08 09:57:07 [ndbd] INFO     -- Our node now in LCP execution after pausing LCP
2024-10-08 09:57:07 [ndbd] INFO     -- LCP_COMPLETE_REP_Counter_LQH bitmap= [SignalCounter: m_count=2 000000000000000000000000000000000000000c]
2024-10-08 09:57:07 [ndbd] INFO     -- Copying of dictionary information from master Starting
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 1
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 32
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 55
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 62
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 77
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 82
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 91
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 101
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 104
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 111
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 125
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 133
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 2
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 3
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 4
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 5
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 6
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 7
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 8
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 10
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 11
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 13
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 19
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 24
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 30
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 33
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 41
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 49
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 56
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 63
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 65
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 78
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 80
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 83
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 92
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 99
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 102
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 105
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 112
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 120
2024-10-08 09:57:07 [ndbd] INFO     -- Restart recreating table with id = 127
2024-10-08 09:57:10 [ndbd] INFO     -- Copying of dictionary information from master Completed
2024-10-08 09:57:10 [ndbd] INFO     -- Request copying of distribution and dictionary information from master Completed
2024-10-08 09:57:10 [ndbd] INFO     -- NDB start phase 4 completed
2024-10-08 09:57:10 [ndbd] INFO     -- Start NDB start phase 5 (only to DBDIH)
2024-10-08 09:57:10 [ndbd] INFO     -- Restore Database Off-line Starting
2024-10-08 09:57:10 [ndbd] INFO     -- All start fragments sent, requesting LDM to restore all fragments and to execute the REDO log to bring the database to an off-line but consistent state
2024-10-08 09:57:10 [ndbd] INFO     -- LDM instance 0: Restored T2F1 LCP 25 rows, 25 row operations, 1 millis, 25000 row operations/sec)
2024-10-08 09:57:10 [ndbd] INFO     -- LDM instance 0: Restored T2F3 LCP 22 rows, 22 row operations, 1 millis, 22000 row operations/sec)
2024-10-08 09:57:10 [ndbd] INFO     -- LDM instance 0: Restored T3F1 LCP 6 rows, 6 row operations, 1 millis, 6000 row operations/sec)
2024-10-08 09:57:10 [ndbd] INFO     -- LDM instance 0: Restored T3F3 LCP 4 rows, 4 row operations, 1 millis, 4000 row operations/sec)
```

Suggested fix:
I suggest checking the metadata synchronization mechanism; there seems to be a problem there.
[14 Oct 2024 5:25] MySQL Verification Team
This is a duplicate of Bug #116349; please use Bug #116349 to describe in more detail what exactly you did to get the cluster into this state.