Description:
After a system reboot, neither API nodes nor storage nodes can rejoin the cluster.
How to repeat:
Set up a cluster with 1 mgmd, 4 ndbd nodes, and 1 mysqld, with 2 ndbds and the mysqld on the same host.
Reboot the host that runs the mysqld and the 2 ndbds.
The mysqld has nodeid 7:
2005-06-30 12:06:16 [MgmSrvr] INFO -- Mgmt server state: nodeid 7 reserved for ip 192.168.1.5, m_reserved_nodes 00000000000000fe.
2005-06-30 12:06:16 [MgmSrvr] INFO -- Node 5: Node 7 Connected
2005-06-30 12:06:16 [MgmSrvr] INFO -- Node 4: Node 7 Connected
2005-06-30 12:06:16 [MgmSrvr] INFO -- Node 2: Node 7 Connected
2005-06-30 12:06:16 [MgmSrvr] INFO -- Node 3: Node 7 Connected
2005-06-30 12:06:16 [MgmSrvr] INFO -- Node 2: Node 7: API version 4.1.12
2005-06-30 12:06:16 [MgmSrvr] INFO -- Node 3: Node 7: API version 4.1.12
2005-06-30 12:06:16 [MgmSrvr] INFO -- Node 4: Node 7: API version 4.1.12
2005-06-30 12:06:16 [MgmSrvr] INFO -- Node 5: Node 7: API version 4.1.12
2005-06-30 12:08:58 [MgmSrvr] WARNING -- Node 2: Node 7 missed heartbeat 2
2005-06-30 12:08:58 [MgmSrvr] WARNING -- Node 4: Node 7 missed heartbeat 2
2005-06-30 12:08:58 [MgmSrvr] WARNING -- Node 2: Node 5 missed heartbeat 2
2005-06-30 12:09:00 [MgmSrvr] WARNING -- Node 2: Node 7 missed heartbeat 3
2005-06-30 12:09:00 [MgmSrvr] WARNING -- Node 4: Node 7 missed heartbeat 3
2005-06-30 12:09:00 [MgmSrvr] WARNING -- Node 2: Node 5 missed heartbeat 3
2005-06-30 12:09:00 [MgmSrvr] WARNING -- Node 4: Node 3 missed heartbeat 2
2005-06-30 12:09:01 [MgmSrvr] WARNING -- Node 2: Node 7 missed heartbeat 4
2005-06-30 12:09:01 [MgmSrvr] ALERT -- Node 2: Node 7 declared dead due to missed heartbeat
2005-06-30 12:09:01 [MgmSrvr] INFO -- Node 2: Communication to Node 7 closed
2005-06-30 12:09:01 [MgmSrvr] WARNING -- Node 4: Node 7 missed heartbeat 4
2005-06-30 12:09:01 [MgmSrvr] ALERT -- Node 4: Node 7 declared dead due to missed heartbeat
2005-06-30 12:09:01 [MgmSrvr] INFO -- Node 4: Communication to Node 7 closed
2005-06-30 12:09:01 [MgmSrvr] ALERT -- Node 2: Node 7 Disconnected
2005-06-30 12:09:01 [MgmSrvr] ALERT -- Node 4: Node 7 Disconnected
Later, when the mysqld tries to rejoin:
2005-06-30 12:09:04 [MgmSrvr] INFO -- Node 4: Communication to Node 7 opened
2005-06-30 12:09:05 [MgmSrvr] INFO -- Node 2: Communication to Node 7 opened
2005-06-30 12:09:06 [MgmSrvr] INFO -- Node 4: Communication to Node 3 opened
2005-06-30 12:09:06 [MgmSrvr] INFO -- Node 4: Communication to Node 5 opened
2005-06-30 12:09:07 [MgmSrvr] INFO -- Node 2: Communication to Node 3 opened
2005-06-30 12:09:07 [MgmSrvr] INFO -- Node 2: Communication to Node 5 opened
2005-06-30 12:11:57 [MgmSrvr] WARNING -- Allocate nodeid (0) failed. Connection from ip 192.168.1.5. Returned error string "No free node id found for ndbd(NDB)."
2005-06-30 12:11:57 [MgmSrvr] INFO -- Mgmt server state: node id's 1 3 5 7 not connected but reserved
2005-06-30 12:12:00 [MgmSrvr] WARNING -- Allocate nodeid (0) failed. Connection from ip 192.168.1.5. Returned error string "No free node id found for ndbd(NDB)."
2005-06-30 12:12:00 [MgmSrvr] INFO -- Mgmt server state: node id's 1 3 5 7 not connected but reserved
2005-06-30 12:12:03 [MgmSrvr] WARNING -- Allocate nodeid (0) failed. Connection from ip 192.168.1.5. Returned error string "No free node id found for ndbd(NDB)."
2005-06-30 12:12:03 [MgmSrvr] INFO -- Mgmt server state: node id's 1 3 5 7 not connected but reserved
After issuing PURGE STALE SESSIONS, it works again:
2005-06-30 12:13:27 [MgmSrvr] INFO -- Mgmt server state: nodeid 7 freed, m_reserved_nodes 000000000000017e.
2005-06-30 12:13:27 [MgmSrvr] INFO -- Mgmt server state: nodeid 5 freed, m_reserved_nodes 000000000000015e.
2005-06-30 12:13:27 [MgmSrvr] INFO -- Mgmt server state: nodeid 3 freed, m_reserved_nodes 0000000000000156.
2005-06-30 12:13:45 [MgmSrvr] INFO -- Mgmt server state: nodeid 3 reserved for ip 192.168.1.5, m_reserved_nodes 000000000000015e.
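For reading the log: the m_reserved_nodes values look like a hexadecimal bitmask in which bit n stands for node id n. That is an assumption drawn only from the lines above (freeing nodeid 7 clears 0x80, freeing nodeid 5 clears 0x20, freeing nodeid 3 clears 0x08), not from the server source. Under that assumption, a small sketch to decode the values (the helper name is made up for illustration):

```python
def reserved_node_ids(mask_hex):
    """Return the node ids whose bits are set in an m_reserved_nodes value.

    Assumes bit n of the mask corresponds to node id n, which is inferred
    from the log lines in this report and not confirmed against the source.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    for value in ("00000000000000fe", "000000000000017e",
                  "000000000000015e", "0000000000000156"):
        print(value, reserved_node_ids(value))
```

Read this way, 00000000000000fe means node ids 1 through 7 were reserved when the mysqld first connected, and 0000000000000156 means ids 1, 2, 4, 6 and 8 remained reserved after the purge.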
Suggested fix:
Free the reserved node ids of connections once they are declared dead.
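To illustrate the idea, here is a toy model of a reservation table. It is not the management server code; the class and method names are hypothetical, and it only shows the intended behaviour: a node id that was declared dead becomes allocatable again without a manual purge.

```python
class NodeIdReservations:
    """Toy model of a node id reservation table (hypothetical, not MgmSrvr code)."""

    def __init__(self, configured_ids):
        self.configured_ids = sorted(configured_ids)
        self.reserved = {}  # node id -> ip address that reserved it

    def allocate(self, ip):
        """Reserve and return the lowest configured node id that is still free."""
        for node_id in self.configured_ids:
            if node_id not in self.reserved:
                self.reserved[node_id] = ip
                return node_id
        raise RuntimeError("No free node id found")

    def on_declared_dead(self, node_id):
        """Suggested fix: drop the reservation when the node is declared dead."""
        self.reserved.pop(node_id, None)


if __name__ == "__main__":
    table = NodeIdReservations([3, 5])
    table.allocate("192.168.1.5")   # reserves 3
    table.allocate("192.168.1.5")   # reserves 5
    table.on_declared_dead(3)       # heartbeat failure frees the id
    print(table.allocate("192.168.1.5"))  # prints 3
```

Without the on_declared_dead call, the third allocate raises, which mirrors the "No free node id found" warnings in the log.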