Description:
When re-introducing the second data node into a two-node cluster, the failure of that second node caused the first node to crash. I have extracted the relevant log data from both the ndb log and the management node log.
2008-05-16 15:32:10 [MgmSrvr] INFO -- Node 2: Node 3 is WAIT_LCP including in LCP
2008-05-16 15:38:09 [MgmSrvr] ALERT -- Node 1: Node 3 Disconnected
2008-05-16 15:38:09 [MgmSrvr] ALERT -- Node 2: Node 3 Disconnected
2008-05-16 15:38:09 [MgmSrvr] INFO -- Node 2: Communication to Node 3 closed
2008-05-16 15:38:09 [MgmSrvr] ALERT -- Node 2: Network partitioning - arbitration required
2008-05-16 15:38:09 [MgmSrvr] INFO -- Node 2: President restarts arbitration thread [state=7]
2008-05-16 15:38:10 [MgmSrvr] ALERT -- Node 2: Arbitration won - positive reply from node 1
2008-05-16 15:38:10 [MgmSrvr] INFO -- Node 2: DICT: remove lock by failed node 3 for NodeRestart
2008-05-16 15:38:10 [MgmSrvr] INFO -- Node 2: DICT: lock bs: 0 ops: 0 poll: 0 cnt: 0 queue:
2008-05-16 15:38:11 [MgmSrvr] INFO -- Node 2: Started arbitrator node 1 [ticket=63430029f1f3b40a]
2008-05-16 15:39:15 [MgmSrvr] WARNING -- Node 2: Failure handling of node 3 has not completed in 1 min. - state = 3
2008-05-16 15:40:20 [MgmSrvr] WARNING -- Node 2: Failure handling of node 3 has not completed in 2 min. - state = 3
2008-05-16 15:41:26 [MgmSrvr] WARNING -- Node 2: Failure handling of node 3 has not completed in 3 min. - state = 3
2008-05-16 15:42:08 [MgmSrvr] INFO -- Node 2: Communication to Node 3 opened
2008-05-16 15:43:49 [MgmSrvr] ALERT -- Node 2: Node 4 Disconnected
2008-05-16 15:43:49 [MgmSrvr] INFO -- Node 2: Communication to Node 4 closed
2008-05-16 15:43:53 [MgmSrvr] INFO -- Node 2: Communication to Node 4 opened
2008-05-16 15:43:54 [MgmSrvr] INFO -- Mgmt server state: nodeid 4 reserved for ip 10.178.110.206, m_reserved_nodes 0000000000000012.
2008-05-16 15:43:54 [MgmSrvr] INFO -- Node 4: mysqld --server-id=0
2008-05-16 15:43:55 [MgmSrvr] INFO -- Node 2: Node 4 Connected
2008-05-16 15:43:55 [MgmSrvr] INFO -- Node 2: Node 4: API version 5.1.24
2008-05-16 15:44:00 [MgmSrvr] ALERT -- Node 1: Node 2 Disconnected
2008-05-16 15:44:03 [MgmSrvr] ALERT -- Node 2: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Please let me know if you need any further information about the hardware, the OS, or additional logs. For now I have started the cluster with only the working data node.
Regards,
How to repeat:
I tried this on a test system and it worked fine. However, I cannot attempt to reproduce it on our production setup, because doing so would cause a service impact.