Description:
Hi guys,
Our cluster crashed unexpectedly. The management server log shows the following messages:
2010-07-01 21:18:22 [MgmSrvr] INFO -- Node 4: Possible bug in Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 3 sig->failNo =
2010-07-01 21:18:22 [MgmSrvr] INFO -- Node 4: Communication to Node 2 closed
2010-07-01 21:18:22 [MgmSrvr] INFO -- Node 4: Communication to Node 6 closed
2010-07-01 21:18:22 [MgmSrvr] INFO -- Node 3: Possible bug in Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 3 sig->failNo =
2010-07-01 21:18:22 [MgmSrvr] INFO -- Node 3: Communication to Node 2 closed
2010-07-01 21:18:22 [MgmSrvr] INFO -- Node 3: Communication to Node 6 closed
2010-07-01 21:18:22 [MgmSrvr] INFO -- Node 5: Possible bug in Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 3 sig->failNo =
2010-07-01 21:18:22 [MgmSrvr] INFO -- Node 5: Communication to Node 2 closed
2010-07-01 21:18:22 [MgmSrvr] INFO -- Node 5: Communication to Node 6 closed
2010-07-01 21:18:23 [MgmSrvr] INFO -- Node 5: Communication to Node 26 opened
2010-07-01 21:18:23 [MgmSrvr] INFO -- Node 3: Communication to Node 26 opened
2010-07-01 21:18:23 [MgmSrvr] ALERT -- Node 1: Node 4 Disconnected
2010-07-01 21:18:23 [MgmSrvr] ALERT -- Node 3: Node 4 Disconnected
2010-07-01 21:18:23 [MgmSrvr] ALERT -- Node 5: Node 4 Disconnected
2010-07-01 21:18:23 [MgmSrvr] INFO -- Node 3: Possible bug in Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 4 sig->failNo =
2010-07-01 21:18:23 [MgmSrvr] INFO -- Node 3: Communication to Node 2 closed
2010-07-01 21:18:23 [MgmSrvr] INFO -- Node 3: Communication to Node 4 closed
2010-07-01 21:18:23 [MgmSrvr] INFO -- Node 3: Communication to Node 6 closed
2010-07-01 21:18:23 [MgmSrvr] INFO -- Node 5: Possible bug in Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 4 sig->failNo =
2010-07-01 21:18:23 [MgmSrvr] INFO -- Node 5: Communication to Node 2 closed
2010-07-01 21:18:23 [MgmSrvr] INFO -- Node 5: Communication to Node 4 closed
2010-07-01 21:18:23 [MgmSrvr] INFO -- Node 5: Communication to Node 6 closed
2010-07-01 21:18:24 [MgmSrvr] INFO -- Node 5: Communication to Node 6 opened
2010-07-01 21:18:24 [MgmSrvr] ALERT -- Node 1: Node 3 Disconnected
2010-07-01 21:18:25 [MgmSrvr] ALERT -- Node 4: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
2010-07-01 21:18:25 [MgmSrvr] ALERT -- Node 2: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
2010-07-01 21:18:25 [MgmSrvr] ALERT -- Node 1: Node 5 Disconnected
2010-07-01 21:18:26 [MgmSrvr] ALERT -- Node 3: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
2010-07-01 21:18:27 [MgmSrvr] ALERT -- Node 5: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
What could the problem be? Thanks.
-Hamid
How to repeat:
The original crash does not seem to be reproducible. However, every attempt to restart the database cluster afterwards fails with:
2010-07-02 04:20:45 [MgmSrvr] ALERT -- Node 2: Forced node shutdown completed. Occured during startphase 4. Caused by error 2306: 'Pointer too large(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
We ended up restoring from a full backup.
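In case it helps with triage, here is a minimal Python sketch that groups the "Forced node shutdown completed" alerts by error code. It assumes the cluster log uses the line layout pasted above; the function name summarize_shutdowns and the log path taken from the command line are my own choices for illustration, not part of any MySQL tool.

```python
import re
import sys
from collections import defaultdict

# Matches alert lines of the form seen in the log above, e.g.
# "<date> <time> [MgmSrvr] ALERT -- Node 4: Forced node shutdown completed. ... Caused by error 2305: '...'"
SHUTDOWN_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmSrvr\] ALERT\s+-- "
    r"Node (?P<node>\d+): Forced node shutdown completed\."
    r".*?Caused by error (?P<code>\d+):"
)


def summarize_shutdowns(lines):
    """Return {error_code: [(timestamp, node_id), ...]}, preserving log order."""
    by_error = defaultdict(list)
    for line in lines:
        match = SHUTDOWN_RE.match(line.strip())
        if match:
            event = (match.group("ts"), int(match.group("node")))
            by_error[int(match.group("code"))].append(event)
    return dict(by_error)


if __name__ == "__main__":
    # Usage (hypothetical path): python summarize_shutdowns.py /path/to/cluster.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        summary = summarize_shutdowns(log_file)
    for code, events in sorted(summary.items()):
        print(f"error {code}: {len(events)} forced shutdown(s)")
        for timestamp, node_id in events:
            print(f"  {timestamp}  node {node_id}")
```

Run against the log above, this should list the shutdowns caused by error 2305 separately from the startphase 4 failure caused by error 2306, which makes it easier to see which nodes went down and in what order.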