Description:
We have a cluster of 6 data nodes with 3 replicas (two node groups of three nodes each).
I unplugged the network cable from data node 6, and the whole cluster went down.
This looks related to bug 21213 (here, too, the nodes fail with error 2341).
Mgmt-client output:
ndb_mgm> all status
Node 1: started (Version 5.0.22)
Node 2: started (Version 5.0.22)
Node 3: started (Version 5.0.22)
Node 4: started (Version 5.0.22)
Node 5: started (Version 5.0.22)
Node 6: started (Version 5.0.22)
ndb_mgm> Node 4: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 1: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 3: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 2: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 5: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Cluster log:
Jul 25 13:59:15 nl2-db4 NDB[10219]: [MgmSrvr] Node 1: Local checkpoint 5146 started. Keep GCI = 50314 oldest restorable GCI = 50324
Jul 25 14:00:24 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Node 6 missed heartbeat 2
Jul 25 14:00:26 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Node 6 missed heartbeat 3
Jul 25 14:00:26 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 6 Connected
Jul 25 14:00:35 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 6 Connected
Jul 25 14:00:59 nl2-db4 NDB[10219]: [MgmSrvr] Node 1: Local checkpoint 5147 started. Keep GCI = 50324 oldest restorable GCI = 50334
Jul 25 14:02:43 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Node 6 missed heartbeat 2
Jul 25 14:02:44 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 6 Connected
Jul 25 14:02:44 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Node 6 missed heartbeat 3
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Node 6 missed heartbeat 4
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Node 6 declared dead due to missed heartbeat
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 1: Communication to Node 6 closed
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 2: Communication to Node 6 closed
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 3: Communication to Node 6 closed
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 4: Communication to Node 6 closed
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 3: Node 6 Disconnected
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 3: Communication to Node 6 closed
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Communication to Node 6 closed
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 1: Arbitration check won - node group majority
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 1: President restarts arbitration thread [state=6]
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 4 Connected
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 4: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 1 Connected
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 3 Connected
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 1: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 2 Connected
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 3: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 2: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 5 Connected
Jul 25 14:02:48 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Jul 25 14:02:49 nl2-db4 NDB[10219]: [MgmSrvr] Mgmt server state: nodeid 24 freed, m_reserved_nodes 0008000008600080.
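Just before the crash, the surviving nodes pass the arbitration check ("Arbitration check won - node group majority"), so losing node 6 alone should not have taken the cluster down. The following is a simplified model of that survival check, written in Python for illustration only; it is not NDB's actual code. It assumes the default node-group assignment by ascending node ID (nodes 1-3 in one group, 4-6 in the other), and the function names are hypothetical.

```python
# Simplified, hypothetical model of the node-group survival check.
# ASSUMPTION: data nodes are grouped by ascending node ID into groups of
# NoOfReplicas nodes, i.e. [1, 2, 3] and [4, 5, 6] for this cluster.

def node_groups(data_nodes, replicas):
    """Split sorted data node IDs into consecutive groups of `replicas` nodes."""
    ids = sorted(data_nodes)
    return [ids[i:i + replicas] for i in range(0, len(ids), replicas)]


def partition_check(groups, alive):
    """Classify a set of surviving data nodes.

    'lose'      - some node group has no surviving node, so data is missing
    'win'       - every group survives and at least one group is fully intact
    'arbitrate' - every group survives but none is complete; ask the arbitrator
    """
    alive = set(alive)
    if any(not alive.intersection(g) for g in groups):
        return "lose"
    if any(alive.issuperset(g) for g in groups):
        return "win"
    return "arbitrate"


groups = node_groups([1, 2, 3, 4, 5, 6], replicas=3)
survivors = {1, 2, 3, 4, 5}  # node 6 lost its network cable
print(groups)                              # [[1, 2, 3], [4, 5, 6]]
print(partition_check(groups, survivors))  # win
```

Under this model the five survivors keep node group [1, 2, 3] fully intact and still hold a replica of every fragment, which matches the "node group majority" line in the log.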
Node 4 error log:
Time: Tuesday 25 July 2006 - 14:02:46
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data:
Error object: DBTC (Line: 6278) 0x0000000e
Program: /opt/mysqlcluster/libexec/ndbd
Pid: 13687
Trace: /var/db/6-nodes/log/ndb_4_trace.log.4
Version: Version 5.0.22
***EOM***
How to repeat:
Unplug the network cable from one of the data nodes.