Bug #21272 Network failure causes MySQL cluster to shut down
Submitted: 25 Jul 2006 12:40
Modified: 14 Aug 2006 14:06
Reporter: Lars Torstensson
Status: Duplicate
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: MySQL cluster 5.0.22
OS: Linux (Linux 2.6.9-34.ELsmp #1 )
Assigned to: Assigned Account
CPU Architecture: Any

[25 Jul 2006 12:40] Lars Torstensson
Description:
We have a 6-node cluster with 3 replicas.
I removed the network cable from data node 6, and the whole cluster then went down.

This looks related to bug 21213 (caused by error 2341).

Mgmt-client output:
ndb_mgm> all status
Node 1: started (Version 5.0.22)
Node 2: started (Version 5.0.22)
Node 3: started (Version 5.0.22)
Node 4: started (Version 5.0.22)
Node 5: started (Version 5.0.22)
Node 6: started (Version 5.0.22)

ndb_mgm> Node 4: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 1: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 3: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 2: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 5: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

Cluster log:

Jul 25 13:59:15 nl2-db4 NDB[10219]: [MgmSrvr] Node 1: Local checkpoint 5146 started. Keep GCI = 50314 oldest restorable GCI = 50324
Jul 25 14:00:24 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Node 6 missed heartbeat 2
Jul 25 14:00:26 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Node 6 missed heartbeat 3
Jul 25 14:00:26 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 6 Connected
Jul 25 14:00:35 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 6 Connected
Jul 25 14:00:59 nl2-db4 NDB[10219]: [MgmSrvr] Node 1: Local checkpoint 5147 started. Keep GCI = 50324 oldest restorable GCI = 50334
Jul 25 14:02:43 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Node 6 missed heartbeat 2
Jul 25 14:02:44 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 6 Connected
Jul 25 14:02:44 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Node 6 missed heartbeat 3
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Node 6 missed heartbeat 4
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Node 6 declared dead due to missed heartbeat
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 1: Communication to Node 6 closed
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 2: Communication to Node 6 closed
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 3: Communication to Node 6 closed
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 4: Communication to Node 6 closed
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 3: Node 6 Disconnected
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 3: Communication to Node 6 closed
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Communication to Node 6 closed
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 1: Arbitration check won - node group majority
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 1: President restarts arbitration thread [state=6]
Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 4 Connected
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 4: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 1 Connected
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 3 Connected
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 1: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 2 Connected
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 3: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 2: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 7: Node 5 Connected
Jul 25 14:02:48 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Jul 25 14:02:49 nl2-db4 NDB[10219]: [MgmSrvr] Mgmt server state: nodeid 24 freed, m_reserved_nodes 0008000008600080.
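
For anyone triaging a similar cluster log, the order in which the data nodes went down can be pulled out with a short script. The following is only a minimal sketch in Python, assuming syslog-style lines in exactly the format shown above; the name `forced_shutdowns` and the regular expression are made up for illustration and are not part of any MySQL tooling.

```python
import re

# Assumed line format (taken from the cluster log in this report):
#   <Mon DD HH:MM:SS> <host> NDB[<pid>]: [MgmSrvr] Node <id>: Forced node shutdown completed. ... Caused by error <code>: ...
SHUTDOWN_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ NDB\[\d+\]: \[MgmSrvr\] "
    r"Node (?P<node>\d+): Forced node shutdown completed\..*?"
    r"Caused by error (?P<error>\d+)"
)


def forced_shutdowns(lines):
    """Return (timestamp, node_id, error_code) for every forced-shutdown line, in log order."""
    events = []
    for line in lines:
        match = SHUTDOWN_RE.match(line)
        if match:
            events.append(
                (match.group("ts"), int(match.group("node")), int(match.group("error")))
            )
    return events


# Three lines copied from the cluster log above; only the last two are shutdown events.
sample = [
    "Jul 25 14:02:46 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Node 6 declared dead due to missed heartbeat",
    "Jul 25 14:02:47 nl2-db4 NDB[10219]: [MgmSrvr] Node 4: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.",
    "Jul 25 14:02:48 nl2-db4 NDB[10219]: [MgmSrvr] Node 5: Forced node shutdown completed. Initiated by signal 6. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.",
]

for ts, node, error in forced_shutdowns(sample):
    print(f"{ts}  node {node}  error {error}")
```

Run against the full log, this would list nodes 4, 1, 3, 2 and 5 in that order, all within about two seconds of node 6 being declared dead.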

Node 4 error log:
Time: Tuesday 25 July 2006 - 14:02:46
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: 
Error object: DBTC (Line: 6278) 0x0000000e
Program: /opt/mysqlcluster/libexec/ndbd
Pid: 13687
Trace: /var/db/6-nodes/log/ndb_4_trace.log.4
Version: Version 5.0.22
***EOM***

How to repeat:
Remove the network cable on one of the data nodes.
[25 Jul 2006 12:41] Lars Torstensson
Category update
[14 Aug 2006 14:06] Jonas Oreland
Hi,

This was actually a duplicate of http://bugs.mysql.com/bug.php?id=20185
which I fixed in 5.0.23 :-)

/Jonas