Description:
We have a four-server cluster, configured as follows:
[ndbd(NDB)] 4 node(s)
id=11 @10.0.0.211 (Version: 5.1.11, Nodegroup: 0, Master)
id=12 @10.0.0.212 (Version: 5.1.11, Nodegroup: 0)
id=13 @10.0.0.213 (Version: 5.1.11, Nodegroup: 1)
id=14 @10.0.0.214 (Version: 5.1.11, Nodegroup: 1)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @10.0.0.211 (Version: 5.1.11)
id=2 @10.0.0.212 (Version: 5.1.11)
[mysqld(API)] 8 node(s)
id=21 @10.0.0.211 (Version: 5.1.11)
id=22 @10.0.0.212 (Version: 5.1.11)
id=23 @10.0.0.213 (Version: 5.1.11)
id=24 @10.0.0.214 (Version: 5.1.11)
id=31 (not connected, accepting connect from 10.0.0.211)
id=32 (not connected, accepting connect from 10.0.0.212)
id=33 (not connected, accepting connect from 10.0.0.213)
id=34 (not connected, accepting connect from 10.0.0.214)
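The listing above is the `ndb_mgm` SHOW-style view of the cluster. Below is a minimal Python 3 sketch that parses the connected nodes from it. It shows that 10.0.0.212 hosts ndbd 12, ndb_mgmd 2 and mysqld 22. It also shows that losing node 12 still leaves node 11 alive in node group 0. The survival check is a simplified assumption: a node group survives if at least one of its data nodes is alive, and arbitration and other checks are ignored. The helper names `parse_show`, `nodes_by_host` and `live_data_nodes_per_group` are hypothetical.

```python
import re
from collections import defaultdict

# Listing copied from the report (abridged to the connected nodes).
SHOW_OUTPUT = """\
[ndbd(NDB)] 4 node(s)
id=11 @10.0.0.211 (Version: 5.1.11, Nodegroup: 0, Master)
id=12 @10.0.0.212 (Version: 5.1.11, Nodegroup: 0)
id=13 @10.0.0.213 (Version: 5.1.11, Nodegroup: 1)
id=14 @10.0.0.214 (Version: 5.1.11, Nodegroup: 1)
[ndb_mgmd(MGM)] 2 node(s)
id=1 @10.0.0.211 (Version: 5.1.11)
id=2 @10.0.0.212 (Version: 5.1.11)
[mysqld(API)] 8 node(s)
id=21 @10.0.0.211 (Version: 5.1.11)
id=22 @10.0.0.212 (Version: 5.1.11)
id=23 @10.0.0.213 (Version: 5.1.11)
id=24 @10.0.0.214 (Version: 5.1.11)
"""

SECTION_RE = re.compile(r"^\[(\w+)\(")                  # e.g. "[ndbd(NDB)]"
NODE_RE = re.compile(r"^id=(\d+) @([\d.]+) \(([^)]*)\)")  # connected nodes only


def parse_show(text):
    """Return a list of (node_type, node_id, host, nodegroup or None)."""
    nodes = []
    node_type = None
    for line in text.splitlines():
        section = SECTION_RE.match(line)
        if section:
            node_type = section.group(1)
            continue
        m = NODE_RE.match(line)
        if m:
            ng = re.search(r"Nodegroup: (\d+)", m.group(3))
            nodes.append((node_type, int(m.group(1)), m.group(2),
                          int(ng.group(1)) if ng else None))
    return nodes


def nodes_by_host(nodes):
    """Group (node_type, node_id) pairs by host address."""
    hosts = defaultdict(list)
    for node_type, node_id, host, _ in nodes:
        hosts[host].append((node_type, node_id))
    return dict(hosts)


def live_data_nodes_per_group(nodes, failed_ids):
    """Simplified model: list the data nodes still alive in each node group."""
    groups = defaultdict(list)
    for node_type, node_id, _, ng in nodes:
        if node_type == "ndbd":
            groups[ng].append(node_id)
    return {ng: [n for n in ids if n not in failed_ids] for ng, ids in groups.items()}


nodes = parse_show(SHOW_OUTPUT)
print(nodes_by_host(nodes)["10.0.0.212"])  # [('ndbd', 12), ('ndb_mgmd', 2), ('mysqld', 22)]
print(live_data_nodes_per_group(nodes, {12}))  # {0: [11], 1: [13, 14]}
```

Under this simplified rule, every node group still has a live data node after node 12 fails. That matches the cluster staying up while node 12 remained down.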
About an hour ago, ndb_mgmd node 2 and ndbd node 12 (both on the same
physical server, 10.0.0.212) started missing their heartbeats, and both were
declared dead (according to the error log below, node 12 was declared dead
by data node 13). Node 2 was able to reconnect after a few seconds, but
node 12 shut down and stayed down.
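The failure was detected through missed heartbeats. The MySQL Cluster documentation for the `HeartbeatIntervalDbDb` parameter describes a data node being declared dead after it misses three consecutive heartbeats. The sketch below is a simplified, hypothetical model of that rule, not NDB's actual QMGR logic. An observer checks once per interval whether any heartbeat arrived, and declares the peer dead after `max_missed` empty intervals. The 1500 ms interval and the arrival times are illustrative values, not taken from this cluster's configuration.

```python
def first_declared_dead(arrivals_ms, interval_ms, max_missed=3, horizon_ms=60_000):
    """Return the check time (ms) at which the peer is declared dead, or None.

    Hypothetical model: at each multiple of interval_ms the observer checks
    whether at least one new heartbeat has arrived; max_missed consecutive
    empty intervals mark the peer as dead.
    """
    arrivals = sorted(arrivals_ms)
    idx = 0
    missed = 0
    for check in range(interval_ms, horizon_ms + 1, interval_ms):
        got_heartbeat = False
        # Consume every not-yet-counted heartbeat that arrived at or before this check.
        while idx < len(arrivals) and arrivals[idx] <= check:
            got_heartbeat = True
            idx += 1
        missed = 0 if got_heartbeat else missed + 1
        if missed >= max_missed:
            return check
    return None


INTERVAL_MS = 1500  # illustrative value only

# Heartbeats flow normally, then stall (e.g. under heavy load) after t=6000 ms.
stalled = [0, 1500, 3000, 4500, 6000, 20000, 21500]
print(first_declared_dead(stalled, INTERVAL_MS))  # 10500

# A stall of only two intervals is tolerated.
brief = [0, 1500, 3000, 7400] + list(range(9000, 60001, 1500))
print(first_declared_dead(brief, INTERVAL_MS))  # None
```

In this model, the time from the last heartbeat to the declaration falls between three and four intervals, depending on where the last heartbeat lands relative to the checks. For the stalled trace it is 4500 ms, exactly three intervals. Heartbeats that resume later, at 20000 ms, come too late.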
The error log for node 12 suggests reporting a bug:
[root@sql2]# tail /var/lib/mysql-cluster/ndb_12_error.log -n 12
Time: Wednesday 21 February 2007 - 11:01:44
Status: Unknown
Message: No message slogan found (please report a bug if you get this
error code) (Unknown)
Error: 0
Error data: We(12) have been declared dead by 13 reason: Hearbeat failure(4)
Error object: QMGR (Line: 2843) 0x0000000a
Program: ndbd
Pid: 11380
Trace: /var/lib/mysql-cluster/ndb_12_trace.log.13
Version: Version 5.1.11 (beta)
***EOM***
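For triage, the key facts in this entry are which node died, which node declared it dead, and the stated reason. Below is a minimal Python 3 sketch that extracts them from the entry quoted above. The field layout is inferred from this single entry only, not from a documented format. Wrapped lines, such as the Message line, are treated as continuations of the previous field. The helper names `parse_error_entry` and `who_declared_whom_dead` are hypothetical. "Hearbeat" is kept exactly as the log spells it.

```python
import re

# Error log entry copied verbatim from the report.
ERROR_LOG = """\
Time: Wednesday 21 February 2007 - 11:01:44
Status: Unknown
Message: No message slogan found (please report a bug if you get this
error code) (Unknown)
Error: 0
Error data: We(12) have been declared dead by 13 reason: Hearbeat failure(4)
Error object: QMGR (Line: 2843) 0x0000000a
Program: ndbd
Pid: 11380
Trace: /var/lib/mysql-cluster/ndb_12_trace.log.13
Version: Version 5.1.11 (beta)
***EOM***
"""

FIELD_RE = re.compile(r"^([A-Z][A-Za-z ]*): (.*)$")
DEATH_RE = re.compile(r"We\((\d+)\) have been declared dead by (\d+) reason: (.+)")


def parse_error_entry(text):
    """Map each 'Key: value' field to its value, joining wrapped lines."""
    fields = {}
    last_key = None
    for line in text.splitlines():
        if line == "***EOM***":  # end-of-message marker closes the entry
            break
        m = FIELD_RE.match(line)
        if m:
            last_key = m.group(1)
            fields[last_key] = m.group(2)
        elif last_key:
            # Continuation of a wrapped value (e.g. the Message field).
            fields[last_key] += " " + line
    return fields


def who_declared_whom_dead(fields):
    """Pull the dead node, the accusing node and the reason from 'Error data'."""
    m = DEATH_RE.search(fields.get("Error data", ""))
    if not m:
        return None
    return {"dead": int(m.group(1)), "declared_by": int(m.group(2)), "reason": m.group(3)}


print(who_declared_whom_dead(parse_error_entry(ERROR_LOG)))
# {'dead': 12, 'declared_by': 13, 'reason': 'Hearbeat failure(4)'}
```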
How to repeat:
The failure possibly occurred while the cluster was under heavy load.