Description:
Hi all,
I have deployed a cluster on three servers: one hosts the management node, and the other two each host an NDB data node and a MySQL node.
One day both data nodes crashed and the cluster stopped working. I checked the error logs on both data nodes; the entries are below (node 3 first, then node 2):
Time: Tuesday 24 August 2010 - 17:09:41
Status: Temporary error, restart node
Message: Node declared dead. See error log for details (Arbitration error)
Error: 2315
Error data: We(3) have been declared dead by 2 reason: Hearbeat failure(4)
Error object: QMGR (Line: 3555) 0x0000000a
Program: /usr/local//mysql/bin//ndbmtd
Pid: 11286 thr: 0
Version: mysql-5.1.44 ndb-7.1.4b
Trace: /data/mysqlcluster//ndb_3_trace.log.2 /data/mysqlcluster//ndb_3_trace.log.2_t1 /data/mysqlcluster//ndb_3_trace.log.2_t
Time: Tuesday 24 August 2010 - 17:09:36
Status: Temporary error, restart node
Message: Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 5532) 0x0000000a
Program: /usr/local/mysql/bin//ndbmtd
Pid: 4882 thr: 0
Version: mysql-5.1.44 ndb-7.1.4b
Trace: /data/mysqlcluster//ndb_2_trace.log.1 /data/mysqlclu
The output of ndb_mgm -e show is below:
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 @192.168.12.111 (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0, Master)
id=3 @192.168.12.112 (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @192.168.12.110 (mysql-5.1.44 ndb-7.1.4)
[mysqld(API)] 4 node(s)
id=6 @192.168.12.111 (mysql-5.1.44 ndb-7.1.4)
id=7 @192.168.12.112 (mysql-5.1.44 ndb-7.1.4)
id=8 (not connected, accepting connect from 192.168.12.110)
id=9 (not connected, accepting connect from 192.168.12.110)
As I understand it, the cluster consists of a single node group with two data nodes, so if either data node loses its connection, the other should keep running on its own. Isn't the cluster designed to work this way? Either data node alone should be able to form a partitioned cluster. Why, when one node shut down, did the other one shut down as well?
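The survival rule behind this question can be sketched as a small model. This is a simplified sketch, assuming the documented NDB behavior that a set of connected data nodes shuts down unless it has at least one node from every node group, survives on its own only if it also holds every node of some node group, and otherwise must ask the arbitrator. The function name `partition_decision` and its return strings are hypothetical and are not NDB internals.

```python
from typing import Dict, Set


def partition_decision(node_groups: Dict[int, Set[int]], alive: Set[int]) -> str:
    """Simplified model (an assumption, not NDB source code) of what a set of
    connected data nodes does after losing contact with the other nodes."""
    # Every node group must keep at least one live node; otherwise part of
    # the data is unavailable and the partition shuts down.
    if any(not (members & alive) for members in node_groups.values()):
        return "shutdown"
    # If the partition holds all nodes of some node group, no other
    # partition can be viable, so it survives without arbitration.
    if any(members <= alive for members in node_groups.values()):
        return "survive"
    # Otherwise another viable partition might exist: ask the arbitrator.
    return "arbitrate"


# Topology from the ndb_mgm output above: node group 0 = data nodes 2 and 3.
groups = {0: {2, 3}}
for side in ({2, 3}, {2}, {3}):
    print(sorted(side), partition_decision(groups, side))
```

In this model, a lone data node from a single two-node group can never continue without the arbitrator's approval (normally the management node, id=1 here), which is consistent with node 2's log reporting "Arbitrator decided to shutdown this node".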
Thanks in advance.
Best regards.
Sean Lee
How to repeat:
No idea