MySQL Bugs: #56815: Arbitration error with no reason.

Bug #56815	Arbitration error with no reason.
Submitted:	16 Sep 2010 11:22	Modified:	2 May 2012 12:04
Reporter:	Sean Lee
Status:	Not a Bug
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	5.1.44-ndb-7.1.4b-cluster-gpl	OS:	Linux (ubuntu 8.04 64bit 2.6.24-24-server edition)
Assigned to:		CPU Architecture:	Any
Tags:	arbitration error, Lost connection

Description:
Hi all,

I have deployed a cluster on three servers: one hosts the management node, and the other two each host an ndb data node and a mysql node.

One day both data nodes crashed and the cluster stopped working. I checked the error logs on both data nodes; the entries are below:

Time: Tuesday 24 August 2010 - 17:09:41
Status: Temporary error, restart node
Message: Node declared dead. See error log for details (Arbitration error)
Error: 2315
Error data: We(3) have been declared dead by 2 reason: Hearbeat failure(4)
Error object: QMGR (Line: 3555) 0x0000000a
Program: /usr/local//mysql/bin//ndbmtd
Pid: 11286 thr: 0
Version: mysql-5.1.44 ndb-7.1.4b
Trace: /data/mysqlcluster//ndb_3_trace.log.2 /data/mysqlcluster//ndb_3_trace.log.2_t1 /data/mysqlcluster//ndb_3_trace.log.2_t

Time: Tuesday 24 August 2010 - 17:09:36
Status: Temporary error, restart node
Message: Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 5532) 0x0000000a
Program: /usr/local/mysql/bin//ndbmtd
Pid: 4882 thr: 0
Version: mysql-5.1.44 ndb-7.1.4b
Trace: /data/mysqlcluster//ndb_2_trace.log.1 /data/mysqlclu

The output of ndb_mgm -e show is below:

Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=2	@192.168.12.111  (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0, Master)
id=3	@192.168.12.112  (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@192.168.12.110  (mysql-5.1.44 ndb-7.1.4)

[mysqld(API)]	4 node(s)
id=6	@192.168.12.111  (mysql-5.1.44 ndb-7.1.4)
id=7	@192.168.12.112  (mysql-5.1.44 ndb-7.1.4)
id=8 (not connected, accepting connect from 192.168.12.110)
id=9 (not connected, accepting connect from 192.168.12.110)

As I understand it, the cluster consists of a single node group with two data nodes, so if either data node loses its connection, the other data node should keep working alone. The cluster is designed to work this way, isn't it? Either data node should be able to form a partitioned cluster on its own. Why, after one node shut down, did the other one shut down as well?

Thanks in advance.

Best regards.

Sean Lee

How to repeat:
No idea

This looks most probably like a network issue. One node died of heartbeat failure, and if the other one can't get in touch with the management node (the arbitrator in this setup), it is shut down as well. No logs were attached, so closing as Not a Bug.
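
The reasoning above can be sketched as a small decision function. This is a simplified model of the arbitration rule, not the actual QMGR code: the function name `partition_survives` and its parameters are made up for illustration, and `arbitrator_grants` stands for "the arbitrator was reachable and decided in favour of this partition".

```python
from typing import Dict, Set


def partition_survives(
    node_groups: Dict[int, Set[int]],
    alive: Set[int],
    arbitrator_grants: bool,
) -> bool:
    """Simplified, hypothetical model of whether a set of data nodes may keep running.

    node_groups       maps node group id -> ids of the data nodes in that group
    alive             ids of the data nodes this partition can still see (itself included)
    arbitrator_grants True only if the arbitrator was reachable and voted for this partition
    """
    # A node group with no surviving member means part of the data is gone.
    if any(not (members & alive) for members in node_groups.values()):
        return False

    # If some node group is complete, no other partition can hold all the data,
    # so there is no split-brain risk and no arbitration is needed.
    if any(members <= alive for members in node_groups.values()):
        return True

    # Otherwise another partition could be equally viable: only the arbitrator
    # can decide, and an unreachable arbitrator counts as a refusal.
    return arbitrator_grants


# Layout from the report: one node group (0) with data nodes 2 and 3.
layout = {0: {2, 3}}
node2_alone_no_arbitrator = partition_survives(layout, {2}, arbitrator_grants=False)
node2_alone_with_arbitrator = partition_survives(layout, {2}, arbitrator_grants=True)
```

Under this model, with node 3 gone and the management node unreachable, node 2 sees only half of its node group and gets no vote from the arbitrator, so it shuts down too. With a reachable arbitrator voting in its favour, the same node would have continued alone, which is the behaviour the reporter expected.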