Bug #21151: Forced node shutdown caused by error 2305
Submitted: 19 Jul 2006 14:00    Modified: 28 Aug 2006 8:24
Reporter: Eugene Gorelik
Status: No Feedback             Impact on me: None
Category: MySQL Cluster: Cluster (NDB) storage engine    Severity: S2 (Serious)
Version: 5.0.22                 OS: Linux (Linux RHEL ES 4)
Assigned to:                    CPU Architecture: Any

[19 Jul 2006 14:00] Eugene Gorelik
Description:
We are running MySQL Cluster 5.0.22 on a 64-bit RHEL OS with 2 CPUs.

Our cluster consists of 2 data nodes and 1 management node.
Periodically our data nodes crash without any obvious reason, with the following errors:

Data node error log:

Time: Tuesday 11 July 2006 - 21:17:33
Status: Temporary error, restart node
Message: Arbitrator shutdown, please investigate error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 4556) 0x0000000a
Program: /opt/mysql/bin/ndbd
Pid: 2888
Trace: /var/lib/mysql-cluster/ndb_3_trace.log.2
Version: Version 5.0.22
***EOM***

Data node output log:

2006-07-11 21:17:33 [ndbd] INFO     -- Error handler shutting down system
2006-07-11 21:17:34 [ndbd] INFO     -- Error handler shutdown completed - exiting
2006-07-11 21:17:34 [ndbd] ALERT    -- Node 3: Forced node shutdown completed. Initiated by signal 0. Caused by error 2305: 'Arbitrator shutdown, please investigate error(s) on other node(s)(Arbitration error). Temporary error, restart node'.

This issue occurs on both data nodes at exactly the same time.
This is a brand-new cluster that is not yet being used in production, so this issue can't be caused by a performance hit.
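
For reference, a minimal config.ini for this topology might look roughly like the sketch below (the hostnames are illustrative assumptions, not values from the actual cluster):

  [ndbd default]
  NoOfReplicas=2          # the two data nodes form one node group holding two replicas

  [ndb_mgmd]
  HostName=mgmt-host      # management node; typically also acts as the arbitrator

  [ndbd]
  HostName=data-host-1    # data node (node 3 in the logs above)

  [ndbd]
  HostName=data-host-2    # second data node

  [mysqld]                # slot for a SQL node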
  

How to repeat:
Unknown
[19 Jul 2006 14:04] MySQL Verification Team
Changing to Cluster Category.
[19 Jul 2006 14:07] Eugene Gorelik
Trace log

Attachment: ndb_3_trace.log.2.gz (application/x-gzip, text), 38.83 KiB.

[22 Jul 2006 9:44] see wai seok
Hi,
I'm using RHEL4 with mysql-max-5.0.22-linux-i686-icc-glibc23.tar.gz.

I'd set up 1 mgmt node (node A), 2 NDB nodes (nodes B and C), and 1 mysql server (node D).

They are running, seemingly without much problem.

I run "/usr/local/mysql/bin/ndbd -d" on both of my NDB nodes. I can see two(2) copies of "ndbd -d" processes running on each of them.

When I do a stress test by unplugging the network cable of node B, after a few seconds one of the "ndbd -d" processes gets killed by itself (the other copy keeps running). When I plug the network cable back in, I see the message below on my mgmt node's ndb_mgm console:

2006-07-23 01:18:37 [MgmSrvr] ALERT    -- Node 2: Forced node shutdown completed. Initiated by signal 0. Caused by error 2305: 'Arbitrator shutdown, please investigate error(s) on other node(s)(Arbitration error). Temporary error, restart node'.

When this error appears, the other "ndbd -d" process on node B gets killed automatically too! This leaves node B unable to rejoin the cluster until I manually issue "ndbd -d" again.

I read through a lot of bug fixes; this does not seem to be fixed as of version 5.0.22. I would highly appreciate it if someone were able to rectify the problem. Thanks!!

regards,
rachel
[28 Jul 2006 7:51] Hartmut Holzgraefe
> When this error appears, the other "ndbd -d" process on
> node B gets killed automatically too! This leaves
> node B unable to rejoin the cluster
> until I manually issue "ndbd -d" again.

See http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-ndbd-definition.html#id3146375

  * StopOnError

  This parameter specifies whether an ndbd process should exit 
  or perform an automatic restart when an error condition is encountered.

  This feature is enabled by default.
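
  Disabling it in the [ndbd default] section of config.ini would look
  roughly like this (a sketch, assuming you want automatic restarts
  rather than exits):

    [ndbd default]
    # Default is 1 (true): ndbd exits on error and must be restarted by hand.
    # Setting 0 makes ndbd perform an automatic restart instead.
    StopOnError=0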
[28 Jul 2006 8:13] see wai seok
Yes, by disabling that parameter, it works!
But does this affect performance?
[28 Jul 2006 8:24] Hartmut Holzgraefe
As both nodes shut down at exactly the same time, we'll need the full set of logs from both data nodes and the management node to analyze this.
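
Something along these lines should gather the relevant files (the data-node DataDir matches the trace path in this report; the management-node path is an assumption, so adjust to your configuration):

  # On each data node (DataDir assumed to be /var/lib/mysql-cluster):
  tar czf ndb_node_logs.tar.gz \
      /var/lib/mysql-cluster/ndb_*_error.log \
      /var/lib/mysql-cluster/ndb_*_out.log \
      /var/lib/mysql-cluster/ndb_*_trace.log.*

  # On the management node (cluster log location is an assumption):
  tar czf ndb_mgm_logs.tar.gz /var/lib/mysql-cluster/ndb_*_cluster.log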
[28 Aug 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".