Bug #14531 cluster crash
Submitted: 1 Nov 2005 0:39 Modified: 1 Nov 2005 1:43
Reporter: Alex Slobodnik
Status: Closed
Category:MySQL Server Severity:S1 (Critical)
Version:4.1.14 OS:Linux (Red Hat EL3 Update 5)
Assigned to: CPU Architecture:Any

[1 Nov 2005 0:39] Alex Slobodnik
Description:
The cluster is not stable with a large number of NDB storage nodes (8 nodes), all sharing the same replica of data (NoOfReplicas=1). When any one of the storage nodes is killed, the whole cluster goes down and all storage nodes terminate.
The messages "Error handler shutting down system" and "Error handler shutdown completed - exiting" are printed to the output log on all storage nodes.

The cluster works fine with 4 nodes.

Configuration and log files are listed below (except the cluster log).

--------------------------------------------------------------------------------------
config.ini
--------------------------------------------------------------------------------------
[NDBD DEFAULT]
NoOfReplicas=1
DataMemory=250M
IndexMemory=250M

[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]

# Management Server
[NDB_MGMD]
Id=1
HostName=10.130.50.31           # the IP of THIS SERVER
# Storage Nodes

[NDBD]
Id=2
HostName=host2
DataDir=/var/lib/mysql/cluster
ServerPort=2203

[NDBD]
Id=3
HostName=host3
DataDir=/var/lib/mysql/cluster
ServerPort=2203

[NDBD]
Id=4
HostName=host4
DataDir=/var/lib/mysql/cluster
ServerPort=2203

[NDBD]
Id=5
HostName=host5
DataDir=/var/lib/mysql/cluster
ServerPort=2203

[NDBD]
Id=6
HostName=host6
DataDir=/var/lib/mysql/cluster
ServerPort=2203

[NDBD]
Id=7
HostName=host7
DataDir=/var/lib/mysql/cluster
ServerPort=2203

[NDBD]
Id=8
HostName=host8
DataDir=/var/lib/mysql/cluster
ServerPort=2203

[NDBD]
Id=9
HostName=host9
DataDir=/var/lib/mysql/cluster
ServerPort=2203

[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
[MYSQLD]
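
A minimal sketch of how the eight [NDBD] sections above (Id=2 to Id=9) are split into node groups. It assumes NDB fills node groups in ascending node-ID order, NoOfReplicas data nodes per group, with no explicit node-group settings. The helper `node_groups` is hypothetical and not part of any MySQL tool. With NoOfReplicas=1 every storage node ends up in its own group, so each one holds the only copy of its share of the data.

```python
def node_groups(node_ids, replicas):
    """Split data node IDs into node groups of `replicas` nodes each.

    Simplified model (assumption): groups are filled in ascending node-ID
    order, and the number of data nodes must be a multiple of NoOfReplicas.
    """
    ids = sorted(node_ids)
    if replicas < 1 or len(ids) % replicas != 0:
        raise ValueError("number of data nodes must be a multiple of NoOfReplicas")
    return [ids[i:i + replicas] for i in range(0, len(ids), replicas)]


data_nodes = list(range(2, 10))  # Id=2 .. Id=9 from the [NDBD] sections above

print(node_groups(data_nodes, 1))  # [[2], [3], [4], [5], [6], [7], [8], [9]]
print(node_groups(data_nodes, 2))  # [[2, 3], [4, 5], [6, 7], [8, 9]]
```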

--------------------------------------------------------------------------------------
/var/lib/mysql/cluster/ndb_2_error.log
--------------------------------------------------------------------------------------
Date/Time: Monday 31 October 2005 - 16:03:42
Type of error: error
Message: Arbitrator shutdown
Fault ID: 2305
Problem data: Arbitrator decided to shutdown this node
Object of reference: QMGR (Line: 3795) 0x0000000a
ProgramName: /usr/sbin/ndbd
ProcessID: 4397
TraceFile: /var/lib/mysql/cluster/ndb_2_trace.log.10
Version 4.1.14
***EOM***

--------------------------------------------------------------------------------------
/var/lib/mysql/cluster/ndb_2_out.log
--------------------------------------------------------------------------------------
2005-10-31 15:57:30 [NDB] INFO     -- Angel pid: 4049 ndb pid: 4050
2005-10-31 15:57:30 [NDB] INFO     -- NDB Cluster -- DB node 2
2005-10-31 15:57:30 [NDB] INFO     -- Version 4.1.14 --
2005-10-31 15:57:30 [NDB] INFO     -- Configuration fetched at 10.133.55.51 port 1186
2005-10-31 16:01:51 [NDB] INFO     -- Angel pid: 4396 ndb pid: 4397
2005-10-31 16:01:51 [NDB] INFO     -- NDB Cluster -- DB node 2
2005-10-31 16:01:51 [NDB] INFO     -- Version 4.1.14 --
2005-10-31 16:01:51 [NDB] INFO     -- Configuration fetched at 10.133.55.51 port 1186
Error handler shutting down system
Error handler shutdown completed - exiting

How to repeat:
Very easy to reproduce:
-- configure a cluster with 8 storage nodes and set NoOfReplicas=1
-- start the cluster
-- kill ndbd on one of the nodes
[1 Nov 2005 0:40] Alex Slobodnik
cluster log

Attachment: cluster.log (application/octet-stream, text), 13.48 KiB.

[1 Nov 2005 1:43] Alex Slobodnik
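My mistake; the behaviour is correct. I confused NoOfReplicas=8 with NoOfReplicas=1. With NoOfReplicas=1 each storage node holds the only copy of its part of the data, so losing any one node makes that data unavailable and the whole cluster shuts down.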
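
The expected outcome of the "How to repeat" steps can be checked with a simplified survival rule: the cluster keeps running only while every node group still has at least one live data node. This is a sketch under stated assumptions. It ignores arbitration and network partitions, assumes node groups are consecutive blocks of NoOfReplicas nodes in node-ID order, and the helper `cluster_survives` is hypothetical. For eight storage nodes, killing any single node is fatal with NoOfReplicas=1 but not with NoOfReplicas=2.

```python
def cluster_survives(node_ids, replicas, failed):
    """Return True if every node group keeps at least one live data node.

    Simplified model (assumption): node groups are consecutive blocks of
    `replicas` nodes in ascending node-ID order; arbitration is ignored.
    """
    ids = sorted(node_ids)
    groups = [ids[i:i + replicas] for i in range(0, len(ids), replicas)]
    return all(any(n not in failed for n in group) for group in groups)


data_nodes = range(2, 10)  # eight storage nodes, Id=2 .. Id=9

for replicas in (1, 2):
    # Kill each node in turn, as in "kill ndbd on one of the nodes".
    outcomes = {n: cluster_survives(data_nodes, replicas, {n}) for n in data_nodes}
    print(f"NoOfReplicas={replicas}: survives any single failure = {all(outcomes.values())}")
```

Under this model, NoOfReplicas=2 still fails if both nodes of one group (for example Id=4 and Id=5) are lost at the same time.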