Description:
http://dev.mysql.com/tech-resources/articles/mysql-cluster-for-two-servers.html
I have a two-server setup with the same config.ini on both servers:
[ndbd default]
NoOfReplicas=2
DataMemory=80M
IndexMemory=18M
StopOnError=0
[tcp default]
portnumber=2202
[ndb_mgmd]
hostname=x.x.x.28
datadir=/var/lib/mysql-cluster
ArbitrationRank=0
[ndb_mgmd]
hostname=x.x.x.44
datadir=/var/lib/mysql-cluster
ArbitrationRank=0
[ndbd]
hostname=x.x.x.28
datadir=/var/lib/mysql/data
[ndbd]
hostname=x.x.x.44
datadir=/var/lib/mysql/data
[mysqld]
hostname=x.x.x.28
ArbitrationRank=0
[mysqld]
hostname=x.x.x.44
ArbitrationRank=0
[api]
hostname=x.x.x.28
ArbitrationRank=0
[api]
hostname=x.x.x.44
ArbitrationRank=0
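For reference, the status of all six nodes can be checked from either host with the ndb_mgm client once the cluster is up; a minimal sketch (node IDs are whatever this config.ini assigns in order):

# List all management, data, and SQL/API nodes and their status.
ndb_mgm -c x.x.x.28 -e show
# The same check can be run against the second management node:
ndb_mgm -c x.x.x.44 -e show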
my.cnf on the first server:
[mysqld]
ndbcluster
ndb-connectstring=x.x.x.28
[mysql_cluster]
ndb-connectstring=x.x.x.28
and on the second server:
[mysqld]
ndbcluster
ndb-connectstring=x.x.x.44
[mysql_cluster]
ndb-connectstring=x.x.x.44
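For completeness, the startup sequence on each host is the usual one for this version; a minimal sketch (paths and options are assumptions, not copied from my init scripts):

# Start the management node first, reading the shared config.ini.
ndb_mgmd -f /var/lib/mysql-cluster/config.ini
# Start the data node; it picks up ndb-connectstring from the
# [mysql_cluster] section of the my.cnf shown above.
ndbd
# Start the SQL node last.
mysqld_safe &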
So I have 2 management nodes, 2 data nodes, and 2 SQL nodes. When connectivity between the two servers is lost, they continue to run independently, but when connectivity is restored, one of the data nodes is forcibly restarted.
The error log shows:
Current byte-offset of file-pointer is: 568
Time: Monday 29 September 2008 - 15:57:18
Status: Temporary error, restart node
Message: Partitioned cluster detected. Please check if cluster is already running (Arbitration error)
Error: 2307
Error data: We(4) have been declared dead by 3 reason: Our cluster: 0000000000000010(8)
Error object: QMGR (Line: 3237) 0x0000000c
Program: ndbd
Pid: 22471
Trace: /var/lib/mysql/data/ndb_4_trace.log.1
Version: mysql-5.1.23 ndb-6.2.15
***EOM***
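For reference, the error code can be decoded with perror; a sketch (I believe the --ndb option is available in this 5.1 build):

# Look up NDB error 2307 from the error log above.
perror --ndb 2307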
The out log shows:
2008-09-29 15:53:29 [ndbd] INFO -- Angel pid: 22470 ndb pid: 22471
2008-09-29 15:53:29 [ndbd] INFO -- NDB Cluster -- DB node 4
2008-09-29 15:53:29 [ndbd] INFO -- mysql-5.1.23 ndb-6.2.15 --
2008-09-29 15:53:29 [ndbd] INFO -- Configuration fetched at 172.19.30.44 port 1186
2008-09-29 15:53:29 [ndbd] INFO -- Start initiated (mysql-5.1.23 ndb-6.2.15)
2008-09-29 15:53:29 [ndbd] INFO -- Ndbd_mem_manager::init(1) min: 80Mb initial: 100Mb
Adding 100Mb to ZONE_LO (1,3199)
WOPool::init(61, 9)
RWPool::init(22, 13)
RWPool::init(42, 18)
RWPool::init(62, 13)
RWPool::init(c2, 18)
RWPool::init(e2, 15)
WOPool::init(41, 8)
RWPool::init(82, 12)
RWPool::init(a2, 51)
WOPool::init(21, 6)
m_active_buckets.set(1)
alloc_chunk(8 16) -
start_resend(0, empty bucket -> active
check StartPartialTimeout, node 3 thinks 3 is president, I think president is: 4
2008-09-29 15:57:18 [ndbd] INFO -- We(4) have been declared dead by 3 reason: Our cluster: 0000000000000010(8)
2008-09-29 15:57:18 [ndbd] INFO -- QMGR (Line: 3237) 0x0000000c
2008-09-29 15:57:18 [ndbd] INFO -- Error handler restarting system
2008-09-29 15:57:19 [ndbd] INFO -- Error handler shutdown completed - exiting
2008-09-29 15:57:19 [ndbd] ALERT -- Node 4: Forced node shutdown completed. Caused by error 2307: 'Partitioned cluster detected. Please check if cluster is already running(Arbitration error). Temporary error, restart node'.
2008-09-29 15:57:19 [ndbd] INFO -- Ndb has terminated (pid 22471) restarting
2008-09-29 15:57:20 [ndbd] INFO -- Angel pid: 22470 ndb pid: 22667
2008-09-29 15:57:20 [ndbd] INFO -- NDB Cluster -- DB node 4
2008-09-29 15:57:20 [ndbd] INFO -- mysql-5.1.23 ndb-6.2.15 --
2008-09-29 15:57:20 [ndbd] INFO -- Configuration fetched at 172.19.30.44 port 1186
2008-09-29 15:57:20 [ndbd] INFO -- Start initiated (mysql-5.1.23 ndb-6.2.15)
2008-09-29 15:57:20 [ndbd] INFO -- Ndbd_mem_manager::init(1) min: 80Mb initial: 100Mb
Adding 100Mb to ZONE_LO (1,3199)
WOPool::init(61, 9)
RWPool::init(22, 13)
RWPool::init(42, 18)
RWPool::init(62, 13)
RWPool::init(c2, 18)
RWPool::init(e2, 15)
WOPool::init(41, 8)
RWPool::init(82, 12)
RWPool::init(a2, 51)
WOPool::init(21, 6)
RESTORE table: 0 1037 rows applied
RESTORE table: 0 1011 rows applied
RESTORE table: 1 0 rows applied
RESTORE table: 1 0 rows applied
2008-09-29 15:57:28 [ndbd] INFO -- NR: setLcpActiveStatusEnd - m_participatingLQH
alloc_chunk(27 16) -
114/0 (113/4294967295) switchover complete bucket 1 state: 1starting
How to repeat:
With the setup in the description, create a loss of connectivity by unplugging the network cable from one of the servers, then restore connectivity after a few minutes. (A sketch using iptables to simulate the same partition follows.)
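A sketch of the iptables equivalent, assuming it is run as root on the x.x.x.28 host (addresses as in the config above):

# Drop all traffic to and from the peer to simulate the partition.
iptables -A INPUT -s x.x.x.44 -j DROP
iptables -A OUTPUT -d x.x.x.44 -j DROP
# Leave the partition in place for a few minutes...
sleep 300
# ...then remove the rules to restore connectivity.
iptables -D INPUT -s x.x.x.44 -j DROP
iptables -D OUTPUT -d x.x.x.44 -j DROP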
Suggested fix:
The data node should not be forcibly restarted once connectivity between the two servers is restored.
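Possibly relevant: with ArbitrationRank=0 on every node as configured above, no arbitrator exists, so on a network split both halves stay alive, and when they reconnect QMGR detects the partitioned cluster and shuts one node down. If that is the root cause here, a sketch of the usual arrangement would be an arbitrator on a third host (x.x.x.50 is hypothetical), since an arbitrator placed on either of the two hosts dies with its half of the split:

[ndb_mgmd]
# Hypothetical third host whose only job is arbitration.
hostname=x.x.x.50
datadir=/var/lib/mysql-cluster
ArbitrationRank=1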