Bug #37880: after mgm_node 1 is restarted, backup is not available from mgm_node 2
Submitted: 4 Jul 2008 15:57
Modified: 20 Nov 2008 7:40
Reporter: Bogdan Kecman
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql-5.1.24 ndb-6.3.14
OS: Any
Assigned to:
CPU Architecture: Any
Tags: Backup, cluster

[4 Jul 2008 15:57] Bogdan Kecman
Description:
Simple setup:
 1- mgm_node
 2- mgm_node
 3- data node
 4- data node
 5- sql node

If a backup is started from node 1, it always succeeds.
If node 1 is restarted, the first attempt to start a backup from node 2 fails.
All subsequent attempts succeed.

How to repeat:
Set up a cluster:
 1- mgm_node
 2- mgm_node
 3- data node
 4- data node
 5- sql node

Restart node 1.
Start a backup on node 2: it fails.
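
The steps above could be scripted roughly as follows. This is only a sketch, not a verified test case: it assumes the `ndb_mgm` client is on the PATH and uses its `-c` (connect string) and `-e` (execute command) options. The addresses and port 1186 are taken from the config and `show` output posted in this report; the function names are made up for illustration. Since it is unclear whether the exit status reflects a failed backup, the sketch looks for the "Backup failed" text that the client printed in this report.

```python
import subprocess

# Management server on node 2 (address from config.ini, port from the `show` output).
MGM_NODE_2 = "192.168.122.125:1186"


def mgm_command(connectstring, command):
    """Build an ndb_mgm invocation that runs one management command and exits."""
    return ["ndb_mgm", "-c", connectstring, "-e", command]


def repro_steps():
    """Commands for the scenario in this report, all issued through node 2."""
    return [
        mgm_command(MGM_NODE_2, "1 restart"),     # restart management node 1
        mgm_command(MGM_NODE_2, "start backup"),  # first attempt (reported to fail)
        mgm_command(MGM_NODE_2, "start backup"),  # next attempt (reported to succeed)
    ]


def backup_failed(output):
    """True if the client output contains the failure text seen in this report."""
    return "Backup failed" in output


def run_steps(steps, runner=subprocess.run):
    """Run each command in order and collect (argv, exit code, stdout)."""
    results = []
    for argv in steps:
        proc = runner(argv, capture_output=True, text=True)
        results.append((argv, proc.returncode, proc.stdout))
    return results
```

Calling `run_steps(repro_steps())` against a running cluster and passing the second result's output to `backup_failed` would show whether the first backup attempt after the restart fails on a given build.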

Suggested fix:
n/a
[14 Aug 2008 11:02] Bogdan Kecman
Reproduction test:

1. Start a backup on node 1.
2. On node 2:

ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=3	@192.168.122.217  (mysql-5.1.24 ndb-6.3.14, Nodegroup: 0, Master)
id=4	@192.168.122.20  (mysql-5.1.24 ndb-6.3.14, Nodegroup: 0)

[ndb_mgmd(MGM)]	2 node(s)
id=1	@192.168.122.13  (mysql-5.1.24 ndb-6.3.14)
id=2	@192.168.122.125  (mysql-5.1.24 ndb-6.3.14)

[mysqld(API)]	2 node(s)
id=5	@192.168.122.13  (mysql-5.1.24 ndb-6.3.14)
id=6	@192.168.122.125  (mysql-5.1.24 ndb-6.3.14)

ndb_mgm> Node 3: Backup 10 started from node 1
Node 3: Backup 10 started from node 1 completed
 StartGCP: 693 StopGCP: 696
 #Records: 4488 #LogRecords: 0
 Data: 99512 bytes Log: 0 bytes
1 restart
start backup
Node 1 is being restarted

ndb_mgm> start backup
Waiting for completed, this may take several minutes
Backup failed
*  3001: Could not start backup
*        No contact with database nodes: Permanent error: Application error
ndb_mgm> 

- config.ini:
[root@D2 MMM]# cat config.ini
[TCP DEFAULT]

[API DEFAULT]

[NDB_MGMD DEFAULT]
DataDir=/MMM/log1

[NDBD DEFAULT]
DataDir=/MMM/klaster
NoOfReplicas: 2
StopOnError: Y
DataMemory: 512M
IndexMemory: 128M

[NDB_MGMD]
id=1
hostname=192.168.122.13
ArbitrationRank: 1
LogDestination=FILE:filename=ndb_1_cluster.log,maxsize=10000000,maxfiles=5

[NDB_MGMD]
id=2
hostname=192.168.122.125
ArbitrationRank: 1
LogDestination=FILE:filename=ndb_2_cluster.log,maxsize=10000000,maxfiles=5

[NDBD]
id=3
hostname=192.168.122.217
FileSystemPath: /MMM/dnode

[NDBD]
id=4
hostname=192.168.122.20
FileSystemPath: /MMM/dnode

[MYSQLD]

[MYSQLD]
[14 Aug 2008 11:05] Bogdan Kecman
node2 log:

2008-08-14 13:01:40 [MgmSrvr] INFO     -- Node 3: Backup 10 started from node 1
2008-08-14 13:01:40 [MgmSrvr] INFO     -- Node 3: Backup 10 started from node 1 completed. StartGCP: 693 StopGCP: 696 #Records: 4488 #LogRecords: 0 Data: 99512 bytes Log: 0 bytes
2008-08-14 13:02:02 [MgmSrvr] WARNING  -- Node 3: Node 2 missed heartbeat 2
2008-08-14 13:02:02 [MgmSrvr] WARNING  -- Node 3: Node 2 missed heartbeat 3
2008-08-14 13:02:02 [MgmSrvr] WARNING  -- Node 4: Node 2 missed heartbeat 2
2008-08-14 13:02:02 [MgmSrvr] WARNING  -- Node 4: Node 2 missed heartbeat 3
2008-08-14 13:02:02 [MgmSrvr] ALERT    -- Node 2: Node 3 Disconnected
2008-08-14 13:02:02 [MgmSrvr] ALERT    -- Node 2: Node 4 Disconnected
2008-08-14 13:02:12 [MgmSrvr] INFO     -- Mgmt server state: nodeid 1 freed, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000040.
2008-08-14 13:02:12 [MgmSrvr] INFO     -- Mgmt server state: nodeid 1 reserved for ip 192.168.122.13, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000042.
2008-08-14 13:02:58 [MgmSrvr] INFO     -- Node 2: Node 4 Connected
2008-08-14 13:02:58 [MgmSrvr] INFO     -- Node 2: Node 3 Connected
2008-08-14 13:02:58 [MgmSrvr] INFO     -- Node 3: Started arbitrator node 1 [ticket=05b5000ac0e14d31]
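
One way to read the log above is to measure how long node 2 had no connection to the data nodes, and to decode the `m_reserved_nodes` masks. The sketch below does both. The function names are hypothetical, and the mask decoding assumes that bit n of the hex value stands for node id n, which the log suggests (the value changes from ...40 to ...42 when nodeid 1 is reserved) but does not prove.

```python
import re
from datetime import datetime

# Matches "<timestamp> [MgmSrvr] <LEVEL> -- Node 2: Node <id> Connected|Disconnected".
EVENT = re.compile(
    r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) \[MgmSrvr\] \w+\s+-- "
    r"Node 2: Node (\d+) (Connected|Disconnected)$"
)


def contact_gaps(log_text):
    """Return {data node id: seconds between Disconnected and the next Connected}."""
    down_since = {}
    gaps = {}
    for line in log_text.splitlines():
        match = EVENT.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        node = int(match.group(2))
        if match.group(3) == "Disconnected":
            down_since[node] = stamp
        elif node in down_since:
            gaps[node] = int((stamp - down_since.pop(node)).total_seconds())
    return gaps


def reserved_nodes(hex_mask):
    """Decode a hex bitmask into node ids (ASSUMPTION: bit n set means node id n)."""
    value = int(hex_mask, 16)
    return [n for n in range(value.bit_length()) if (value >> n) & 1]
```

Fed the log lines above, `contact_gaps` gives 56 seconds for both node 3 and node 4 (13:02:02 to 13:02:58). Under the stated assumption, `reserved_nodes` gives `[6]` for the mask ending in 40 and `[1, 6]` for the one ending in 42. The log does not record when the failed `start backup` was issued, so whether it fell inside that 56-second window cannot be confirmed from this output alone.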
[20 Nov 2008 7:40] Bernd Ocklin
Not able to reproduce, and a test case is missing.