Bug #12006 ndb_restore has issues recovering from Temporary error: 4025: Node failure
Submitted: 18 Jul 2005 13:47 Modified: 22 Aug 2005 6:27
Reporter: Jonathan Miller
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: 5.1 OS: Linux (Linux)
Assigned to: Jonas Oreland CPU Architecture:Any

[18 Jul 2005 13:47] Jonathan Miller
Description:
The cluster had been running backups in a loop all weekend. After stopping this test, I decided to try restoring the rather large backup number 8866. I went into the cluster, dropped and recreated the databases, and started the restore process.

On the restore for the last data node (5), the system started reporting "Temporary error: 4025: Node failure caused abort of transaction". The ndb cluster log shows that node 4 missed some heartbeats during that time.

This may have just been a temporary resource issue, as I have not been able to recreate it, but I decided to file a bug report for documentation.

2005-07-18 14:27:11 [MgmSrvr] INFO     -- Node 3: Local checkpoint 566 started. Keep GCI = 242075 oldest restorable GCI = 242080
2005-07-18 14:27:32 [MgmSrvr] WARNING  -- Node 3: Node 4 missed heartbeat 2
2005-07-18 14:27:34 [MgmSrvr] WARNING  -- Node 3: Node 4 missed heartbeat 3
2005-07-18 14:27:34 [MgmSrvr] ALERT    -- Node 4: Node 6 Disconnected
2005-07-18 14:27:34 [MgmSrvr] ALERT    -- Node 4: Node 7 Disconnected
2005-07-18 14:27:34 [MgmSrvr] INFO     -- Node 4: Communication to Node 6 closed
2005-07-18 14:27:34 [MgmSrvr] INFO     -- Node 4: Communication to Node 7 closed
2005-07-18 14:27:38 [MgmSrvr] INFO     -- Node 4: Communication to Node 6 opened
2005-07-18 14:27:38 [MgmSrvr] INFO     -- Node 4: Communication to Node 7 opened
2005-07-18 14:27:39 [MgmSrvr] INFO     -- Node 4: Node 7 Connected
2005-07-18 14:27:39 [MgmSrvr] INFO     -- Node 4: Node 7: API version 5.1.0
2005-07-18 14:27:41 [MgmSrvr] INFO     -- Node 4: Node 6 Connected
2005-07-18 14:27:41 [MgmSrvr] INFO     -- Node 4: Node 6: API version 5.1.0
2005-07-18 14:28:24 [MgmSrvr] INFO     -- Node 3: Local checkpoint 567 started. Keep GCI = 242096 oldest restorable GCI = 242102
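
For anyone triaging a similar log, here is a minimal sketch that pulls the missed-heartbeat and disconnect events out of cluster-log lines in the format quoted above. The function name `summarize` and the regular expressions are assumptions made for illustration; they are derived only from the lines in this report and may not cover every message the management server can emit.

```python
import re
from collections import defaultdict

# Pattern for the management-server lines quoted in this report:
#   <date> <time> [MgmSrvr] <LEVEL> -- Node <reporter>: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmSrvr\] "
    r"(?P<level>\w+)\s+-- Node (?P<reporter>\d+): (?P<msg>.*)$"
)
MISSED_RE = re.compile(r"^Node (\d+) missed heartbeat (\d+)$")
DISCONNECT_RE = re.compile(r"^Node (\d+) Disconnected$")


def summarize(lines):
    """Return (missed, disconnects) found in cluster-log lines.

    missed:      node id -> highest missed-heartbeat count seen for it
    disconnects: list of (timestamp, reporting node, disconnected node)
    """
    missed = defaultdict(int)
    disconnects = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a management-server line in the expected format
        hb = MISSED_RE.match(m["msg"])
        if hb:
            node, count = int(hb[1]), int(hb[2])
            missed[node] = max(missed[node], count)
        dc = DISCONNECT_RE.match(m["msg"])
        if dc:
            disconnects.append((m["ts"], int(m["reporter"]), int(dc[1])))
    return dict(missed), disconnects


# Four lines copied from the log excerpt in this report.
SAMPLE = """\
2005-07-18 14:27:32 [MgmSrvr] WARNING  -- Node 3: Node 4 missed heartbeat 2
2005-07-18 14:27:34 [MgmSrvr] WARNING  -- Node 3: Node 4 missed heartbeat 3
2005-07-18 14:27:34 [MgmSrvr] ALERT    -- Node 4: Node 6 Disconnected
2005-07-18 14:27:34 [MgmSrvr] ALERT    -- Node 4: Node 7 Disconnected
""".splitlines()

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On the sample lines this reports node 4 as having missed heartbeats and nodes 6 and 7 as disconnected from node 4, which matches what can be read off the excerpt by eye.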

@ndb10 BACKUP]$  ndb_restore -c $ndbc -n 2 -e -m -b 8866 -r ./BACKUP-8866/
Ndb version in backup files: Version 5.1.0
Connected to ndb!!
Successfully restored table BANK/def/GL
Successfully restored table event REPL$BANK/GL
Successfully restored table BANK/def/ACCOUNT
Successfully restored table event REPL$BANK/ACCOUNT
Successfully restored table BANK/def/TRANSACTION
Successfully restored table event REPL$BANK/TRANSACTION
Successfully restored table BANK/def/SYSTEM_VALUES
Successfully restored table event REPL$BANK/SYSTEM_VALUES
Successfully restored table BANK/def/ACCOUNT_TYPES
Successfully restored table event REPL$BANK/ACCOUNT_TYPES
Successfully restored table BANK2/def/GL
Successfully restored table event REPL$BANK2/GL
Successfully restored table BANK2/def/ACCOUNT
Successfully restored table event REPL$BANK2/ACCOUNT
Successfully restored table BANK2/def/TRANSACTION
Successfully restored table event REPL$BANK2/TRANSACTION
Successfully restored table BANK2/def/SYSTEM_VALUES
Successfully restored table event REPL$BANK2/SYSTEM_VALUES
Successfully restored table BANK2/def/ACCOUNT_TYPES
Successfully restored table event REPL$BANK2/ACCOUNT_TYPES
Successfully restored table atae/def/dcacache
Successfully restored table event REPL$atae/dcacache
Successfully restored table BANK/def/ACCOUNT_TYPE
Successfully restored table BANK2/def/ACCOUNT_TYPE
Successfully created index PRIMARY on dcacache
_____________________________________________________
Restoring data in table: BANK/def/GL(16) fragment 0
_____________________________________________________
Restoring data in table: BANK/def/ACCOUNT(15) fragment 0
_____________________________________________________
Restoring data in table: BANK/def/TRANSACTION(14) fragment 0
_____________________________________________________
Restoring data in table: BANK/def/SYSTEM_VALUES(13) fragment 0
_____________________________________________________
Restoring data in table: BANK/def/ACCOUNT_TYPES(12) fragment 0
_____________________________________________________
Restoring data in table: BANK2/def/GL(11) fragment 0
_____________________________________________________
Restoring data in table: BANK2/def/ACCOUNT(10) fragment 0
_____________________________________________________
Restoring data in table: BANK2/def/TRANSACTION(9) fragment 0
_____________________________________________________
Restoring data in table: BANK2/def/SYSTEM_VALUES(8) fragment 0
_____________________________________________________
Restoring data in table: BANK2/def/ACCOUNT_TYPES(7) fragment 0
_____________________________________________________
Restoring data in table: atae/def/dcacache(6) fragment 0
_____________________________________________________
Restoring data in table: BANK/def/ACCOUNT_TYPE(5) fragment 0
_____________________________________________________
Restoring data in table: BANK2/def/ACCOUNT_TYPE(4) fragment 0
_____________________________________________________
Restoring data in table: cluster_replication/def/apply_status(2) fragment 0
_____________________________________________________
Restoring data in table: sys/def/NDB$EVENTS_0(1) fragment 0
_____________________________________________________
Restoring data in table: sys/def/SYSTAB_0(0) fragment 0
Restored 1669391 tuples and 0 log entries
@ndb10 BACKUP]$  ndb_restore -c $ndbc -n 3 -e -b 8866 -r ./BACKUP-8866/
Ndb version in backup files: Version 5.1.0
Connected to ndb!!
_____________________________________________________
Restoring data in table: BANK/def/GL(16) fragment 2
_____________________________________________________
Restoring data in table: BANK/def/ACCOUNT(15) fragment 2
_____________________________________________________
Restoring data in table: BANK/def/TRANSACTION(14) fragment 2
_____________________________________________________
Restoring data in table: BANK/def/SYSTEM_VALUES(13) fragment 2
_____________________________________________________
Restoring data in table: BANK/def/ACCOUNT_TYPES(12) fragment 2
_____________________________________________________
Restoring data in table: BANK2/def/GL(11) fragment 2
_____________________________________________________
Restoring data in table: BANK2/def/ACCOUNT(10) fragment 2
_____________________________________________________
Restoring data in table: BANK2/def/TRANSACTION(9) fragment 2
_____________________________________________________
Restoring data in table: BANK2/def/SYSTEM_VALUES(8) fragment 2
_____________________________________________________
Restoring data in table: BANK2/def/ACCOUNT_TYPES(7) fragment 2
_____________________________________________________
Restoring data in table: atae/def/dcacache(6) fragment 2
_____________________________________________________
Restoring data in table: BANK/def/ACCOUNT_TYPE(5) fragment 2
_____________________________________________________
Restoring data in table: BANK2/def/ACCOUNT_TYPE(4) fragment 2
_____________________________________________________
Restoring data in table: cluster_replication/def/apply_status(2) fragment 2
_____________________________________________________
Restoring data in table: sys/def/NDB$EVENTS_0(1) fragment 2
_____________________________________________________
Restoring data in table: sys/def/SYSTAB_0(0) fragment 2
Restored 1668845 tuples and 0 log entries
ndb10 BACKUP]$  ndb_restore -c $ndbc -n 4 -e -b 8866 -r ./BACKUP-8866/
Ndb version in backup files: Version 5.1.0
Connected to ndb!!
_____________________________________________________
Restoring data in table: BANK/def/GL(16) fragment 1
_____________________________________________________
Restoring data in table: BANK/def/ACCOUNT(15) fragment 1
_____________________________________________________
Restoring data in table: BANK/def/TRANSACTION(14) fragment 1
_____________________________________________________
Restoring data in table: BANK/def/SYSTEM_VALUES(13) fragment 1
_____________________________________________________
Restoring data in table: BANK/def/ACCOUNT_TYPES(12) fragment 1
_____________________________________________________
Restoring data in table: BANK2/def/GL(11) fragment 1
_____________________________________________________
Restoring data in table: BANK2/def/ACCOUNT(10) fragment 1
_____________________________________________________
Restoring data in table: BANK2/def/TRANSACTION(9) fragment 1
_____________________________________________________
Restoring data in table: BANK2/def/SYSTEM_VALUES(8) fragment 1
_____________________________________________________
Restoring data in table: BANK2/def/ACCOUNT_TYPES(7) fragment 1
_____________________________________________________
Restoring data in table: atae/def/dcacache(6) fragment 1
_____________________________________________________
Restoring data in table: BANK/def/ACCOUNT_TYPE(5) fragment 1
_____________________________________________________
Restoring data in table: BANK2/def/ACCOUNT_TYPE(4) fragment 1
_____________________________________________________
Restoring data in table: cluster_replication/def/apply_status(2) fragment 1
_____________________________________________________
Restoring data in table: sys/def/NDB$EVENTS_0(1) fragment 1
_____________________________________________________
Restoring data in table: sys/def/SYSTAB_0(0) fragment 1
Restored 1665355 tuples and 0 log entries
db10 BACKUP]$  ndb_restore -c $ndbc -n 5 -e -b 8866 -r ./BACKUP-8866/
Ndb version in backup files: Version 5.1.0
Connected to ndb!!
_____________________________________________________
Restoring data in table: BANK/def/GL(16) fragment 3
_____________________________________________________
Restoring data in table: BANK/def/ACCOUNT(15) fragment 3
_____________________________________________________
Restoring data in table: BANK/def/TRANSACTION(14) fragment 3
_____________________________________________________
Restoring data in table: BANK/def/SYSTEM_VALUES(13) fragment 3
_____________________________________________________
Restoring data in table: BANK/def/ACCOUNT_TYPES(12) fragment 3
_____________________________________________________
Restoring data in table: BANK2/def/GL(11) fragment 3
_____________________________________________________
Restoring data in table: BANK2/def/ACCOUNT(10) fragment 3
_____________________________________________________
Restoring data in table: BANK2/def/TRANSACTION(9) fragment 3
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
Temporary error: 4025: Node failure caused abort of transaction
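
The four ndb_restore invocations in the session above follow one pattern: one run per data node (2 through 5), with -m passed only on the first run. As a rough sketch, that sequence could be generated like this. The helper name `restore_commands` and the connect string `"mgmhost"` are placeholders for illustration; only the flags that appear in the transcript are used, and the sketch builds the argument lists without executing anything.

```python
def restore_commands(connect_string, backup_id, backup_dir, node_ids):
    """Build one ndb_restore argument list per data node.

    Mirrors the session in this report: every run gets -c, -n, -e, -b and -r,
    and only the first run also gets -m.
    """
    commands = []
    for index, node_id in enumerate(node_ids):
        cmd = ["ndb_restore", "-c", connect_string, "-n", str(node_id), "-e"]
        if index == 0:
            cmd.append("-m")  # passed once, on the first node only
        cmd += ["-b", str(backup_id), "-r", backup_dir]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # "mgmhost" stands in for the value of $ndbc, which the report does not show.
    for cmd in restore_commands("mgmhost", 8866, "./BACKUP-8866/", [2, 3, 4, 5]):
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run` one at a time, so a failure on one node (as happened here with node 5) stops the sequence at a known point.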

How to repeat:
I have been unable to repeat this.

Suggested fix:
None
[22 Jul 2005 13:05] Jonas Oreland
Hi,

Do you know why a node failed?
Are there any cores/error/trace files left?

Basically ndb_restore will retry a finite number of times and then quit.
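
To illustrate the "retry a finite number of times and then quit" behaviour, here is a minimal sketch of a bounded retry loop. It is not the ndb_restore source (which is C++); `TemporaryError`, `run_with_retries`, the attempt limit and the delay are all hypothetical and chosen only for the example. The actual retry limit is not stated in this report.

```python
import time


class TemporaryError(Exception):
    """Hypothetical stand-in for a temporary NDB error such as 4025."""


def run_with_retries(operation, max_attempts=10, delay_seconds=0.0,
                     sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    Only TemporaryError triggers a retry; any other exception propagates
    immediately. After the last failed attempt a RuntimeError is raised.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TemporaryError as err:
            last_error = err
            print(f"Temporary error: {err} (attempt {attempt}/{max_attempts})")
            if attempt < max_attempts:
                sleep(delay_seconds)  # pause before the next attempt
    raise RuntimeError(f"giving up after {max_attempts} attempts") from last_error
```

A loop like this prints one line per failed attempt, which would be consistent with the long run of identical "Temporary error: 4025" lines in the output above before the tool gives up.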
[25 Jul 2005 13:02] Jonathan Miller
Jonas, there were no core or trace files associated with this issue. I have not seen this since, but wanted to document that it had happened. Is it possible to add more error information for when ndb_restore gets into this situation? Maybe which node failed?
[22 Aug 2005 6:27] Jonas Oreland
Hi Jeb,

1) The API can't currently tell which node has been disconnected in an easy way...

My guess is that ndb_restore was shut down due to missed heartbeats.
This is fixed by Stewart's "load independent heartbeats".

Therefore I'm closing this bug...