Description:
ERROR 1297 (HY000) at line 1883: Got temporary error 4010 'Node failure caused abort of transaction' from NDBCLUSTER
and
ERROR 1297 (HY000) at line 1879: Got temporary error 4031 'Node failure caused abort of transaction' from NDBCLUSTER
We have a data node pair (replica 2).
restoring the database from a mysqldump causes a node failure.
The master node is generally uneffected.
Sometimes both nodes crash.
It generally occurs in the same section of the dump file but not always.
If one data node is down the restore is successful.
This leads me to believe it is a problem with the disk table code or
the replication code.
cluster log
2007-04-18 18:19:31 [MgmSrvr] INFO -- Node 3: Local checkpoint 7 started. Keep GCI = 743 oldest restorable GCI = 747
2007-04-18 18:19:56 [MgmSrvr] INFO -- Node 5: Event buffer status: used=56MB(100%) alloc=56MB(0%) max=0B apply_gci=783 latest_gci=783
2007-04-18 18:20:28 [MgmSrvr] INFO -- Node 4: Data usage is 7%(2300 32K pages of total 32768)
2007-04-18 18:20:28 [MgmSrvr] INFO -- Node 4: Index usage is 7%(795 8K pages of total 10272)
2007-04-18 18:20:29 [MgmSrvr] INFO -- Node 3: Index usage is 7%(795 8K pages of total 10272)
2007-04-18 18:20:49 [MgmSrvr] INFO -- Node 3: Local checkpoint 8 started. Keep GCI = 772 oldest restorable GCI = 787
2007-04-18 18:22:28 [MgmSrvr] INFO -- Node 4: Data usage is 9%(3213 32K pages of total 32768)
2007-04-18 18:22:28 [MgmSrvr] INFO -- Node 4: Index usage is 9%(941 8K pages of total 10272)
2007-04-18 18:22:29 [MgmSrvr] INFO -- Node 3: Index usage is 9%(941 8K pages of total 10272)
2007-04-18 18:22:30 [MgmSrvr] ALERT -- Node 2: Node 3 Disconnected
2007-04-18 18:22:30 [MgmSrvr] ALERT -- Node 4: Node 3 Disconnected
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: Communication to Node 3 closed
2007-04-18 18:22:30 [MgmSrvr] ALERT -- Node 4: Network partitioning - arbitration required
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: President restarts arbitration thread [state=7]
2007-04-18 18:22:30 [MgmSrvr] ALERT -- Node 4: Arbitration won - positive reply from node 2
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: GCP Take over started
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: DICT: lock bs: 0 ops: 0 poll: 0 cnt: 0 queue:
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: LCP Take over started
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: ParticipatingDIH = 0000000000000010
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: ParticipatingLQH = 0000000000000010
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=0 0000000000000000]
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=0 0000000000000000]
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=0 0000000000000000]
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: m_LCP_COMPLETE_REP_From_Master_Received = 0
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: GCP Take over completed
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: LCP Take over completed (state = 6)
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: ParticipatingDIH = 0000000000000010
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: ParticipatingLQH = 0000000000000010
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=1 0000000000000010]
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=0 0000000000000000]
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=0 0000000000000000]
2007-04-18 18:22:30 [MgmSrvr] INFO -- Node 4: m_LCP_COMPLETE_REP_From_Master_Received = 0
2007-04-18 18:22:31 [MgmSrvr] INFO -- Node 4: Started arbitrator node 1 [ticket=601500020881ba92]
2007-04-18 18:22:46 [MgmSrvr] INFO -- Node 4: Communication to Node 3 opened
2007-04-18 18:24:29 [MgmSrvr] INFO -- Node 4: Data usage is 9%(3273 32K pages of total 32768)
2007-04-18 18:24:29 [MgmSrvr] INFO -- Node 4: Index usage is 9%(941 8K pages of total 10272)
node error log:
Time: Wednesday 18 April 2007 - 18:22:30
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 6 received; Aborted
Error object: main.cpp
Program: ndbd
Pid: 5183
Trace: /var/lib/mysql-cluster/ndb_3_trace.log.1
Version: Version 5.1.17 (beta)
***EOM***
last portion of node output log:
allocated 32 pages: [ m_page_no: 3714 m_file_no: 3 m_page_idx: 1480]
allocated 32 pages: [ m_page_no: 3746 m_file_no: 3 m_page_idx: 1481]
allocated 32 pages: [ m_page_no: 3778 m_file_no: 3 m_page_idx: 1482]
allocated 32 pages: [ m_page_no: 3810 m_file_no: 3 m_page_idx: 1483]
allocated 32 pages: [ m_page_no: 3842 m_file_no: 3 m_page_idx: 1484]
allocated 32 pages: [ m_page_no: 3874 m_file_no: 3 m_page_idx: 1485]
2007-04-18 18:22:30 [ndbd] INFO -- Received signal 6. Running error handler.
2007-04-18 18:22:30 [ndbd] INFO -- Signal 6 received; Aborted
2007-04-18 18:22:30 [ndbd] INFO -- main.cpp
2007-04-18 18:22:30 [ndbd] INFO -- Error handler signal shutting down system
How to repeat:
config.ini
[TCP DEFAULT]
SendBufferMemory= 8M
[NDBD DEFAULT]
NoOfReplicas= 2
DataDir= /var/lib/mysql-cluster
DataMemory= 1024M
IndexMemory= 80M
MaxNoofTables=400
MaxNoofAttributes= 4000
MaxNoOfConcurrentOperations= 166000
MaxNoOfOrderedIndexes=1000
MaxNoOfUniqueHashIndexes=1000
MemReportFrequency= 120
SharedGlobalMemory= 96M
NoOfFragmentLogFiles=32
RedoBuffer= 32M
MaxNoOfOpenFiles=60
[NDB_MGMD]
Id=1
Hostname= 64.72.127.150
DataDir= /var/lib/mysql-cluster
ArbitrationRank= 1
[NDB_MGMD]
id=2
Hostname= 64.72.127.183
DataDir= /var/lib/mysql-cluster
ArbitrationRank= 2
[NDBD]
Id=3
HostName= 64.72.127.183
[NDBD]
Id=4
HostName= 64.72.127.152
[MYSQLD]
Id=5
HostName= 64.72.127.183
ArbitrationRank= 0
[MYSQLD]
id=6
HostName= 64.72.127.184
ArbitrationRank= 0
[MYSQLD]
id=7
HostName= 64.72.127.152
ArbitrationRank= 0
[MYSQLD]
Id=8
HostName= 64.72.127.150
ArbitrationRank= 0
disktable setup
CREATE LOGFILE GROUP lg_1
ADD UNDOFILE 'undo_1.dat'
INITIAL_SIZE 256M
UNDO_BUFFER_SIZE 64M
ENGINE NDB;
ALTER LOGFILE GROUP lg_1
ADD UNDOFILE 'undo_2.dat'
INITIAL_SIZE 256M
ENGINE NDB;
CREATE TABLESPACE ts_1
ADD DATAFILE 'data_1.dat'
USE LOGFILE GROUP lg_1
INITIAL_SIZE 512M
ENGINE NDB;
ALTER TABLESPACE ts_1
ADD DATAFILE 'data_2.dat'
INITIAL_SIZE 512M
ENGINE NDB;
ALTER TABLESPACE ts_1
ADD DATAFILE 'data_3.dat'
INITIAL_SIZE 512M
ENGINE NDB;
ALTER TABLESPACE ts_1
ADD DATAFILE 'data_4.dat'
INITIAL_SIZE 512M
ENGINE NDB;
create database [database];
then from command line execute a data restore of the old database.
mysql -u root [database] < test2.sql