Description:
ERROR 1297 (HY000) at line 1: Got temporary error 1501 'Out of undo space'
from NDBCLUSTER kills a datanode.
I'm working with about 230 MyISAM tables that currently hold archive data, so no changes are made to them. I'm trying to move those tables to NDBCLUSTER, using tablespaces for disk-based storage so that their data won't eat up all my gigabytes of RAM.
After a number of ALTER TABLE statements, ERROR 1297 appears, followed by the error messages below from ndb_mgm and from the data node itself:
Time: Monday 26 February 2007 - 15:27:25
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error,
programming error or missing error message, please report a bug
)
Error: 2341
Error data: lgman.cpp
Error object: LGMAN (Line: 1778) 0x0000000a
Program: /usr/sbin/ndbd
Pid: 16462
Trace: /usr/local/mysql/data/ndb_14_trace.log.3
Version: Version 5.1.14 (beta)
***EOM***
lgman (logmanager?) @ 1778:
01775 undo[2] |= File_formats::Undofile::UNDO_NEXT_LSN << 16;
01776 Uint32 *dst= get_log_buffer(ptr, sizeof(undo) >> 2);
01777 memcpy(dst, undo, sizeof(undo));
01778 ndbrequire(ptr.p->m_free_file_words >= (sizeof(undo) >> 2));
01779 ptr.p->m_free_file_words -= (sizeof(undo) >> 2);
2007-02-26 16:30:28 [MgmSrvr] INFO -- Node 12: Local checkpoint 1553
started. Keep GCI = 2171832 oldest restorable GCI = 2171947
2007-02-26 16:36:18 [MgmSrvr] INFO -- Node 12: Local checkpoint 1554
started. Keep GCI = 2173457 oldest restorable GCI = 2173556
2007-02-26 16:42:10 [MgmSrvr] INFO -- Node 12: Local checkpoint 1555
started. Keep GCI = 2173622 oldest restorable GCI = 2173749
2007-02-26 16:55:05 [MgmSrvr] INFO -- Node 12: Local checkpoint 1556
started. Keep GCI = 2173804 oldest restorable GCI = 2173911
2007-02-26 16:57:15 [MgmSrvr] ALERT -- Node 11: Node 14 Disconnected
2007-02-26 16:57:15 [MgmSrvr] INFO -- Node 11: Communication to Node 14
closed
2007-02-26 16:57:15 [MgmSrvr] ALERT -- Node 12: Node 14 Disconnected
2007-02-26 16:57:15 [MgmSrvr] INFO -- Node 12: Communication to Node 14
closed
2007-02-26 16:57:15 [MgmSrvr] INFO -- Node 12: Communication to Node 14
closed
2007-02-26 16:57:15 [MgmSrvr] INFO -- Node 1: Node 14 Connected
2007-02-26 16:57:15 [MgmSrvr] ALERT -- Node 12: Arbitration check won -
node group majority
2007-02-26 16:57:15 [MgmSrvr] INFO -- Node 12: President restarts
arbitration thread [state=6]
2007-02-26 16:57:15 [MgmSrvr] INFO -- Node 12: DICT: lock bs: 3 ops: 1
poll: 0 cnt: 0 queue:
2007-02-26 16:57:15 [MgmSrvr] ALERT -- Node 13: Node 14 Disconnected
2007-02-26 16:57:15 [MgmSrvr] INFO -- Node 13: Communication to Node 14
closed
2007-02-26 16:57:15 [MgmSrvr] INFO -- Node 13: Communication to Node 14
closed
2007-02-26 16:57:16 [MgmSrvr] ALERT -- Node 14: Forced node shutdown
completed. Initiated by signal 0. Caused by error 2341: 'Internal
program error (failed ndbrequire)(Internal error, programming error or
missing error message, please report a bug). Temporary
error, restart node'.
2007-02-26 16:58:14 [MgmSrvr] WARNING -- Node 11: Failure handling of node
14 has not completed in 1 min. - state = 3
2007-02-26 16:58:14 [MgmSrvr] WARNING -- Node 12: Failure handling of node
14 has not completed in 1 min. - state = 3
2007-02-26 16:58:14 [MgmSrvr] WARNING -- Node 13: Failure handling of node
14 has not completed in 1 min. - state = 3
How to repeat:
CREATE LOGFILE GROUP a_loggroup
ADD UNDOFILE './loggroups/a_undo.dat'
INITIAL_SIZE 10M
ENGINE NDBCLUSTER;
CREATE TABLESPACE a_archive_01
ADD DATAFILE './tablespaces/a_archive_01.dat'
USE LOGFILE GROUP a_loggroup
INITIAL_SIZE 12M
ENGINE NDBCLUSTER;
Then repeat a couple of times, for different tables:
ALTER TABLE x TABLESPACE a_archive_01 STORAGE DISK, ENGINE NDBCLUSTER;
etc.
...
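To watch how much disk data space is left while repeating the ALTER TABLE statements, a query along these lines might help. It is only a sketch: it assumes INFORMATION_SCHEMA.FILES is populated for NDB disk data files in this 5.1 build, and what FREE_EXTENTS and EXTRA report for undo files may differ between 5.1 releases.

```sql
-- Sketch only: list the NDB disk data files (undo files and data files)
-- together with their extent counts. Assumes INFORMATION_SCHEMA.FILES
-- is available and filled in by this server version.
SELECT FILE_NAME,
       FILE_TYPE,
       LOGFILE_GROUP_NAME,
       FREE_EXTENTS,
       TOTAL_EXTENTS,
       EXTENT_SIZE,
       INITIAL_SIZE,
       EXTRA
FROM INFORMATION_SCHEMA.FILES
WHERE ENGINE = 'ndbcluster';
```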
Suggested fix:
I tried using a bigger UNDO_BUFFER_SIZE for the LOGFILE GROUP, but that didn't help.
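A possible workaround, not verified in this report: since error 1501 is 'Out of undo space' and the only undo file is 10M, adding a second, larger undo file to the log file group may give the ALTER TABLE statements enough room. The file name a_undo_02.dat and the 256M size below are hypothetical values chosen for illustration. This would not address the failed ndbrequire in lgman.cpp itself, which still looks like a bug.

```sql
-- Sketch only: add more undo space to the existing log file group.
-- 'a_undo_02.dat' and 256M are assumed values, not taken from the
-- original setup; adjust to the available disk space.
ALTER LOGFILE GROUP a_loggroup
    ADD UNDOFILE './loggroups/a_undo_02.dat'
    INITIAL_SIZE 256M
    ENGINE NDBCLUSTER;
```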