Description:
I run sysbench-0.4.8 against cluster to test disk data feature in oltp mode, but this caused ndbd crash.
Builded one cluster on three linux boxes:
one mgmd: 172.16.74.66
one mysqld: 172.16.74.62
two ndbd: 172.16.74.62 and 172.16.74.116
After preparing sbtest table, run sysbench :
[root@qa3-62 mysql]# /usr/local/sysbench048/bin/sysbench --test=oltp --mysql-table-engine=ndbcluster --oltp-table-size=1000000 --max-requests=50 --num-threads=1 --mysql-user=root run
......
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Maximum number of requests for OLTP test is limited to 50
Threads started!
ALERT: failed to execute mysql_stmt_execute(): Err1297 Got temporary error 4031 'Node failure caused abort of transaction' from NDBCLUSTER
FATAL: database error, exiting...
Done.
The ndb_mgm console printed:
ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 @172.16.74.62 (Version: 5.1.22, Nodegroup: 0, Master)
id=3 @172.16.74.116 (Version: 5.1.22, Nodegroup: 0)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @172.16.74.66 (Version: 5.1.22)
[mysqld(API)] 3 node(s)
id=5 @172.16.74.62 (Version: 5.1.22)
id=6 (not connected, accepting connect from 172.16.74.116)
id=7 (not connected, accepting connect from any host)
ndb_mgm> Node 2: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
When I tried to start node2 with ndbd command, the ndb_mgm console printed:
ndb_mgm> show
... ...
ndb_mgm> Node 3: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 2: Forced node shutdown completed. Occured during startphase 5. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
Node 2 error log:
=================
Time: Wednesday 2 January 2008 - 15:32:26
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbtup/DbtupDiskAlloc.cpp
Error object: DBTUP (Line: 946) 0x0000000e
Program: bin/ndbd
Pid: 6031
Trace: /usr/local/mysql/clu_data/ndb_2_trace.log.1
Version: Version 5.1.22 (rc)
***EOM***
Time: Wednesday 2 January 2008 - 15:42:11
Status: Temporary error, restart node
Message: Another node failed during system restart, please investigate error(s) on other node(s) (Restart error)
Error: 2308
Error data: Node 3 disconnected
Error object: QMGR (Line: 2648) 0x0000000a
Program: bin/ndbd
Pid: 6174
Trace: /usr/local/mysql/clu_data/ndb_2_trace.log.2
Version: Version 5.1.22 (rc)
***EOM***
Node 3 error log:
================
Time: Wednesday 2 January 2008 - 15:45:40
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbtup/DbtupDiskAlloc.cpp
Error object: DBTUP (Line: 946) 0x0000000e
Program: bin/ndbd
Pid: 22198
Trace: /usr/local/mysql/clu_data/ndb_3_trace.log.1
Version: Version 5.1.22 (rc)
***EOM***
cluster log:
============
2008-01-02 15:32:32 [MgmSrvr] INFO -- Node 3: LCP Take over completed (state = 4)
2008-01-02 15:32:32 [MgmSrvr] INFO -- Node 3: ParticipatingDIH = 0000000000000000
2008-01-02 15:32:32 [MgmSrvr] INFO -- Node 3: ParticipatingLQH = 0000000000000000
2008-01-02 15:32:32 [MgmSrvr] INFO -- Node 3: m_LCP_COMPLETE_REP_Counter_DIH = [SignalCounter: m_count=0 0000000000000000]
2008-01-02 15:32:32 [MgmSrvr] INFO -- Node 3: m_LCP_COMPLETE_REP_Counter_LQH = [SignalCounter: m_count=0 0000000000000000]
2008-01-02 15:32:32 [MgmSrvr] INFO -- Node 3: m_LAST_LCP_FRAG_ORD = [SignalCounter: m_count=0 0000000000000000]
2008-01-02 15:32:32 [MgmSrvr] INFO -- Node 3: m_LCP_COMPLETE_REP_From_Master_Received = 1
2008-01-02 15:32:32 [MgmSrvr] ALERT -- Node 2: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2008-01-02 15:32:32 [MgmSrvr] INFO -- Node 3: Started arbitrator node 1 [ticket=56b60002396d926c]
2008-01-02 15:32:35 [MgmSrvr] INFO -- Node 3: Communication to Node 2 opened
2008-01-02 15:41:48 [MgmSrvr] INFO -- Mgmt server state: nodeid 2 reserved for ip 172.16.74.62, m_reserved_nodes 0000000000000026.
2008-01-02 15:41:48 [MgmSrvr] INFO -- Node 1: Node 2 Connected
2008-01-02 15:41:49 [MgmSrvr] INFO -- Mgmt server state: nodeid 2 freed, m_reserved_nodes 0000000000000022.
2008-01-02 15:41:49 [MgmSrvr] INFO -- Node 2: Node 3 Connected
2008-01-02 15:41:49 [MgmSrvr] INFO -- Node 2: CM_REGCONF president = 3, own Node = 2, our dynamic id = 3
2008-01-02 15:41:49 [MgmSrvr] INFO -- Node 3: Node 2 Connected
2008-01-02 15:41:49 [MgmSrvr] INFO -- Node 3: Node 2: API version 5.1.22
2008-01-02 15:41:49 [MgmSrvr] INFO -- Node 2: Node 3: API version 5.1.22
2008-01-02 15:41:49 [MgmSrvr] INFO -- Node 2: Start phase 1 completed
2008-01-02 15:41:49 [MgmSrvr] INFO -- Node 2: Start phase 2 completed (node restart)
2008-01-02 15:41:49 [MgmSrvr] INFO -- Node 2: Receive arbitrator node 1 [ticket=56b60002396d926c]
2008-01-02 15:41:50 [MgmSrvr] INFO -- Node 2: Start phase 3 completed (node restart)
2008-01-02 15:41:50 [MgmSrvr] INFO -- Node 2: Start phase 4 completed (node restart)
2008-01-02 15:41:50 [MgmSrvr] INFO -- Node 3: DICT: locked by node 2 for NodeRestart
2008-01-02 15:41:50 [MgmSrvr] INFO -- Node 3: DICT: lock bs: 4 ops: 0 poll: 0 cnt: 0 queue: 2L
2008-01-02 15:41:51 [MgmSrvr] INFO -- Node 2: DICT: index 13 activated
2008-01-02 15:41:51 [MgmSrvr] INFO -- Node 2: DICT: index 14 activated
2008-01-02 15:41:51 [MgmSrvr] INFO -- Node 3: Node: 2 StartLog: [GCI Keep: 395 LastCompleted: 645 NewestRestorable: 918]
2008-01-02 15:42:05 [MgmSrvr] INFO -- Node 2: Applying undo to LCP: 29
2008-01-02 15:42:05 [MgmSrvr] INFO -- Node 2: Undo head - undo_1.dat page: 2626 lsn: 1000459
2008-01-02 15:42:07 [MgmSrvr] INFO -- Node 2: Logfile group: 8
2008-01-02 15:42:07 [MgmSrvr] INFO -- Node 2: head: undo_1.dat page: 2626
2008-01-02 15:42:07 [MgmSrvr] INFO -- Node 2: tail: undo_1.dat page: 2590
2008-01-02 15:42:07 [MgmSrvr] INFO -- Node 2: Flushing page cache after undo completion
2008-01-02 15:42:07 [MgmSrvr] INFO -- Node 2: Flushing complete
2008-01-02 15:42:08 [MgmSrvr] INFO -- Node 3: Local checkpoint 30 started. Keep GCI = 409 oldest restorable GCI = 410
2008-01-02 15:42:15 [MgmSrvr] ALERT -- Node 1: Node 3 Disconnected
2008-01-02 15:42:16 [MgmSrvr] ALERT -- Node 3: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2008-01-02 15:42:16 [MgmSrvr] ALERT -- Node 1: Node 2 Disconnected
2008-01-02 15:42:16 [MgmSrvr] ALERT -- Node 2: Forced node shutdown completed. Occured during startphase 5. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
How to repeat:
1.prepare 'sbtest' table for sysbench test:
/usr/local/sysbench048/bin/sysbench --test=oltp --mysql-table-engine=ndbcluster --oltp-table-size=1000000 --mysql-user=root prepare
2.change sbtest table to disk data:
mysql> create logfile group lg_1 add undofile 'undo_1.dat' initial_size 200M undo_buffer_size 10M engine ndb;
Query OK, 0 rows affected (5.43 sec)
mysql> create tablespace ts_1 add datafile 'data_1.dat' use logfile group lg_1 initial_size 700M engine ndb;
Query OK, 0 rows affected (18.67 sec)
mysql> alter table sbtest tablespace ts_1 storage disk engine ndb;
Query OK, 1000000 rows affected (9 min 31.35 sec)
Records: 1000000 Duplicates: 0 Warnings: 0
3.run sysbench test scripts:
[root@qa3-62 mysql]# /usr/local/sysbench048/bin/sysbench --test=oltp --mysql-table-engine=ndbcluster --oltp-table-size=1000000 --max-requests=50 --num-threads=1 --mysql-user=root run