Description:
Over the weekend (Friday night CST), Nikolay and I started a stress test using TPCB
scripts.
Sunday I introduced 2 new scripts that did the following:
Loop{
Log in to cluster;
Create Database;
Create Log Group;
Create Table Space;
Create Table;
Insert Data;
Delete Data;
Drop Table;
Drop Table Space;
Drop Log Group;
Drop Database;
}
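For anyone attempting a repro, one iteration of the loop above could be sketched roughly as below. This is not the actual script used in the test. The object names, file names, sizes, and table layout are all hypothetical, and the Disk Data syntax shown is the generally documented form, which may differ in detail on 5.1.8 beta. The sketch only builds the SQL statements in order; sending them to a mysqld attached to the cluster is left to whatever client is at hand.

```python
def iteration_statements(n, rows=100):
    """Build the SQL for one create/insert/delete/drop cycle.

    All names and sizes are hypothetical placeholders, not taken
    from the original stress scripts.
    """
    db = f"stress_db_{n}"
    lg = f"stress_lg_{n}"
    ts = f"stress_ts_{n}"
    undo_file = f"undo_{n}.dat"
    data_file = f"data_{n}.dat"

    # Create phase: database, log group, table space, disk-based table.
    stmts = [
        f"CREATE DATABASE {db}",
        f"CREATE LOGFILE GROUP {lg} ADD UNDOFILE '{undo_file}' "
        f"INITIAL_SIZE 16M ENGINE NDB",
        f"CREATE TABLESPACE {ts} ADD DATAFILE '{data_file}' "
        f"USE LOGFILE GROUP {lg} INITIAL_SIZE 32M ENGINE NDB",
        f"CREATE TABLE {db}.t1 (id INT PRIMARY KEY, payload VARCHAR(64)) "
        f"TABLESPACE {ts} STORAGE DISK ENGINE NDB",
    ]

    # Insert phase, then delete everything again.
    stmts += [f"INSERT INTO {db}.t1 VALUES ({i}, 'row {i}')" for i in range(rows)]
    stmts.append(f"DELETE FROM {db}.t1")

    # Drop phase, in reverse order of creation. The data file is
    # removed first because a table space that still has data files
    # cannot be dropped.
    stmts += [
        f"DROP TABLE {db}.t1",
        f"ALTER TABLESPACE {ts} DROP DATAFILE '{data_file}' ENGINE NDB",
        f"DROP TABLESPACE {ts} ENGINE NDB",
        f"DROP LOGFILE GROUP {lg} ENGINE NDB",
        f"DROP DATABASE {db}",
    ]
    return stmts


if __name__ == "__main__":
    # Print a few iterations so the output can be piped into a client.
    for n in range(3):
        for stmt in iteration_statements(n, rows=5):
            print(stmt + ";")
```

Running the loop indefinitely, alongside the TPCB load, is what preceded the disk-full condition described below.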
All was working well. On Monday I found that we had run out of disk space and that one of the
data nodes had failed. The other was up but was spinning on being out of undo space and
aborting any and all transactions. I was still able to connect to the TPCB database and run
queries.
In an effort to recover, I moved some of the disk data to a different drive and created
symbolic links to them and restarted the data node.
The data node came up and never got past phase 4. The ndb_1_cluster.log showed that
the data node had completed phase 4, but a "3 status" in the management console showed that
it was still in phase 4.
After leaving it in phase 4 for a couple of hours yesterday, I issued a "3 restart".
Checking it this morning, I found that it was still in phase 4. Since both of the attempts
to restart had failed, I decided to "shutdown" and restart the entire cluster. On cluster
restart, the other data node crashed with the following error log:
Time: Tuesday 21 February 2006 - 13:41:18
Status: Temporary error, restart node
Message: Assertion (Internal error, programming error or missing error message, please
report a bug)
Error: 2301
Error data: ArrayPool<T>::getPtr
Error object: ../../../../../storage/ndb/src/kernel/vm/ArrayPool.hpp line: 378 (block:
LGMAN)
Program: /home/ndbdev/ngrishakin/builds/libexec/ndbd
Pid: 16800
Trace: /space/run/ndb_2_trace.log.1
Version: Version 5.1.8 (beta)
***EOM***
Attached is a file from ndb_error_reporter, but due to a bug in this script, the FS is not
included.
How to repeat:
Not easy to repeat