MySQL Bugs: #20261: NDB node won't start even with --initial option

Bug #20261	NDB node won't start even with --initial option
Submitted:	4 Jun 2006 23:12	Modified:	4 Aug 2006 9:19
Reporter:	Jerome Macaranas
Status:	Duplicate
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	5.0.21-1.rhel4	OS:	Linux (RHEL 4)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
SETUP:
    testpc1 = management node and sql node 2
    testpc2 = data node 1
    testpc3 = data node 2
    testpc4 = sql node 1

=== management output
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @192.168.6.201  (Version: 5.0.21, starting, Nodegroup: 0, Master)
id=3 (not connected, accepting connect from 192.168.6.202)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.6.200  (Version: 5.0.21)

[mysqld(API)]   2 node(s)
id=4 (not connected, accepting connect from 192.168.6.40)
id=5 (not connected, accepting connect from any host)
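
For anyone triaging a listing like the one above, the node lines can be split into connected and disconnected nodes with a few lines of Python. This is only a minimal sketch: the regular expression and the parse_nodes helper are assumptions made for illustration, based solely on the five node lines pasted in this report, and are not part of any MySQL tool.

```python
import re

# Node lines copied from the management output above.
STATUS = """\
id=2    @192.168.6.201  (Version: 5.0.21, starting, Nodegroup: 0, Master)
id=3 (not connected, accepting connect from 192.168.6.202)
id=1    @192.168.6.200  (Version: 5.0.21)
id=4 (not connected, accepting connect from 192.168.6.40)
id=5 (not connected, accepting connect from any host)
"""

# Hypothetical pattern: "id=N", an optional "@address", then a
# parenthesised detail string. It is derived from the lines above only.
NODE_LINE = re.compile(r"^id=(\d+)\s+(?:@(\S+)\s+)?\((.*)\)\s*$")


def parse_nodes(text):
    """Return one dict per node line; lines that do not match are skipped."""
    nodes = []
    for line in text.splitlines():
        m = NODE_LINE.match(line.strip())
        if m is None:
            continue
        node_id, address, detail = m.groups()
        nodes.append({
            "id": int(node_id),
            "address": address,  # None when the node is not connected
            "connected": address is not None,
            "detail": detail,
        })
    return nodes


nodes = parse_nodes(STATUS)
disconnected = [n["id"] for n in nodes if not n["connected"]]
print(disconnected)
```

In this report, node 2 is connected but still in the "starting" state, so the detail string is worth inspecting as well as the connected flag.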

Initially the setup was working fine, until the day the machines' power strip was accidentally pulled out.

I was able to start id=2. After it started successfully, I started id=3, but an error occurred:

===> 2006-06-05 06:49:44 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2310: 'Error while reading the REDO log(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.
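
The ALERT line above packs the node id, start phase, signal and error code into one sentence. A rough Python sketch for pulling those fields out follows. The pattern and the parse_alert helper are hypothetical and were written against the two ALERT lines in this report only, so other ndb_mgmd messages may not match.

```python
import re

# ALERT line copied from the management server log above.
ALERT = (
    "2006-06-05 06:49:44 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown "
    "completed. Occured during startphase 5. Initiated by signal 0. Caused by "
    "error 2310: 'Error while reading the REDO log(Ndbd file system "
    "inconsistency error, please report a bug). Ndbd file system error, "
    "restart node initial'."
)

# Hypothetical pattern; "Occured" is spelled as it appears in the log.
ALERT_RE = re.compile(
    r"Node (?P<node>\d+): Forced node shutdown completed\. "
    r"Occured during startphase (?P<phase>\d+)\. "
    r"Initiated by signal (?P<signal>\d+)\. "
    r"Caused by error (?P<error>\d+): '(?P<message>.*)"
)


def parse_alert(line):
    """Extract the numeric fields and message text, or return None."""
    m = ALERT_RE.search(line)
    if m is None:
        return None
    fields = {k: int(m.group(k)) for k in ("node", "phase", "signal", "error")}
    # Drop the trailing period and closing quote when they are present.
    fields["message"] = m.group("message").rstrip(".").rstrip("'")
    return fields


alert = parse_alert(ALERT)
print(alert["node"], alert["phase"], alert["error"])
```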

==== Error Log ===
Time: Wednesday 31 May 2006 - 14:20:38
Status: Ndbd file system error, restart node initial
Message: Error while reading the REDO log (Ndbd file system inconsistency error, please report a bug)
Error: 2310
Error data: Error while reading REDO log. from 14900
D=9, F=0 Mb=0 FP=1 W1=35 W2=0
Error object: DBLQH (Line: 14936) 0x0000000a
Program: ndbd
Pid: 2979
Trace: /var/lib/mysql-cluster/ndb_3_trace.log.8
Version: Version 5.0.21
===============

I also tried starting it with --initial:
===> 2006-06-05 06:59:37 [MgmSrvr] ALERT    -- Node 3: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug)

==== Error Log ===
Time: Thursday 1 June 2006 - 13:38:24
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Node 2 killed this node because it could not copy a fragment during node restart. Copy fragment err
Error object: NDBCNTR (Line: 197) 0x0000000e
Program: ndbd
Pid: 3570
Trace: /var/lib/mysql-cluster/ndb_3_trace.log.9
Version: Version 5.0.21
=====

I updated the ndbd storage package to ndb-storage-5.0.22-0 and repeated the last two steps, but with no luck.

Without "--initial":

==== Error Log ===
Time: Monday 5 June 2006 - 06:56:49
Status: Ndbd file system error, restart node initial
Message: Error while reading the REDO log (Ndbd file system inconsistency error, please report a bug)
Error: 2310
Error data: Error while reading REDO log. from 14900
D=8, F=0 Mb=0 FP=1 W1=35 W2=0
Error object: DBLQH (Line: 14936) 0x0000000a
Program: ndbd
Pid: 4579
Trace: /var/lib/mysql-cluster/ndb_3_trace.log.12
Version: Version 5.0.22
==============

With "--initial":

==== Error Log ===
Time: Monday 5 June 2006 - 07:06:41
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Node 2 killed this node because it could not copy a fragment during node restart. Copy fragment err
Error object: NDBCNTR (Line: 197) 0x0000000e
Program: ndbd
Pid: 4614
Trace: /var/lib/mysql-cluster/ndb_3_trace.log.13
Version: Version 5.0.22
=============
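
The error log entries above all use the same "Key: value" layout, so they can be compared side by side with a small parser. The sketch below is an assumption-laden illustration: parse_error_log is a hypothetical helper, the sample text is abridged from the two 5.0.22 entries above, and a line without a "Key: " prefix (such as the "D=8, ..." line) is treated as a continuation of the previous field.

```python
# Abridged copies of the two 5.0.22 entries above (some fields omitted).
LOG = """\
==== Error Log ===
Time: Monday 5 June 2006 - 06:56:49
Status: Ndbd file system error, restart node initial
Error: 2310
Error data: Error while reading REDO log. from 14900
D=8, F=0 Mb=0 FP=1 W1=35 W2=0
Error object: DBLQH (Line: 14936) 0x0000000a
Version: Version 5.0.22
==============
==== Error Log ===
Time: Monday 5 June 2006 - 07:06:41
Status: Temporary error, restart node
Error: 2303
Error object: NDBCNTR (Line: 197) 0x0000000e
Version: Version 5.0.22
=============
"""


def parse_error_log(text):
    """Split the text into one dict per '==== Error Log ===' entry."""
    entries = []
    current = None
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("=") and "Error Log" in line:
            # Header line: start a new entry.
            current, last_key = {}, None
            entries.append(current)
        elif line.startswith("="):
            # Closing rule of '=' characters: end the current entry.
            current = None
        elif current is not None and line:
            key, sep, value = line.partition(": ")
            if sep:
                current[key] = value
                last_key = key
            elif last_key is not None:
                # No "Key: " prefix: append to the previous field.
                current[last_key] += " " + line
    return entries


entries = parse_error_log(LOG)
for e in entries:
    # Print the error code, the reporting block, and the status text.
    print(int(e["Error"]), e["Error object"].split()[0], e["Status"])
```

Read this way, the two failures come from different blocks: error 2310 is raised in DBLQH while reading the REDO log, and error 2303 is raised in NDBCNTR when the node is started with --initial.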

How to repeat:
I don't have the slightest idea what happened or how to reproduce the situation. I can redo the setup from scratch, but I doubt the problem would occur again.

Trace logs

Attachment: ndb_3_trace_logs.tar.bz2 (application/x-bzip2, text), 151.06 KiB.

Changed category to a more appropriate one.

The initial node restart seems to fail due to an out-of-memory condition.

Duplicate of http://bugs.mysql.com/bug.php?id=18475

Solution is to:
1) Change the memory allocation strategy (likely in 5.1/5.2)
2) Improve the error message on COPY_FRAGREF