Bug #16739 DD: Data node can not recover from a full disk data file
Submitted: 24 Jan 2006 1:50
Modified: 24 Jan 2006 11:34
Reporter: Jonathan Miller
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 5.1.6-alpha
OS: Linux (Linux)
Assigned to: Jonas Oreland
CPU Architecture: Any

[24 Jan 2006 1:50] Jonathan Miller
Description:
Trying to restart data node after failure in bug #16738 fails with the following:

Time: Tuesday 24 January 2006 - 01:46:13
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: pgman.cpp
Error object: PGMAN (Line: 1373) 0x0000000a
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 9626
Trace: /space/run/ndb_4_trace.log.2
Version: Version 5.1.6 (alpha)

--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 4, r.sigId: 1010493 gsn: 4 "ATTRINFO" prio: 1
s.bn: 247 "DBLQH", s.proc: 4, s.sigId: 1010490 length: 25 trace: 2 #sec: 0 fragInf: 0
 H'00000000 H'00010000 H'0f700400 H'00000080 H'00030050 H'6e696f47 H'69662067
 H'6e696873 H'20202067 H'20202020 H'20202020 H'20202020 H'20202020 H'20202020
 H'20202020 H'20202020 H'20202020 H'20202020 H'20202020 H'20202020 H'20202020
 H'20202020 H'20202020 H'20202020 H'20202020
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 4, r.sigId: 1010492 gsn: 316 "LQHKEYREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 4, s.sigId: 1010490 length: 20 trace: 2 #sec: 0 fragInf: 0
 ClientPtr = H'00000000 hashValue = H'eeb97203 tcBlockRef = H'00f70004
 transId1 = H'00010000 transId2 = H'0f700400 savePointId = H'00000000
 Op: 2 Lock: 1 Flags: Simple Dirty Rowid GCI ScanInfo/noFiredTriggers: H'0
 AttrLen: 27 (5 in this) KeyLen: 1 TableId: 9 SchemaVer: 1
 FragId: 0 ReplicaNo: 0 LastReplica: 0 NextNodeId: 65535
 KeyInfo: H'000089ca
 Rowid: [ page: 15 idx: 1708 ]
 GCI: 9543 AttrInfo: H'00000004 H'000089ca H'00010004 H'00000161 H'00020004
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 4, r.sigId: 1010491 gsn: 67 "ACC_ABORTCONF" prio: 1
s.bn: 248 "DBACC", s.proc: 4, s.sigId: 1010489 length: 1 trace: 2 #sec: 0 fragInf: 0
 H'00000002
--------------- Signal ----------------
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 4, r.sigId: 1010490 gsn: 164 "CONTINUEB" prio: 1
s.bn: 247 "DBLQH", s.proc: 4, s.sigId: 1010487 length: 2 trace: 2 #sec: 0 fragInf: 0
 H'00000006 H'00000000
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 4, r.sigId: 1010489 gsn: 4 "ATTRINFO" prio: 1
s.bn: 247 "DBLQH", s.proc: 4, s.sigId: 1010486 length: 25 trace: 2 #sec: 0 fragInf: 0
 H'00000001 H'00010001 H'0f700400 H'00000080 H'00030050 H'6e696f47 H'69662067
 H'6e696873 H'20202067 H'20202020 H'20202020 H'20202020 H'20202020 H'20202020
 H'20202020 H'20202020 H'20202020 H'20202020 H'20202020 H'20202020 H'20202020
 H'20202020 H'20202020 H'20202020 H'20202020
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 4, r.sigId: 1010488 gsn: 316 "LQHKEYREQ" prio: 1
s.bn: 247 "DBLQH", s.proc: 4, s.sigId: 1010486 length: 20 trace: 2 #sec: 0 fragInf: 0
 ClientPtr = H'00000001 hashValue = H'112d003d tcBlockRef = H'00f70004
 transId1 = H'00010001 transId2 = H'0f700400 savePointId = H'00000000
 Op: 2 Lock: 1 Flags: Simple Dirty Rowid GCI ScanInfo/noFiredTriggers: H'0
 AttrLen: 27 (5 in this) KeyLen: 1 TableId: 9 SchemaVer: 1
 FragId: 1 ReplicaNo: 0 LastReplica: 0 NextNodeId: 65535
 KeyInfo: H'000089c6
 Rowid: [ page: 15 idx: 525 ]
 GCI: 9543 AttrInfo: H'00000004 H'000089c6 H'00010004 H'00000161 H'00020004
--------------- Signal ----------------
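The ATTRINFO payloads in the trace above are printed as 32-bit hex words. As a reading aid, the sketch below decodes the trailing words of the first ATTRINFO signal into text. It assumes the words hold ASCII column data stored in little-endian byte order (plausible for a Linux x86 host, but not confirmed by this report); the helper name `decode_words` is hypothetical and not part of any NDB tool.

```python
import struct

def decode_words(words):
    """Decode trace words such as "H'6e696f47" into text.

    Assumption: each word is a 32-bit value whose bytes were stored
    little-endian and contain ASCII characters.
    """
    # Strip the "H'" prefix, parse as hex, and repack as little-endian bytes.
    raw = b"".join(struct.pack("<I", int(w[2:], 16)) for w in words)
    return raw.decode("ascii", errors="replace")

# Words 6 to 9 of the first ATTRINFO signal in the trace.
payload = ["H'6e696f47", "H'69662067", "H'6e696873", "H'20202067"]
text = decode_words(payload)
print(repr(text))
```

Under that assumption the words spell out space-padded character data, and the remaining H'20202020 words are further padding, which suggests the signals carry a fixed-width character column being written during recovery.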

How to repeat:
Repeat the steps in bug #16738. Once the data node has aborted, try to restart it.
[24 Jan 2006 7:10] Jonas Oreland
This is a duplicate of bug #16738.
It writes corrupt data to disk before crashing.
[24 Jan 2006 11:34] Jonathan Miller
This is not a duplicate. It may be a side effect, but it is not a duplicate.