MySQL Bugs: #62694: crash

Bug #62694	crash
Submitted:	12 Oct 2011 7:49	Modified:	8 Oct 2016 6:47
Reporter:	Mr Jay
Status:	No Feedback
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	7.1.15a	OS:	Linux (x64)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
Both data nodes in a two-replica setup crashed at the same time.

How to repeat:
Don't know

Time: Wednesday 12 October 2011 - 08:21:51
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: pgman.cpp
Error object: PGMAN (Line: 528) 0x00000002
Program: /usr/libexec/ndbmtd
Pid: 5613 thr: 2
Version: mysql-5.1.56 ndb-7.1.15a
Trace: /var/app/mysql/ndb/ndb_1_trace.log.3 /var/app/mysql/ndb/ndb_1_trace.log.3_t1 /var/app/mysql/ndb/ndb_1_trace.log.3_t2 /v

Last lines of ndb_1_trace.log.3_t1:
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 1, r.sigId: -632746483 gsn: 264 "FSREADREQ" prio: 0
s.bn: 261/2 "PGMAN", s.proc: 1, s.sigId: -665243008 length: 7 trace: 1 #sec: 0 fragInf: 0
 UserPointer: 175
 FilePointer: 881
 UserReference: H'05050001 Operation flag: H'00000003 (No sync, Format=List of global pages)
List of shared pages)
 varIndex: 68423
 numberOfPages: 1
 pageData:  H'00000675, 
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 1, r.sigId: -632746484 gsn: 164 "CONTINUEB" prio: 1
s.bn: 253 "NDBFS", s.proc: 1, s.sigId: -632746486 length: 1 trace: 4 #sec: 0 fragInf: 0
 Scanning the memory channel again with no delay
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 1, r.sigId: -632746485 gsn: 264 "FSREADREQ" prio: 0
s.bn: 261/2 "PGMAN", s.proc: 1, s.sigId: -665243027 length: 7 trace: 1 #sec: 0 fragInf: 0
 UserPointer: 513
 FilePointer: 881
 UserReference: H'05050001 Operation flag: H'00000003 (No sync, Format=List of global pages)
List of shared pages)
 varIndex: 68162
 numberOfPages: 1
 pageData:  H'000009a9, 
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 1, r.sigId: -632746486 gsn: 164 "CONTINUEB" prio: 1
s.bn: 253 "NDBFS", s.proc: 1, s.sigId: -632746488 length: 1 trace: 4 #sec: 0 fragInf: 0
 Scanning the memory channel again with no delay
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 1, r.sigId: -632746487 gsn: 264 "FSREADREQ" prio: 0
s.bn: 261/2 "PGMAN", s.proc: 1, s.sigId: -665243037 length: 7 trace: 1 #sec: 0 fragInf: 0
 UserPointer: 168
 FilePointer: 881
 UserReference: H'05050001 Operation flag: H'00000003 (No sync, Format=List of global pages)
List of shared pages)
 varIndex: 68161
 numberOfPages: 1
 pageData:  H'00000816, 
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 1, r.sigId: -632746488 gsn: 568 "CONTINUE_FRAGMENTED" prio: 1
s.bn: 250 "DBDICT", s.proc: 1, s.sigId: -632746489 length: 2 trace: 4 #sec: 0 fragInf: 0
 H'00000000 H'00000d9a
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 1, r.sigId: -632746489 gsn: 164 "CONTINUEB" prio: 1
s.bn: 250 "DBDICT", s.proc: 1, s.sigId: -632746491 length: 4 trace: 4 #sec: 0 fragInf: 0
 H'00000000 H'0000010c H'00000002 H'00000004
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 1, r.sigId: -632746490 gsn: 24 "GET_TABINFOREQ" prio: 1
s.bn: 244/4 "BACKUP", s.proc: 1, s.sigId: 30174511 length: 5 trace: 4 #sec: 0 fragInf: 0
 senderRef: 0x8f40001 senderData: 0
 requestType: 0x2 RequestById LongSignalConf
 tableId: 268 schemaTransId: 0x0
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 1, r.sigId: -632746491 gsn: 24 "GET_TABINFOREQ" prio: 1
s.bn: 244/3 "BACKUP", s.proc: 1, s.sigId: 888761885 length: 5 trace: 4 #sec: 0 fragInf: 0
 senderRef: 0x6f40001 senderData: 0
 requestType: 0x2 RequestById LongSignalConf
 tableId: 268 schemaTransId: 0x0
--------------- Signal ----------------
r.bn: 245 "DBTC", r.proc: 1, r.sigId: -632746492 gsn: 409 "TIME_SIGNAL" prio: 1
s.bn: 252 "QMGR", s.proc: 1, s.sigId: -632746494 length: 1 trace: 0 #sec: 0 fragInf: 0
 H'00000004
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 1, r.sigId: -632746493 gsn: 164 "CONTINUEB" prio: 1
s.bn: 253 "NDBFS", s.proc: 1, s.sigId: -632746497 length: 1 trace: 1 #sec: 0 fragInf: 0
 Scanning the memory channel again with no delay
--------------- Signal ----------------
r.bn: 252 "QMGR", r.proc: 1, r.sigId: -632746494 gsn: 164 "CONTINUEB" prio: 0
s.bn: 252 "QMGR", s.proc: 1, s.sigId: -632746516 length: 3 trace: 0 #sec: 0 fragInf: 0
 H'00000004 H'00000000 H'47c6c7db
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 1, r.sigId: -632746495 gsn: 164 "CONTINUEB" prio: 0
s.bn: 253 "NDBFS", s.proc: 1, s.sigId: -632746517 length: 1 trace: 0 #sec: 0 fragInf: 0
 Scanning the memory channel every 10ms
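
For anyone triaging a trace like the one above, here is a minimal Python sketch that tallies received signals per block and decodes the hexadecimal references. All names (`RECV_RE`, `to_unsigned32`, `decode_block_ref`, `count_received_signals`) are hypothetical helpers, not part of any NDB tool. The reference layout is an assumption inferred from this trace: H'05050001 is printed next to 261/2 "PGMAN", and 0x8f40001 next to 244/4 "BACKUP", which fits node id in the low 16 bits, main block number in the next 9 bits, and instance in the top 7 bits. The negative sigId values are presumably 32-bit counters printed as signed integers; that is also an assumption.

```python
import re
from collections import Counter

# Receiver header line of a signal entry, in the format shown in the trace.
RECV_RE = re.compile(
    r'r\.bn: (\d+)(?:/(\d+))? "(\w+)", r\.proc: (\d+), r\.sigId: (-?\d+) '
    r'gsn: (\d+) "(\w+)" prio: (\d+)'
)


def to_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned (assumed sigId width)."""
    return value & 0xFFFFFFFF


def decode_block_ref(ref):
    """Split a 32-bit reference into (main_block, instance, node).

    Assumed layout, inferred from the trace: low 16 bits hold the node id;
    of the high 16 bits, the low 9 are the main block number and the
    remaining 7 are the block instance.
    """
    block = (ref >> 16) & 0xFFFF
    return block & 0x1FF, block >> 9, ref & 0xFFFF


def count_received_signals(trace_text):
    """Count signals per (receiving block name, signal name)."""
    counts = Counter()
    for match in RECV_RE.finditer(trace_text):
        counts[(match.group(3), match.group(7))] += 1
    return counts


# Two entries copied from the trace above (detail lines omitted).
sample = '''--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 1, r.sigId: -632746483 gsn: 264 "FSREADREQ" prio: 0
s.bn: 261/2 "PGMAN", s.proc: 1, s.sigId: -665243008 length: 7 trace: 1 #sec: 0 fragInf: 0
--------------- Signal ----------------
r.bn: 250 "DBDICT", r.proc: 1, r.sigId: -632746490 gsn: 24 "GET_TABINFOREQ" prio: 1
s.bn: 244/4 "BACKUP", s.proc: 1, s.sigId: 30174511 length: 5 trace: 4 #sec: 0 fragInf: 0
'''

if __name__ == "__main__":
    for (block, signal), n in sorted(count_received_signals(sample).items()):
        print(block, signal, n)
    print(decode_block_ref(0x05050001))  # UserReference from FSREADREQ
    print(decode_block_ref(0x08F40001))  # senderRef from GET_TABINFOREQ
```

Run over the full trace file, the tally shows which blocks were busiest just before the failed ndbrequire; in the excerpt above, most entries are PGMAN read requests to NDBFS.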

Additionally, the cluster cannot be started again.
Both nodes show 100% CPU load and no disk I/O.

Too much time has passed, and the logs are no longer available. From the description, this looks like an issue we fixed in 7.1.17, but I can't say for sure without the full logs.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".