Bug #18781 database drop during restart of failed data node causes data node failure
Submitted: 4 Apr 2006 20:33 Modified: 12 Jul 2006 12:31
Reporter: Jonathan Miller
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S2 (Serious)
Version: 5.0 -> OS: Linux (Linux 32 Bit OS)
Assigned to: Pekka Nousiainen CPU Architecture:Any

[4 Apr 2006 20:33] Jonathan Miller
Description:
I restarted a failed data node. During phase 3 of the restart I dropped the dbt2 database, and the restarting node failed:

Time: Tuesday 4 April 2006 - 22:21:55
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: restore.cpp
Error object: RESTORE (Line: 1149) 0x0000000a
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 18920
Trace: /space/run/ndb_3_trace.log.3
Version: Version 5.1.9 (beta)
***EOM***

trace:
--------------- Signal ----------------
r.bn: 262 "RESTORE", r.proc: 3, r.sigId: 1073042 gsn: 315 "LQHKEYREF" prio: 1
s.bn: 247 "DBLQH", s.proc: 3, s.sigId: 1073040 length: 5 trace: 2 #sec: 0 fragInf: 0
Signal data: H'00000000 H'00000000 H'000004ca H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 3, r.sigId: 1073041 gsn: 264 "FSREADREQ" prio: 1
s.bn: 262 "RESTORE", s.proc: 3, s.sigId: 1073038 length: 7 trace: 2 #sec: 0 fragInf: 0
 UserPointer: 0
 FilePointer: 149
 UserReference: H'01060003 Operation flag: H'00000023 (No sync, Format=List of global pages)
List of shared pages)
 varIndex: 197
 numberOfPages: 1
 pageData:  H'0000003c,
--------------- Signal ----------------
r.bn: 262 "RESTORE", r.proc: 3, r.sigId: 1073040 gsn: 164 "CONTINUEB" prio: 1
s.bn: 262 "RESTORE", s.proc: 3, s.sigId: 1073036 length: 2 trace: 2 #sec: 0 fragInf: 0
 H'00000000 H'00000000
--------------- Signal ----------------
r.bn: 253 "NDBFS", r.proc: 3, r.sigId: 1073039 gsn: 164 "CONTINUEB" prio: 1
s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 1073034 length: 1 trace: 0 #sec: 0 fragInf: 0
 Scanning the memory channel again with no delay
--------------- Signal ----------------
r.bn: 262 "RESTORE", r.proc: 3, r.sigId: 1073038 gsn: 262 "FSREADCONF" prio: 1
s.bn: 253 "NDBFS", s.proc: 3, s.sigId: 1073034 length: 2 trace: 2 #sec: 0 fragInf: 0
 UserPointer: 0
 FilePointer: 32768
--------------- Signal ----------------
r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 1073037 gsn: 199 "PREP_DROP_TAB_REQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 12411749 length: 4 trace: 0 #sec: 0 fragInf: 0
 senderRef: fa0002 senderData: 33 TableId: 15
--------------- Signal ----------------
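
In this excerpt the r.sigId values decrease from entry to entry, so the PREP_DROP_TAB_REQ sent by DBDICT on node 2 is the oldest signal shown and the LQHKEYREF delivered to RESTORE is the newest. To make that ordering easier to see, here is a minimal Python sketch that pulls the header of each entry out of the trace text and sorts the entries by receiver sigId. It assumes the two-line `r.bn` / `s.bn` header layout shown above; the name `parse_signals` and the regular expressions are illustrative and not part of any NDB tool.

```python
import re

# Header lines of one trace entry, matching the layout in the excerpt above.
RECV = re.compile(
    r'r\.bn: (\d+) "(\w+)", r\.proc: (\d+), r\.sigId: (\d+) gsn: (\d+) "(\w+)"'
)
SEND = re.compile(r's\.bn: (\d+) "(\w+)", s\.proc: (\d+), s\.sigId: (\d+)')


def parse_signals(trace_text):
    """Return one dict per signal, sorted by receiver sigId (oldest first)."""
    signals = []
    current = None
    for line in trace_text.splitlines():
        m = RECV.search(line)
        if m:
            # Receiver line starts a new entry.
            current = {
                "sig_id": int(m.group(4)),
                "signal": m.group(6),
                "to_block": m.group(2),
                "to_node": int(m.group(3)),
            }
            signals.append(current)
            continue
        m = SEND.search(line)
        if m and current is not None:
            # Sender line completes the entry started by the receiver line.
            current["from_block"] = m.group(2)
            current["from_node"] = int(m.group(3))
    return sorted(signals, key=lambda s: s["sig_id"])


# Two entries copied from the trace in this report.
SAMPLE = '''
r.bn: 262 "RESTORE", r.proc: 3, r.sigId: 1073042 gsn: 315 "LQHKEYREF" prio: 1
s.bn: 247 "DBLQH", s.proc: 3, s.sigId: 1073040 length: 5 trace: 2 #sec: 0 fragInf: 0
r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 1073037 gsn: 199 "PREP_DROP_TAB_REQ" prio: 1
s.bn: 250 "DBDICT", s.proc: 2, s.sigId: 12411749 length: 4 trace: 0 #sec: 0 fragInf: 0
'''

if __name__ == "__main__":
    for sig in parse_signals(SAMPLE):
        print(sig["sig_id"], sig["signal"],
              f'{sig["from_block"]}@{sig["from_node"]} -> '
              f'{sig["to_block"]}@{sig["to_node"]}')
```

Run against the full trace file, the same function lists every signal in arrival order, which makes it easier to see where the drop-table request lands relative to the RESTORE block's file reads.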

How to repeat:
Crash a data node using the instructions from Bug #18780.
Start recovery on the failed data node.
Drop the dbt2 database during phase 3 of the restart.
[18 Apr 2006 6:02] Jonas Oreland
Create/drop/alter table during node startup is not supported.
It never has been.

However, it's quite easy to fix, which is why I kept the bug report open
  for quite a while. I haven't decided whether I should just document it
  or fix it.
[18 Apr 2006 10:41] Jonathan Miller
The problem is that in a large cluster, an admin could be trying to recover a node while someone in a different office is adding or dropping tables or databases. We need some type of fix here, even if it is only to reject the add or drop while recovery is in progress.
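
The "reject" option can be pictured with a minimal Python sketch. This is a conceptual illustration only: it is not how NDB handles schema operations and not a description of the patches committed for this bug. Every name in it (`ClusterSchemaGuard`, `SchemaChangeRejected`, `run_schema_change`) is hypothetical.

```python
class SchemaChangeRejected(Exception):
    """Raised when a schema change is attempted while a node is restarting."""


class ClusterSchemaGuard:
    """Hypothetical gatekeeper that tracks which data nodes are restarting."""

    def __init__(self):
        self._restarting = set()

    def node_restart_begin(self, node_id):
        # Called when a data node enters its restart phases.
        self._restarting.add(node_id)

    def node_restart_end(self, node_id):
        # Called when the node has rejoined the cluster (or given up).
        self._restarting.discard(node_id)

    def run_schema_change(self, statement, execute):
        # Refuse create/drop/alter while any node is still recovering,
        # instead of letting the request reach the restarting node.
        if self._restarting:
            nodes = ", ".join(str(n) for n in sorted(self._restarting))
            raise SchemaChangeRejected(
                f"{statement!r} rejected: node(s) {nodes} restarting"
            )
        return execute(statement)


if __name__ == "__main__":
    guard = ClusterSchemaGuard()
    guard.node_restart_begin(3)
    try:
        guard.run_schema_change("DROP DATABASE dbt2", execute=print)
    except SchemaChangeRejected as err:
        print("error:", err)
    guard.node_restart_end(3)
    guard.run_schema_change("DROP DATABASE dbt2", execute=print)
```

The trade-off in this approach is that the remote user gets an immediate, retryable error rather than a crashed data node, at the cost of schema changes being unavailable for the duration of the restart.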
[8 Jun 2006 14:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/7400
[11 Jun 2006 18:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/7499
[22 Jun 2006 17:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8091
[29 Jun 2006 7:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8452
[8 Jul 2006 11:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8952
[9 Jul 2006 16:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8965
[10 Jul 2006 11:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8988
[10 Jul 2006 12:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/8991
[12 Jul 2006 12:31] Jon Stephens
Documented in 5.0.25/5.1.12 changelogs.

NOTE: This fix will now appear in 5.0.25 due to the cancellation of 5.0.23.