Bug #21073 | CR: Slave dies and NDBD goes to 99% CPU using DBT2 | ||
---|---|---|---|
Submitted: | 14 Jul 2006 23:17 | Modified: | 30 Nov 2007 18:23 |
Reporter: | Jonathan Miller | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Disk Data | Severity: | S2 (Serious) |
Version: | 5.1.12 | OS: | Linux (Linux) |
Assigned to: | Jonathan Miller | CPU Architecture: | Any |
[14 Jul 2006 23:17]
Jonathan Miller
[14 Jul 2006 23:48]
Jonathan Miller
I am not able to recover the slave at this point: Time: Saturday 15 July 2006 - 01:44:52 Status: Temporary error, restart node Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug) Error: 2341 Error data: dbtup/DbtupDiskAlloc.cpp Error object: DBTUP (Line: 730) 0x0000000a Program: /home/ndbdev/jmiller/builds/libexec/ndbd Pid: 13277 Trace: /space/run/ndb_3_trace.log.3 Version: Version 5.1.12 (beta) ***EOM*** BLOCK ADDR ADDR ADDR ADDR ADDR ADDR ADDR ADDR ?004368 004377 DBLQH 006387 006464 002828 002837 002506 005090 005659 006587 DBLQH 002481 002485 002488 DBTC 004050 DBTUP 002032 DBLQH 003299 003506 016323 003617 DBACC 001054 003266 003330 003330 003330 003330 003330 003330 003330 003330 003266 003330 003330 003330 003330 003336 --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 6979696 gsn: 4 "ATTRINFO" prio: 1 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 9743852 length: 4 trace: 0 #sec: 0 fragInf: 0 H'00000002 H'00010000 H'0f700200 H'00000075 --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 6979695 gsn: 4 "ATTRINFO" prio: 1 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 9743852 length: 25 trace: 0 #sec: 0 fragInf: 0 H'00000002 H'00010000 H'0f700200 H'00000c5e H'00030004 H'00000002 H'00040004 H'00000ff0 H'00050004 H'00000004 H'00060004 H'44b70841 H'00070008 H'00000000 H'3ff00000 H'00080008 H'00000000 H'40440000 H'00090019 H'59435718 H'6b4a6676 H'664d4961 H'4e537757 H'4a624344 H'56714e78 --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 6979694 gsn: 316 "LQHKEYREQ" prio: 1 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 9743852 length: 23 trace: 0 #sec: 0 fragInf: 0 ClientPtr = H'00000002 hashValue = H'76251550 tcBlockRef = H'00f70002 transId1 = H'00010000 transId2 = H'0f700200 savePointId = H'00000000 Op: 2 Lock: 1 Flags: Simple Dirty Rowid GCI ScanInfo/noFiredTriggers: H'0 AttrLen: 28 (5 in this) KeyLen: 4 TableId: 25 SchemaVer: 2 FragId: 0 ReplicaNo: 0 LastReplica: 0 NextNodeId: 65535 KeyInfo: H'00000c5e H'00000001 H'00000004 H'00000002 Rowid: [ page: 857 idx: 8030 ] GCI: 55857 AttrInfo: H'00020004 H'00000004 H'00010004 H'00000001 H'00000004 --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 6979693 gsn: 4 "ATTRINFO" prio: 1 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 9743843 length: 20 trace: 0 #sec: 0 fragInf: 0 H'00000002 H'00010000 H'0f700200 H'000f0008 H'00000000 H'00000000 H'00100033 H'6f66351a H'4fe9e371 H'f76669f9 H'fc46e5ee H'32512fe9 H'38f76032 H'403948e7 H'694ee8f9 H'ed23dd57 H'6341443c H'25dd564a H'216d633b H'00fd75de --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 6979692 gsn: 4 "ATTRINFO" prio: 1 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 9743843 length: 25 trace: 0 #sec: 0 fragInf: 0 H'00000002 H'00010000 H'0f700200 H'000b0019 H'684c7318 H'62627558 H'74534b6e H'786b6259 H'455a4877 H'4d744a71 H'0000006b H'000c0019 H'5a556318 H'715a6a59 H'704f6856 H'5464504a H'4a527768 H'4f485068 H'0000006f H'000d0008 H'00000080 H'00000000 H'000e0008 H'00000000 H'00000000 --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 6979691 gsn: 4 "ATTRINFO" prio: 1 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 9743843 length: 25 trace: 0 #sec: 0 fragInf: 0 H'00000002 H'00010000 H'0f700200 H'696b7859 H'6c5a504a H'70714b58 H'6c6e4947 H'6e506a78 H'0000006e H'00090019 H'4d4b5318 H'6b42664d H'6c4a496e H'68757253 H'62474771 H'46685677 H'00000069 H'000a0019 H'4a544418 H'4a4f6752 H'57564151 H'6a4b4761 H'68627670 H'57794b51 H'00000070 --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 6979690 gsn: 4 "ATTRINFO" prio: 1 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 9743843 length: 25 trace: 0 #sec: 0 fragInf: 0 H'00000002 H'00010000 H'0f700200 H'4d4e4f49 H'6964586e H'4762754a H'00000072 H'00060019 H'4a637518 H'69654d4b H'42795a58 H'524f6c68 H'67617153 H'42744d64 H'00000070 H'00070019 H'464c6a18 H'4c694471 H'72575366 H'766f6577 H'50657064 H'55685645 H'00000056 H'00080019 H'68687818 ndb_mgm> Node 3: Forced node shutdown completed. Occured during startphase 4. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error Node 2: Forced node shutdown completed. Occured during startphase 4. Initiated by signal 0. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
[26 Oct 2007 5:48]
Jonas Oreland
lots of fixes made since "then", please retest
[30 Nov 2007 18:23]
Jonathan Miller
Hi, Spent time this week adding this testing to the ACRT framework. Currently running on latest telco clones with no issues. Closing the bug at this time. Best wishes, /Jeb