Bug #18780 | DD: DBT2 causes cluster failure using disk data in dbtup/DbtupDiskAlloc.cpp | ||
---|---|---|---|
Submitted: | 4 Apr 2006 19:47 | Modified: | 14 Jun 2006 0:39 |
Reporter: | Jonathan Miller | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Disk Data | Severity: | S1 (Critical) |
Version: | 5.1.9 | OS: | Linux (Linux) |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[4 Apr 2006 19:47]
Jonathan Miller
[4 Apr 2006 19:48]
Jonathan Miller
Ndb 2 error log: Time: Tuesday 4 April 2006 - 21:26:37 Status: Temporary error, restart node Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug) Error: 2341 Error data: dbtup/DbtupDiskAlloc.cpp Error object: DBTUP (Line: 870) 0x0000000a Program: /home/ndbdev/jmiller/builds/libexec/ndbd Pid: 701 Trace: /space/run/ndb_2_trace.log.1 Version: Version 5.1.9 (beta) ***EOM*** ndb2 Trace: --------------- Signal ---------------- r.bn: 261 "PGMAN", r.proc: 2, r.sigId: 201308218 gsn: 164 "CONTINUEB" prio: 0 s.bn: 261 "PGMAN", s.proc: 2, s.sigId: 201308217 length: 1 trace: 0 #sec: 0 fragInf: 0 H'00000002 --------------- Signal ---------------- r.bn: 261 "PGMAN", r.proc: 2, r.sigId: 201308216 gsn: 164 "CONTINUEB" prio: 1 s.bn: 261 "PGMAN", s.proc: 2, s.sigId: 201308213 length: 1 trace: 1 #sec: 0 fragInf: 0 H'00000001 --------------- Signal ---------------- r.bn: 248 "DBACC", r.proc: 2, r.sigId: 201308215 gsn: 332 "NEXT_SCANREQ" prio: 1 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 201308212 length: 3 trace: 1 #sec: 0 fragInf: 0 H'00000003 H'0000757c H'00000002 --------------- Signal ---------------- r.bn: 248 "DBACC", r.proc: 2, r.sigId: 201308214 gsn: 332 "NEXT_SCANREQ" prio: 1 s.bn: 247 "DBLQH", s.proc: 2, s.sigId: 201308211 length: 3 trace: 1 #sec: 0 fragInf: 0 H'0000000f H'0000761d H'00000002 --------------- Signal ---------------- r.bn: 261 "PGMAN", r.proc: 2, r.sigId: 201308213 gsn: 164 "CONTINUEB" prio: 1 s.bn: 261 "PGMAN", s.proc: 2, s.sigId: 201308210 length: 1 trace: 1 #sec: 0 fragInf: 0 H'00000001 --------------- Signal ----------------
[4 Apr 2006 19:50]
Jonathan Miller
ndb 3 error log: Time: Tuesday 4 April 2006 - 21:28:22 Status: Temporary error, restart node Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug) Error: 2341 Error data: dbtup/DbtupDiskAlloc.cpp Error object: DBTUP (Line: 870) 0x0000000a Program: /home/ndbdev/jmiller/builds/libexec/ndbd Pid: 708 Trace: /space/run/ndb_3_trace.log.1 Version: Version 5.1.9 (beta) ***EOM*** ndb3 trace log: --------------- Signal ---------------- r.bn: 261 "PGMAN", r.proc: 3, r.sigId: 187492690 gsn: 164 "CONTINUEB" prio: 0 s.bn: 261 "PGMAN", s.proc: 3, s.sigId: 187492689 length: 1 trace: 0 #sec: 0 fragInf: 0 H'00000002 --------------- Signal ---------------- r.bn: 245 "DBTC", r.proc: 3, r.sigId: 187492688 gsn: 351 "SCAN_FRAGCONF" prio: 1 s.bn: 247 "DBLQH", s.proc: 3, s.sigId: 187492687 length: 6 trace: 1 #sec: 0 fragInf: 0 H'0000001f H'00000000 H'00000002 H'000000b5 H'00d00600 H'00000000 --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 187492687 gsn: 389 "STORED_PROCCONF" prio: 1 s.bn: 249 "DBTUP", s.proc: 3, s.sigId: 187492686 length: 2 trace: 1 #sec: 0 fragInf: 0 H'000000b8 H'00000014 --------------- Signal ---------------- r.bn: 249 "DBTUP", r.proc: 3, r.sigId: 187492686 gsn: 391 "STORED_PROCREQ" prio: 1 s.bn: 247 "DBLQH", s.proc: 3, s.sigId: 187492684 length: 5 trace: 1 #sec: 0 fragInf: 0 H'000000b8 H'0000000a H'00000007 H'00000003 H'00000014 --------------- Signal ---------------- r.bn: 245 "DBTC", r.proc: 3, r.sigId: 187492685 gsn: 351 "SCAN_FRAGCONF" prio: 1 s.bn: 247 "DBLQH", s.proc: 3, s.sigId: 187492683 length: 6 trace: 1 #sec: 0 fragInf: 0 H'00000019 H'00000001 H'00000000 H'000000b5 H'00d00600 H'00000074 --------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 3, r.sigId: 187492684 gsn: 330 "NEXT_SCANCONF" prio: 1 s.bn: 258 "DBTUX", s.proc: 3, s.sigId: 187492682 length: 3 trace: 1 #sec: 0 fragInf: 0 H'00000019 H'ffffff00 H'ffffff00 --------------- Signal ----------------
[4 Apr 2006 20:19]
Jonathan Miller
Note: Cluster Recovered from crash: 2006-04-04 22:04:16 [MgmSrvr] INFO -- Node 3: CM_REGCONF president = 2, own Node = 3, our dynamic id = 2 2006-04-04 22:04:16 [MgmSrvr] INFO -- Node 3: Node 2: API version 5.1.9 2006-04-04 22:04:16 [MgmSrvr] INFO -- Node 3: Start phase 1 completed 2006-04-04 22:04:16 [MgmSrvr] INFO -- Node 3: Start phase 2 completed (system restart) 2006-04-04 22:04:16 [MgmSrvr] INFO -- Node 2: Start phase 2 completed (system restart) 2006-04-04 22:04:16 [MgmSrvr] INFO -- Node 3: Start phase 3 completed (system restart) 2006-04-04 22:04:16 [MgmSrvr] INFO -- Node 2: Start phase 3 completed (system restart) 2006-04-04 22:04:26 [MgmSrvr] INFO -- Node 3: DICT: index 10 activated done 2006-04-04 22:04:26 [MgmSrvr] INFO -- Node 3: DICT: index 12 activated done 2006-04-04 22:04:26 [MgmSrvr] INFO -- Node 3: DICT: index 14 activated done 2006-04-04 22:04:26 [MgmSrvr] INFO -- Node 3: DICT: index 16 activated done 2006-04-04 22:04:26 [MgmSrvr] INFO -- Node 3: DICT: index 18 activated done 2006-04-04 22:04:26 [MgmSrvr] INFO -- Node 3: DICT: index 22 activated done 2006-04-04 22:04:26 [MgmSrvr] INFO -- Node 3: DICT: index 24 activated done 2006-04-04 22:04:26 [MgmSrvr] INFO -- Node 3: DICT: index 26 activated done 2006-04-04 22:05:54 [MgmSrvr] INFO -- Node 3: Applying undo to LCP: 55 2006-04-04 22:05:55 [MgmSrvr] INFO -- Node 3: Undo head - ./lg1/undofile.dat page: 2146 lsn: 402011 2006-04-04 22:05:56 [MgmSrvr] INFO -- Node 2: Applying undo to LCP: 55 2006-04-04 22:05:57 [MgmSrvr] INFO -- Node 2: Undo head - ./lg1/undofile.dat page: 2167 lsn: 401665 2006-04-04 22:06:01 [MgmSrvr] INFO -- Node 3: Logfile group: 5 2006-04-04 22:06:01 [MgmSrvr] INFO -- Node 3: head: ./lg1/undofile.dat page: 2146 2006-04-04 22:06:01 [MgmSrvr] INFO -- Node 3: tail: ./lg1/undofile.dat page: 2099 2006-04-04 22:06:01 [MgmSrvr] INFO -- Node 3: Flushing page cache after undo completion 2006-04-04 22:06:02 [MgmSrvr] INFO -- Node 3: Flushing complete 2006-04-04 22:06:03 [MgmSrvr] INFO -- Node 2: Logfile group: 5 2006-04-04 22:06:03 [MgmSrvr] INFO -- Node 2: head: ./lg1/undofile.dat page: 2167 2006-04-04 22:06:03 [MgmSrvr] INFO -- Node 2: tail: ./lg1/undofile.dat page: 2121 2006-04-04 22:06:03 [MgmSrvr] INFO -- Node 2: Flushing page cache after undo completion 2006-04-04 22:06:03 [MgmSrvr] INFO -- Node 2: Flushing complete 2006-04-04 22:06:10 [MgmSrvr] INFO -- Node 2: Start phase 4 completed (system restart) 2006-04-04 22:06:10 [MgmSrvr] INFO -- Node 3: Start phase 4 completed (system restart) 2006-04-04 22:06:11 [MgmSrvr] INFO -- Node 3: Local checkpoint 56 started. Keep GCI = 39617 oldest restorable GCI = 39603 2006-04-04 22:09:14 [MgmSrvr] INFO -- Node 2: Start phase 5 completed (system restart) 2006-04-04 22:09:14 [MgmSrvr] INFO -- Node 2: Start phase 6 completed (system restart) 2006-04-04 22:09:14 [MgmSrvr] INFO -- Node 2: President restarts arbitration thread [state=1]2006-04-04 22:09:14 [MgmSrvr] INFO -- Node 2: Start phase 7 completed (system restart) 2006-04-04 22:09:14 [MgmSrvr] INFO -- Node 3: Start phase 5 completed (system restart) 2006-04-04 22:09:14 [MgmSrvr] INFO -- Node 3: Start phase 6 completed (system restart) 2006-04-04 22:09:14 [MgmSrvr] INFO -- Node 3: Start phase 7 completed (system restart) 2006-04-04 22:09:25 [MgmSrvr] INFO -- Node 3: DICT: index 10 rebuild done 2006-04-04 22:09:27 [MgmSrvr] INFO -- Node 3: DICT: index 12 rebuild done 2006-04-04 22:10:18 [MgmSrvr] INFO -- Node 3: DICT: index 14 rebuild done 2006-04-04 22:10:22 [MgmSrvr] INFO -- Node 3: DICT: index 16 rebuild done 2006-04-04 22:10:23 [MgmSrvr] INFO -- Node 3: DICT: index 18 rebuild done 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: DICT: index 22 rebuild done 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: DICT: index 24 rebuild done 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: DICT: index 26 rebuild done 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Communication to Node 6 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Communication to Node 7 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Communication to Node 8 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Communication to Node 9 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Communication to Node 10 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Communication to Node 11 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Communication to Node 12 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Communication to Node 13 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Communication to Node 14 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Communication to Node 15 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Start phase 8 completed (system restart) 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Start phase 9 completed (system restart) 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Communication to Node 6 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Communication to Node 7 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Communication to Node 8 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Communication to Node 9 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Communication to Node 10 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Communication to Node 11 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Communication to Node 12 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Communication to Node 13 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Communication to Node 14 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Communication to Node 15 opened 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Start phase 8 completed (system restart) 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Start phase 9 completed (system restart) 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Start phase 100 completed (system restart) 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Start phase 101 completed (system restart) 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Started (version 5.1.9) 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Start phase 100 completed (system restart) 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Start phase 101 completed (system restart) 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Started (version 5.1.9) 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Node 6 Connected 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 3: Node 6: API version 5.1.9 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Node 6 Connected 2006-04-04 22:10:27 [MgmSrvr] INFO -- Node 2: Node 6: API version 5.1.9 2006-04-04 22:10:28 [MgmSrvr] INFO -- Node 2: Node 1: API version 5.1.9 2006-04-04 22:10:28 [MgmSrvr] INFO -- Node 3: Node 1: API version 5.1.9 2006-04-04 22:10:28 [MgmSrvr] INFO -- Node 3: Prepare arbitrator node 1 [ticket=482200016685f0ce] 2006-04-04 22:10:28 [MgmSrvr] INFO -- Node 2: Started arbitrator node 1 [ticket=482200016685f0ce] But the crash is repeatable: Time: Tuesday 4 April 2006 - 22:17:42 Status: Temporary error, restart node Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug) Error: 2341 Error data: dbtup/DbtupDiskAlloc.cpp Error object: DBTUP (Line: 870) 0x0000000a Program: /home/ndbdev/jmiller/builds/libexec/ndbd Pid: 18470 Trace: /space/run/ndb_3_trace.log.2 Version: Version 5.1.9 (beta) ***EOM***
[4 Apr 2006 21:20]
Jonas Oreland
Jeb, where can I find the test program?
[24 Apr 2006 7:05]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/5372
[12 Jun 2006 11:38]
Tomas Ulin
pushed to 5.1.12
[14 Jun 2006 0:39]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bugfix, yourself. More information about accessing the source trees is available at http://www.mysql.com/doc/en/Installing_source_tree.html Additional info: Documented bugfix in 5.1.12 changelog. Closed.