Bug #19179 | Stress: Loading 3 databases at ome time cause cluster to crash | ||
---|---|---|---|
Submitted: | 18 Apr 2006 21:38 | Modified: | 21 Apr 2006 4:46 |
Reporter: | Jonathan Miller | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | 5.1.10 | OS: | Linux (Linux 32 Bit OS) |
Assigned to: | CPU Architecture: | Any |
[18 Apr 2006 21:38]
Jonathan Miller
[18 Apr 2006 21:41]
Jonathan Miller
(0%) max=0B apply_gci=647 latest_gci=1405 2006-04-18 23:24:21 [MgmSrvr] WARNING -- Node 4: Transporter to node 7 reported error 0x16 2006-04-18 23:24:21 [MgmSrvr] WARNING -- Node 5: Transporter to node 6 reported error 0x16 2006-04-18 23:24:21 [MgmSrvr] WARNING -- Node 5: Transporter to node 6 reported error 0x16 - Repeated 2 times 2006-04-18 23:24:22 [MgmSrvr] WARNING -- Node 4: Transporter to node 7 reported error 0x16 2006-04-18 23:24:24 [MgmSrvr] WARNING -- Node 4: Transporter to node 7 reported error 0x16 - Repeated 22 times 2006-04-18 23:24:25 [MgmSrvr] INFO -- Node 8: Event buffer status: used=42KB(100%) alloc=42KB(0%) max=0B apply_gci=647 latest_gci=1406 2006-04-18 23:24:25 [MgmSrvr] WARNING -- Node 4: Transporter to node 7 reported error 0x16 2006-04-18 23:24:28 [MgmSrvr] WARNING -- Node 4: Transporter to node 7 reported error 0x16 - Repeated 18 times 2006-04-18 23:24:31 [MgmSrvr] WARNING -- Node 4: Transporter to node 7 reported error 0x16 - Repeated 19 times 2006-04-18 23:24:33 [MgmSrvr] WARNING -- Node 4: Transporter to node 7 reported error 0x16 - Repeated 13 times 2006-04-18 23:24:33 [MgmSrvr] INFO -- Node 2: Local checkpoint 31 started. Keep GCI = 1358 oldest restorable GCI = 1360 2006-04-18 23:24:33 [MgmSrvr] INFO -- Node 8: Event buffer status: used=42KB(100%) alloc=42KB(0%) max=0B apply_gci=647 latest_gci=1407 2006-04-18 23:24:33 [MgmSrvr] WARNING -- Node 4: Transporter to node 7 reported error 0x16 2006-04-18 23:24:34 [MgmSrvr] WARNING -- Node 4: Transporter to node 7 reported error 0x16 - Repeated 11 times 2006-04-18 23:24:35 [MgmSrvr] INFO -- Node 7: Event buffer status: used=33MB(100%) alloc=33MB(0%) max=0B apply_gci=1402 latest_gci=1403 2006-04-18 23:24:35 [MgmSrvr] WARNING -- Node 4: Transporter to node 7 reported error 0x16 2006-04-18 23:24:35 [MgmSrvr] WARNING -- Node 4: Transporter to node 7 reported error 0x16 - Repeated 5 times 2006-04-18 23:24:35 [MgmSrvr] INFO -- Node 6: Event buffer status: used=45MB(7%) alloc=45MB(0%) max=0B apply_gci=1401 latest_gci=1404 2006-04-18 23:24:36 [MgmSrvr] WARNING -- Node 4: Transporter to node 7 reported error 0x16 2006-04-18 23:24:36 [MgmSrvr] INFO -- Node 6: Event buffer status: used=45MB(8%) alloc=45MB..... . . . .
[18 Apr 2006 21:42]
Jonathan Miller
2006-04-18 23:26:40 [MgmSrvr] WARNING -- Node 3: Node 7 missed heartbeat 4 2006-04-18 23:26:40 [MgmSrvr] ALERT -- Node 3: Node 7 declared dead due to missed heartbeat 2006-04-18 23:26:40 [MgmSrvr] INFO -- Node 3: Communication to Node 7 closed 2006-04-18 23:26:40 [MgmSrvr] ALERT -- Node 3: Node 7 Disconnected 2006-04-18 23:26:41 [MgmSrvr] INFO -- Node 1: Node 2 Connected 2006-04-18 23:26:41 [MgmSrvr] ALERT -- Node 3: Node 2 Disconnected 2006-04-18 23:26:41 [MgmSrvr] INFO -- Node 3: Communication to Node 2 closed 2006-04-18 23:26:41 [MgmSrvr] ALERT -- Node 4: Node 2 Disconnected 2006-04-18 23:26:41 [MgmSrvr] INFO -- Node 4: Communication to Node 2 closed 2006-04-18 23:26:41 [MgmSrvr] ALERT -- Node 5: Node 2 Disconnected 2006-04-18 23:26:41 [MgmSrvr] INFO -- Node 5: Communication to Node 2 closed 2006-04-18 23:26:42 [MgmSrvr] ALERT -- Node 2: Forced node shutdown completed. Initiated by signal 0. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart no 2006-04-18 23:26:42 [MgmSrvr] INFO -- Node 1: Node 3 Connected 2006-04-18 23:26:42 [MgmSrvr] ALERT -- Node 4: Node 3 Disconnected 2006-04-18 23:26:42 [MgmSrvr] INFO -- Node 4: Possible bug in Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 2 sig->failNo = 2006-04-18 23:26:42 [MgmSrvr] INFO -- Node 4: Communication to Node 2 closed 2006-04-18 23:26:42 [MgmSrvr] INFO -- Node 4: Communication to Node 3 closed 2006-04-18 23:26:42 [MgmSrvr] ALERT -- Node 5: Node 3 Disconnected 2006-04-18 23:26:42 [MgmSrvr] INFO -- Node 5: Communication to Node 3 closed 2006-04-18 23:26:42 [MgmSrvr] INFO -- Node 5: Possible bug in Dbdih::execBLOCK_COMMIT_ORD c_blockCommit = 1 c_blockCommitNo = 2 sig->failNo = 2006-04-18 23:26:42 [MgmSrvr] INFO -- Node 5: Communication to Node 2 closed 2006-04-18 23:26:42 [MgmSrvr] INFO -- Node 5: Communication to Node 3 closed 2006-04-18 23:26:42 [MgmSrvr] ALERT -- Node 3: Forced node shutdown completed. Initiated by signal 0. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart no 2006-04-18 23:26:43 [MgmSrvr] INFO -- Node 1: Node 4 Connected 2006-04-18 23:26:43 [MgmSrvr] ALERT -- Node 4: Forced node shutdown completed. Initiated by signal 0. Caused by error 2305: 'Arbitrator shutdown, please investigate error(s) on other node(s)(Arbitration error). Temporary error, restart node'. 2006-04-18 23:26:44 [MgmSrvr] INFO -- Node 1: Node 5 Connected 2006-04-18 23:26:44 [MgmSrvr] INFO -- Mgmt server state: nodeid 9 freed, m_reserved_nodes 00000000000001c2. 2006-04-18 23:26:44 [MgmSrvr] ALERT -- Node 5: Forced node shutdown completed. Initiated by signal 0. Caused by error 2305: 'Arbitrator shutdown, please investigate error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
[18 Apr 2006 21:46]
Jonathan Miller
Trying to recover: Node 4: Forced node shutdown completed. Occured during startphase 4. Initiated by signal 0. Caused by error 2815: 'File not found(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'. Node 2: Forced node shutdown completed. Occured during startphase 4. Initiated by signal 0. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'. Node 3: Forced node shutdown completed. Occured during startphase 4. Initiated by signal 0. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'. Node 5: Forced node shutdown completed. Occured during startphase 4. Initiated by signal 0. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
[18 Apr 2006 21:47]
Jonathan Miller
Time: Tuesday 18 April 2006 - 23:40:57 Status: Ndbd file system error, restart node initial Message: File not found (Ndbd file system inconsistency error, please report a bug) Error: 2815 Error data: DBLQH: File system close failed. OS errno: 4294967295 Error object: DBLQH (Line: 1821) 0x0000000a Program: /home/ndbdev/jmiller/builds/libexec/ndbd Pid: 22413 Trace: /space/run/ndb_4_trace.log.2 Version: Version 5.1.10 (beta) ***EOM*** "ndb_4_error.log" 27L, 1067C 1,1 Top
[18 Apr 2006 21:48]
Jonathan Miller
--------------- Signal ---------------- r.bn: 247 "DBLQH", r.proc: 4, r.sigId: 1434885 gsn: 270 "FSWRITECONF" prio: 1 s.bn: 253 "NDBFS", s.proc: 4, s.sigId: 1434884 length: 1 trace: 0 #sec: 0 fragInf: 0 UserPointer: 2 --------------- Signal ---------------- r.bn: 253 "NDBFS", r.proc: 4, r.sigId: 1434884 gsn: 164 "CONTINUEB" prio: 1 s.bn: 253 "NDBFS", s.proc: 4, s.sigId: 1434880 length: 1 trace: 0 #sec: 0 fragInf: 0 Scanning the memory channel again with no delay --------------- Signal ---------------- r.bn: 253 "NDBFS", r.proc: 4, r.sigId: 1434883 gsn: 257 "FSCLOSEREQ" prio: 0 s.bn: 247 "DBLQH", s.proc: 4, s.sigId: 1434882 length: 5 trace: 0 #sec: 0 fragInf: 0 UserPointer: 155 FilePointer: 672 UserReference: H'00f70004 Flags: H'00000000, Don't remove file H'000002a0 H'00f70004 H'0000009b H'00000000 H'00003ab0 --------------- Signal ----------------
[19 Apr 2006 6:53]
Jonas Oreland
Can you get it repeatable on your hw ? If no, then this is a duplicate of #13744 which has been open forever and we never manage to reproduce it. If yes, then it's very interesting...
[21 Apr 2006 4:46]
Jonas Oreland
Duplicate of bug#13744