Bug #24739 | Node crash after NIC watchdog timeout | ||
---|---|---|---|
Submitted: | 30 Nov 2006 23:12 | Modified: | 1 Dec 2006 11:21 |
Reporter: | Adam Dixon (Candidate Quality Contributor) | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | 5.0.21 | OS: | Linux (CentOS4) |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
Tags: | cluster, timeout, watchdog |
[30 Nov 2006 23:12]
Adam Dixon
[30 Nov 2006 23:13]
Adam Dixon
--ndb-error.log-- Time: Friday 1 December 2006 - 07:37:22 Status: Unknown Message: No message slogan found (please report a bug if you get this error code) (Unknown) Error: 0 Error data: We(14) have been declared dead by 18 reason: Hearbeat failure(4) Error object: QMGR (Line: 2840) 0x0000000e Program: ndbd Pid: 3647 Trace: /data/mysql-cluster/ndbd/ndb_14_trace.log.4 Version: Version 5.0.21 ***EOM*** --ndbd-out.log-- 2006-12-01 07:37:22 [ndbd] INFO -- Error handler shutting down system 2006-12-01 07:37:23 [ndbd] INFO -- Error handler shutdown completed - exiting 2006-12-01 07:37:23 [ndbd] ALERT -- Node 14: Forced node shutdown completed. Initiated by signal 0. 2006-12-01 07:43:41 [ndbd] WARNING -- Unable to report shutdown reason to 10.10.250.1:1186: Could not connect to socket : Unable to connect with connect string: nodeid=0,10.10.250.1:1186 --ndb_mgm.log-- 2006-12-01 07:36:50 [MgmSrvr] INFO -- Node 13: Local checkpoint 66999 started. Keep GCI = 10562746 oldest restorable GCI = 10562862 2006-12-01 07:37:19 [MgmSrvr] WARNING -- Node 16: Node 14 missed heartbeat 2 2006-12-01 07:37:20 [MgmSrvr] WARNING -- Node 16: Node 14 missed heartbeat 3 2006-12-01 07:37:21 [MgmSrvr] INFO -- Node 1: Node 14 Connected 2006-12-01 07:37:22 [MgmSrvr] WARNING -- Node 16: Node 14 missed heartbeat 4 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 16: Node 14 declared dead due to missed heartbeat 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 13: Communication to Node 14 closed 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 16: Communication to Node 14 closed 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 18: Communication to Node 14 closed 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 17: Communication to Node 14 closed 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 11: Communication to Node 14 closed 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 12: Communication to Node 14 closed 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 18: Node 14 Disconnected 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 18: Communication to Node 14 closed 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 15: Communication to Node 14 closed 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 13: Arbitration check won - node group majority 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 13: President restarts arbitration thread [state=6] 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 13: Node 14 Disconnected 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 12: Node 14 Disconnected 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 11: Node 14 Disconnected 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 16: Node 14 Disconnected 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 15: Node 14 Disconnected 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 17: Node 14 Disconnected 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 11: Node 2 Disconnected 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 11: Communication to Node 2 closed 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 12: Node 2 Disconnected 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 12: Communication to Node 2 closed 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 13: Node 2 Disconnected 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 13: Communication to Node 2 closed 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 15: Node 2 Disconnected 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 15: Communication to Node 2 closed 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 16: Node 2 Disconnected 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 16: Communication to Node 2 closed 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 17: Node 2 Disconnected 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 17: Communication to Node 2 closed 2006-12-01 07:37:22 [MgmSrvr] ALERT -- Node 18: Node 2 Disconnected 2006-12-01 07:37:22 [MgmSrvr] INFO -- Node 18: Communication to Node 2 closed 2006-12-01 07:37:22 [MgmSrvr] INFO -- Mgmt server state: nodeid 2 freed, m_reserved_nodes 000000000000000a. 2006-12-01 07:37:23 [MgmSrvr] INFO -- Mgmt server state: nodeid 2 reserved for ip 10.10.250.1, m_reserved_nodes 000000000000000e. 2006-12-01 07:37:23 [MgmSrvr] INFO -- Node 2: mysqld --server-id=-889029611 2006-12-01 07:37:26 [MgmSrvr] INFO -- Node 17: Communication to Node 2 opened 2006-12-01 07:37:26 [MgmSrvr] INFO -- Node 18: Communication to Node 2 opened 2006-12-01 07:37:26 [MgmSrvr] INFO -- Node 13: Communication to Node 2 opened 2006-12-01 07:37:26 [MgmSrvr] INFO -- Node 11: Communication to Node 2 opened 2006-12-01 07:37:26 [MgmSrvr] INFO -- Node 12: Communication to Node 2 opened 2006-12-01 07:37:26 [MgmSrvr] INFO -- Node 15: Communication to Node 2 opened 2006-12-01 07:37:26 [MgmSrvr] INFO -- Node 16: Communication to Node 2 opened 2006-12-01 07:38:20 [MgmSrvr] WARNING -- Node 12: Failure handling of node 14 has not completed in 1 min. - state = 3 2006-12-01 07:38:20 [MgmSrvr] WARNING -- Node 15: Failure handling of node 14 has not completed in 1 min. - state = 3 2006-12-01 07:38:21 [MgmSrvr] WARNING -- Node 16: Failure handling of node 14 has not completed in 1 min. - state = 3 2006-12-01 07:38:21 [MgmSrvr] WARNING -- Node 17: Failure handling of node 14 has not completed in 1 min. - state = 3 2006-12-01 07:38:21 [MgmSrvr] WARNING -- Node 18: Failure handling of node 14 has not completed in 1 min. - state = 3 2006-12-01 07:38:21 [MgmSrvr] WARNING -- Node 13: Failure handling of node 14 has not completed in 1 min. - state = 3 2006-12-01 07:38:21 [MgmSrvr] WARNING -- Node 11: Failure handling of node 14 has not completed in 1 min. - state = 3 2006-12-01 07:38:28 [MgmSrvr] INFO -- Node 16: Communication to Node 14 opened 2006-12-01 07:38:28 [MgmSrvr] INFO -- Node 17: Communication to Node 14 opened 2006-12-01 07:38:28 [MgmSrvr] INFO -- Node 18: Communication to Node 14 opened 2006-12-01 07:38:28 [MgmSrvr] INFO -- Node 13: Communication to Node 14 opened 2006-12-01 07:38:28 [MgmSrvr] INFO -- Node 11: Communication to Node 14 opened 2006-12-01 07:38:28 [MgmSrvr] INFO -- Node 12: Communication to Node 14 opened 2006-12-01 07:38:29 [MgmSrvr] INFO -- Node 15: Communication to Node 14 opened 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 17: Node 2 Connected 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 18: Node 2 Connected 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 15: Node 2 Connected 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 15: Node 2: API version 5.0.21 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 16: Node 2 Connected 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 16: Node 2: API version 5.0.21 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 17: Node 2: API version 5.0.21 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 18: Node 2: API version 5.0.21 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 12: Node 2 Connected 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 11: Node 2 Connected 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 11: Node 2: API version 5.0.21 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 13: Node 2 Connected 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 13: Node 2: API version 5.0.21 2006-12-01 07:38:59 [MgmSrvr] INFO -- Node 12: Node 2: API version 5.0.21 2006-12-01 07:48:00 [MgmSrvr] INFO -- Node 13: Local checkpoint 67000 started. Keep GCI = 10562886 oldest restorable GCI = 10562896 2006-12-01 07:52:37 [MgmSrvr] INFO -- Node 13: Local checkpoint 67001 started. Keep GCI = 10563271 oldest restorable GCI = 10563387
[30 Nov 2006 23:14]
Adam Dixon
Category - Cluster
[1 Dec 2006 11:21]
Jonas Oreland
hi i read the trace files and everything look ok. the error+trace will be generated in all cases expect graceful shutdown, so this is ok /jonas