Bug #64508 | 7.2.4 Windows MySQL cluster dual management failure | ||
---|---|---|---|
Submitted: | 1 Mar 2012 8:51 | Modified: | 5 Mar 2012 14:12 |
Reporter: | R van der Wal | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | 7.2.4 | OS: | Windows (Datacenter/Hyper-v) |
Assigned to: | CPU Architecture: | Any | |
Tags: | dual, Lost connection, management, outage, shutdown |
[1 Mar 2012 8:51]
R van der Wal
[1 Mar 2012 9:36]
R van der Wal
ndbd node 4 tracelog at 10:20
Attachment: ndb_4_trace.log.zip (application/unknown, text), 49.05 KiB.
[1 Mar 2012 14:29]
R van der Wal
I have been strugling the whole day with this and saw the following happening: node 1 is the managementserver at A: ( #### is my text) 2012-03-01 15:14:40 [MgmtSrvr] INFO -- Node 4: Index usage is 0%(100 8K pages of total 18848) ###### we shut down B servers at this moment ####### 2012-03-01 15:14:51 [MgmtSrvr] WARNING -- Node 3: Node 11 missed heartbeat 2 2012-03-01 15:14:51 [MgmtSrvr] WARNING -- Node 3: Node 12 missed heartbeat 2 2012-03-01 15:14:53 [MgmtSrvr] WARNING -- Node 3: Node 2 missed heartbeat 2 2012-03-01 15:14:53 [MgmtSrvr] WARNING -- Node 3: Node 9 missed heartbeat 2 2012-03-01 15:14:53 [MgmtSrvr] WARNING -- Node 3: Node 10 missed heartbeat 2 2012-03-01 15:14:53 [MgmtSrvr] WARNING -- Node 3: Node 11 missed heartbeat 3 2012-03-01 15:14:53 [MgmtSrvr] WARNING -- Node 3: Node 12 missed heartbeat 3 2012-03-01 15:14:54 [MgmtSrvr] WARNING -- Node 3: Node 2 missed heartbeat 3 2012-03-01 15:14:54 [MgmtSrvr] WARNING -- Node 3: Node 9 missed heartbeat 3 2012-03-01 15:14:54 [MgmtSrvr] WARNING -- Node 3: Node 10 missed heartbeat 3 2012-03-01 15:14:54 [MgmtSrvr] WARNING -- Node 3: Node 11 missed heartbeat 4 2012-03-01 15:14:54 [MgmtSrvr] ALERT -- Node 3: Node 11 declared dead due to missed heartbeat 2012-03-01 15:14:54 [MgmtSrvr] WARNING -- Node 3: Node 12 missed heartbeat 4 2012-03-01 15:14:54 [MgmtSrvr] ALERT -- Node 3: Node 12 declared dead due to missed heartbeat 2012-03-01 15:14:54 [MgmtSrvr] INFO -- Node 3: Communication to Node 11 closed 2012-03-01 15:14:54 [MgmtSrvr] INFO -- Node 3: Communication to Node 12 closed 2012-03-01 15:14:54 [MgmtSrvr] ALERT -- Node 3: Node 11 Disconnected 2012-03-01 15:14:54 [MgmtSrvr] ALERT -- Node 3: Node 12 Disconnected 2012-03-01 15:14:56 [MgmtSrvr] ALERT -- Node 1: Node 4 Disconnected 2012-03-01 15:14:56 [MgmtSrvr] WARNING -- Node 3: Node 2 missed heartbeat 4 2012-03-01 15:14:56 [MgmtSrvr] ALERT -- Node 3: Node 2 declared dead due to missed heartbeat 2012-03-01 15:14:56 [MgmtSrvr] WARNING -- Node 3: Node 9 missed heartbeat 4 2012-03-01 15:14:56 [MgmtSrvr] ALERT -- Node 3: Node 9 declared dead due to missed heartbeat 2012-03-01 15:14:56 [MgmtSrvr] WARNING -- Node 3: Node 10 missed heartbeat 4 2012-03-01 15:14:56 [MgmtSrvr] ALERT -- Node 3: Node 10 declared dead due to missed heartbeat 2012-03-01 15:14:56 [MgmtSrvr] INFO -- Node 3: Communication to Node 2 closed ###### datanode id= 3 running on A lost his primairy arbitrator on B (node 2) ###### 2012-03-01 15:14:56 [MgmtSrvr] INFO -- Node 3: Lost arbitrator node 2 - process failure [state=6] 2012-03-01 15:14:56 [MgmtSrvr] INFO -- Node 3: Communication to Node 9 closed 2012-03-01 15:14:56 [MgmtSrvr] INFO -- Node 3: Communication to Node 10 closed 2012-03-01 15:14:56 [MgmtSrvr] ALERT -- Node 3: Node 2 Disconnected 2012-03-01 15:14:56 [MgmtSrvr] ALERT -- Node 3: Node 9 Disconnected 2012-03-01 15:14:56 [MgmtSrvr] ALERT -- Node 3: Node 10 Disconnected 2012-03-01 15:14:56 [MgmtSrvr] INFO -- Node 3: Communication to Node 2 opened 2012-03-01 15:14:58 [MgmtSrvr] INFO -- Node 3: Communication to Node 11 opened 2012-03-01 15:14:58 [MgmtSrvr] INFO -- Node 3: Communication to Node 12 opened 2012-03-01 15:14:59 [MgmtSrvr] INFO -- Node 3: Communication to Node 9 opened 2012-03-01 15:14:59 [MgmtSrvr] INFO -- Node 3: Communication to Node 10 opened 2012-03-01 15:15:04 [MgmtSrvr] WARNING -- Node 3: Node 4 missed heartbeat 2 2012-03-01 15:15:09 [MgmtSrvr] INFO -- Node 3: Data usage is 0%(108 32K pages of total 48000) 2012-03-01 15:15:09 [MgmtSrvr] INFO -- Node 3: Index usage is 0%(100 8K pages of total 18848) 2012-03-01 15:15:09 [MgmtSrvr] WARNING -- Node 3: Node 4 missed heartbeat 3 2012-03-01 15:15:11 [MgmtSrvr] ALERT -- Node 1: Node 2 Disconnected ###### and now strange things happen, we see IP-adresses ? this is node 3! ####### 2012-03-01 15:15:18 [MgmtSrvr] DEBUG -- 172.16.28.44:62057: Connected! 2012-03-01 15:15:18 [MgmtSrvr] ALERT -- Node 3: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'. 2012-03-01 15:15:18 [MgmtSrvr] DEBUG -- 172.16.28.44:62057: Eof! 2012-03-01 15:15:18 [MgmtSrvr] DEBUG -- 172.16.28.44:62057: Stopped! 2012-03-01 15:15:18 [MgmtSrvr] DEBUG -- 172.16.28.44:62057: Disconnected! 2012-03-01 15:15:24 [MgmtSrvr] ALERT -- Node 1: Node 3 Disconnected ###### and the cluster is gone!! ####### 2012-03-01 15:16:00 [MgmtSrvr] DEBUG -- 172.16.28.44:53149: Eof! 2012-03-01 15:16:00 [MgmtSrvr] DEBUG -- 172.16.28.44:53149: Stopped! 2012-03-01 15:16:00 [MgmtSrvr] DEBUG -- 172.16.28.44:53149: Disconnected! ###### we enabled a auto-restarter at datanode 3, so it is started automaticly ####### 2012-03-01 15:16:03 [MgmtSrvr] DEBUG -- 172.16.28.44:62061: Connected! ###### nodeid 0? who is that? why uses it node 0? ####### 2012-03-01 15:16:04 [MgmtSrvr] DEBUG -- Trying to allocate nodeid for 172.16.28.44(nodeid: 0, type: NDB) 2012-03-01 15:16:04 [MgmtSrvr] DEBUG -- [0]: 3, 'PARK-DATA-NODE3', 1 ###### DUH??!! WHAT? ####### 2012-03-01 15:16:04 [MgmtSrvr] DEBUG -- Nodeid 3 for data node reserved locally since cluster was not available 2012-03-01 15:16:04 [MgmtSrvr] INFO -- Nodeid 3 allocated for NDB at 172.16.28.44 2012-03-01 15:16:04 [MgmtSrvr] DEBUG -- 172.16.28.44:62063: Connected! 2012-03-01 15:16:08 [MgmtSrvr] DEBUG -- 172.16.28.44:62063: Stopped! 2012-03-01 15:16:08 [MgmtSrvr] DEBUG -- 172.16.28.44:62063: Disconnected! 2012-03-01 15:16:08 [MgmtSrvr] DEBUG -- Releasing local reservation for nodeid 3 2012-03-01 15:16:08 [MgmtSrvr] INFO -- Node 1: Node 3 Connected 2012-03-01 15:16:09 [MgmtSrvr] INFO -- Node 3: Start phase 0 completed 2012-03-01 15:16:09 [MgmtSrvr] WARNING -- Node 3: Failed to memlock pages, error: 34 (Result too large) 2012-03-01 15:16:09 [MgmtSrvr] INFO -- Node 3: Communication to Node 4 opened ###### cluster runs again on A side, but only thanks to the restarter. so much for clustering... #######
[5 Mar 2012 14:08]
R van der Wal
isn't anyone following up on this? do i need to supply extra info?
[5 Mar 2012 14:12]
R van der Wal
Just read the comment on http://forums.mysql.com/read.php?25,517640,518103#msg-518103 It is not a bug but a feature. Sorry for the previous bug-upgrade to S1. This bugreport can be closed.