Bug #25149 | Unable to restart data node | ||
---|---|---|---|
Submitted: | 18 Dec 2006 16:09 | Modified: | 27 Jan 2008 8:10 |
Reporter: | Lars Bo Svenningsen | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | mysql-5.1.14-beta-linux-i686-glibc23 | OS: | Linux (Debian) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | Ndbd file system error, REDO Log |
[18 Dec 2006 16:09]
Lars Bo Svenningsen
[18 Dec 2006 16:12]
Lars Bo Svenningsen
ndb_3_out.log
Attachment: ndb_3_out.log (text/x-log), 4.04 KiB.
[18 Dec 2006 16:13]
Lars Bo Svenningsen
ndb1_cluster.log
Attachment: ndb1_cluster.log (text/x-log), 10.63 KiB.
[18 Dec 2006 16:15]
Lars Bo Svenningsen
Setup: Debian Etch - Linux 2.6.17
[18 Dec 2006 23:10]
Jonas Oreland
Hi, Can you also upload error/tracefile for failing node ? /Jonas
[19 Dec 2006 7:54]
Lars Bo Svenningsen
ndb_3_trace.log.9
Attachment: bug-data-25149.tgz (application/x-gzip, text), 42.02 KiB.
[19 Dec 2006 8:43]
Lars Bo Svenningsen
Found a SQL Node still attached to the cluster, stopped it and --initial again. I now managed to get the data note up'n'running, but still I see some disturbing messages (Permanent error: Application error) in ndb_1_cluster.log: 2006-12-19 09:36:01 [MgmSrvr] INFO -- Node 3: Node shutdown initiated 2006-12-19 09:36:11 [MgmSrvr] ALERT -- Node 2: Node 3 Disconnected 2006-12-19 09:36:11 [MgmSrvr] INFO -- Node 2: We are Node 2 with dynamic ID 1, our left neighbour is Node 65535, our right is Node 65535 2006-12-19 09:36:11 [MgmSrvr] INFO -- Node 2: Communication to Node 3 closed 2006-12-19 09:36:11 [MgmSrvr] ALERT -- Node 2: Network partitioning - arbitration required 2006-12-19 09:36:11 [MgmSrvr] INFO -- Node 2: President restarts arbitration thread [state=7] 2006-12-19 09:36:11 [MgmSrvr] INFO -- Node 1: Node 3 Connected 2006-12-19 09:36:11 [MgmSrvr] INFO -- Node 3: Node shutdown completed, restarting, no start. 2006-12-19 09:36:11 [MgmSrvr] ALERT -- Node 2: Arbitration won - positive reply from node 1 2006-12-19 09:36:11 [MgmSrvr] INFO -- Node 2: DICT: lock bs: 0 ops: 0 poll: 0 cnt: 0 queue: 2006-12-19 09:36:12 [MgmSrvr] INFO -- Node 2: Started arbitrator node 1 [ticket=04cc002799daea3f] 2006-12-19 09:36:12 [MgmSrvr] WARNING -- Cluster refused allocation of id 3. Connection from ip 192.168.99.18. Returned error string "Cluste r refused allocation of id 3. Error: 1703 (Node failure handling not completed: Permanent error: Application error)." 2006-12-19 09:36:15 [MgmSrvr] INFO -- Node 2: Communication to Node 3 opened 2006-12-19 09:36:15 [MgmSrvr] INFO -- Mgmt server state: nodeid 3 reserved for ip 192.168.99.18, m_reserved_nodes 000000000000000a. 2006-12-19 09:36:15 [MgmSrvr] INFO -- Node 1: Node 3 Connected 2006-12-19 09:36:16 [MgmSrvr] INFO -- Mgmt server state: nodeid 3 freed, m_reserved_nodes 0000000000000002. 2006-12-19 09:36:16 [MgmSrvr] INFO -- Node 2: Node 3 Connected 2006-12-19 09:36:16 [MgmSrvr] INFO -- Node 2: We are Node 2 with dynamic ID 1, our left neighbour is Node 3, our right is Node 3 2006-12-19 09:36:16 [MgmSrvr] INFO -- Node 2: Node 3: API version 5.1.14 2006-12-19 09:36:16 [MgmSrvr] INFO -- Node 3: Node 2 Connected 2006-12-19 09:36:16 [MgmSrvr] INFO -- Node 3: CM_REGCONF president = 2, own Node = 3, our dynamic id = 40 2006-12-19 09:36:16 [MgmSrvr] INFO -- Node 3: We are Node 3 with dynamic ID 40, our left neighbour is Node 2, our right is Node 2 2006-12-19 09:36:16 [MgmSrvr] INFO -- Node 3: Node 2: API version 5.1.14 2006-12-19 09:36:16 [MgmSrvr] INFO -- Node 3: Start phase 1 completed 2006-12-19 09:36:16 [MgmSrvr] INFO -- Node 3: Start phase 2 completed (node restart) 2006-12-19 09:36:16 [MgmSrvr] INFO -- Node 2: DICT: locked by node 3 for NodeRestart 2006-12-19 09:36:16 [MgmSrvr] INFO -- Node 2: DICT: lock bs: 4 ops: 0 poll: 0 cnt: 0 queue: 3L 2006-12-19 09:36:16 [MgmSrvr] INFO -- Node 3: Start phase 3 completed (node restart) 2006-12-19 09:36:16 [MgmSrvr] INFO -- Node 3: Start phase 4 completed (node restart) 2006-12-19 09:36:17 [MgmSrvr] INFO -- Node 3: Receive arbitrator node 1 [ticket=04cc002799daea3f] 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 3 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 10 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 11 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 13 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 14 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 16 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 17 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 18 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 19 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 21 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 23 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 25 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 27 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 29 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 31 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 33 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 35 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 37 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 39 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 3: DICT: index 41 activated 2006-12-19 09:36:23 [MgmSrvr] INFO -- Node 2: Node: 3 StartLog: [GCI Keep: 270304 LastCompleted: 270475 NewestRestorable: 270482] 2006-12-19 09:36:30 [MgmSrvr] INFO -- Node 2: Local checkpoint 213 started. Keep GCI = 270404 oldest restorable GCI = 270404 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Suma: asking node 2 to recreate subscriptions on me 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Suma: node 2 has completed restoring me 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Start phase 5 completed (node restart) 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Start phase 6 completed (node restart) 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Start phase 7 completed (node restart) 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Communication to Node 4 opened 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Communication to Node 5 opened 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Communication to Node 6 opened 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Communication to Node 7 opened 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Communication to Node 8 opened 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Communication to Node 9 opened 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Communication to Node 10 opened 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Communication to Node 11 opened 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Communication to Node 12 opened 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Start phase 8 completed (node restart) 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Start phase 9 completed (node restart) 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Start phase 100 completed (node restart) 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Suma: initiate handover with nodes 0000000000000004 GCI: 270494 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Suma: handover from node 2 gci: 270494 buckets: 00000002 (2) 2006-12-19 09:36:42 [MgmSrvr] INFO -- Node 3: Node 1: API version 5.1.14 2006-12-19 09:36:48 [MgmSrvr] INFO -- Node 2: DICT: unlocked by node 3 for NodeRestart 2006-12-19 09:36:48 [MgmSrvr] INFO -- Node 2: DICT: lock bs: 0 ops: 0 poll: 0 cnt: 0 queue: 2006-12-19 09:36:48 [MgmSrvr] INFO -- Node 3: Start phase 101 completed (node restart) 2006-12-19 09:36:48 [MgmSrvr] INFO -- Node 3: Started (version 5.1.14)
[20 Dec 2006 10:38]
Jonas Oreland
hi, can you also upload ndb_3_error.log do you still get this? would it be possible to also upload actual filesystem (ndb_3_fs/*) when getting this? /Jonas