Bug #38648 | Node wont restart even after ndbd --initial error 2341 | ||
---|---|---|---|
Submitted: | 8 Aug 2008 1:50 | Modified: | 14 Oct 2008 7:11 |
Reporter: | Farhad Shakeri | | |
Status: | Not a Bug | | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 5.0.37 & 5.0.51 | OS: | Linux (Fedora 6) |
Assigned to: | | CPU Architecture: | Any |
[8 Aug 2008 1:50]
Farhad Shakeri
[8 Aug 2008 6:45]
Susanne Ebrecht
Many thanks for writing a bug report. MySQL 5.0.37 is really old. Please try a newer version (the current version is MySQL 5.0.51b) and let us know if you still have this problem with the newer version.
[8 Aug 2008 23:27]
Farhad Shakeri
Sure, will do ASAP. We are just wondering will it be OK to upgrade one node to 5.0.51b while the rest are 5.0.37 ? Since this cluster has 2 Nodegroups should we upgrade one Node in each Nodegroup? Documents only talk about a single Nodegroup. Thanks
[9 Aug 2008 5:31]
Hartmut Holzgraefe
> We are just wondering will it be OK to upgrade one node to 5.0.51b while the rest are 5.0.37 ?

Yes, these versions are upgrade compatible, see http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-upgrade-downgrade-compatibility.html

> Since this cluster has 2 Nodegroups should we upgrade one Node in each Nodegroup? Documents only talk about a single Nodegroup.

As soon as you have the failing node working again, you should do a rolling restart as documented in http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-rolling-restart.html. Even though the versions are upgrade compatible, you should not run different versions on the nodes for too long.
[12 Aug 2008 0:20]
Farhad Shakeri
Hi again,

We just upgraded the ndbd_manager to 5.0.51 and also upgraded node 4 to 5.0.51, but we are still getting exactly the same error:

2008-08-11 17:01:03 [MgmSrvr] INFO -- Node 4: Start phase 1 completed
2008-08-11 17:01:03 [MgmSrvr] INFO -- Node 4: Start phase 2 completed (initial node restart)
2008-08-11 17:01:03 [MgmSrvr] INFO -- Node 4: Receive arbitrator node 1 [ticket=0d46000eb433ef5f]
2008-08-11 17:01:04 [MgmSrvr] INFO -- Node 2: DICT: locked by node 4 for NodeRestart
2008-08-11 17:01:04 [MgmSrvr] INFO -- Node 2: DICT: lock bs: 4 ops: 0 poll: 0 cnt: 0 queue: 4L
2008-08-11 17:01:37 [MgmSrvr] INFO -- Node 4: Start phase 3 completed (initial node restart)
2008-08-11 17:02:56 [MgmSrvr] INFO -- Node 4: Start phase 4 completed (initial node restart)
2008-08-11 17:08:40 [MgmSrvr] ALERT -- Node 1: Node 4 Disconnected
2008-08-11 17:08:40 [MgmSrvr] ALERT -- Node 2: Node 4 Disconnected
2008-08-11 17:08:40 [MgmSrvr] INFO -- Node 2: Communication to Node 4 closed
2008-08-11 17:08:40 [MgmSrvr] ALERT -- Node 3: Node 4 Disconnected
2008-08-11 17:08:40 [MgmSrvr] ALERT -- Node 5: Node 4 Disconnected
2008-08-11 17:08:40 [MgmSrvr] ALERT -- Node 6: Node 4 Disconnected
2008-08-11 17:08:40 [MgmSrvr] ALERT -- Node 7: Node 4 Disconnected
2008-08-11 17:08:40 [MgmSrvr] INFO -- Node 3: Communication to Node 4 closed
2008-08-11 17:08:40 [MgmSrvr] INFO -- Node 5: Communication to Node 4 closed
2008-08-11 17:08:40 [MgmSrvr] INFO -- Node 7: Communication to Node 4 closed
2008-08-11 17:08:40 [MgmSrvr] INFO -- Node 6: Communication to Node 4 closed
2008-08-11 17:08:40 [MgmSrvr] ALERT -- Node 2: Arbitration check won - node group majority
2008-08-11 17:08:40 [MgmSrvr] INFO -- Node 2: President restarts arbitration thread [state=6]
2008-08-11 17:08:40 [MgmSrvr] INFO -- Node 2: DICT: remove lock by failed node 4 for NodeRestart
2008-08-11 17:08:40 [MgmSrvr] INFO -- Node 2: DICT: lock bs: 0 ops: 0 poll: 0 cnt: 0 queue:
2008-08-11 17:08:40 [MgmSrvr] ALERT -- Node 4: Forced node shutdown completed. Occured during startphase 5. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

From Node 4:

2008-08-11 17:00:48 [ndbd] INFO -- Angel pid: 3052 ndb pid: 3053
2008-08-11 17:00:48 [ndbd] INFO -- NDB Cluster -- DB node 4
2008-08-11 17:00:48 [ndbd] INFO -- Version 5.0.51 --
2008-08-11 17:00:48 [ndbd] INFO -- Configuration fetched at 192.168.1.1 port 1186
2008-08-11 17:00:48 [ndbd] INFO -- Start initiated (version 5.0.51)
2008-08-11 17:08:39 [ndbd] INFO -- Error handler startup shutting down system
2008-08-11 17:08:40 [ndbd] INFO -- Error handler shutdown completed - exiting
2008-08-11 17:08:40 [ndbd] INFO -- Angel received ndbd startup failure count 1.
2008-08-11 17:08:40 [ndbd] ALERT -- Node 4: Forced node shutdown completed. Occured during startphase 5. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

If you need the trace log, please let me know where to send it. Thanks
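For anyone triaging a similar failure, the forced-shutdown alert in the log above can be picked out mechanically. The following is only a rough sketch: the function name `find_forced_shutdowns` and the regular expression are illustrative assumptions, not part of any MySQL Cluster tool, and the pattern is derived solely from the message format quoted in this report.

```python
import re

# Pattern modelled on the ALERT line quoted in this report.
# "Occur+ed" matches the log's own spelling ("Occured") as well as "Occurred".
SHUTDOWN_RE = re.compile(
    r"Node (?P<node>\d+): Forced node shutdown completed\.\s+"
    r"Occur+ed during startphase (?P<phase>\d+)\.\s+"
    r"Caused by error (?P<code>\d+):"
)


def find_forced_shutdowns(log_text):
    """Return (node_id, start_phase, error_code) for each forced shutdown found."""
    return [
        (int(m.group("node")), int(m.group("phase")), int(m.group("code")))
        for m in SHUTDOWN_RE.finditer(log_text)
    ]


# The ALERT line from the cluster log above, copied verbatim.
sample = (
    "2008-08-11 17:08:40 [MgmSrvr] ALERT -- Node 4: Forced node shutdown completed. "
    "Occured during startphase 5. Caused by error 2341: "
    "'Internal program error (failed ndbrequire)(Internal error, programming error "
    "or missing error message, please report a bug). Temporary error, restart node'."
)

print(find_forced_shutdowns(sample))  # prints [(4, 5, 2341)]
```

Because the pattern uses `\s+` between the sentences, it also matches when the message is wrapped across lines in a saved log.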
[12 Aug 2008 1:21]
Farhad Shakeri
trace log for node 4
Attachment: ndb_4_trace.log.11.gz (application/x-gzip, text), 52.56 KiB.
[23 Sep 2008 0:00]
Farhad Shakeri
This problem has been solved. It was pinpointed to a lack of memory. We increased the hardware memory by 25% and DataMemory by 35%, and the system seems stable. Upgrading to 5.0.51 alone was not enough.
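As a small worked example of the 35% DataMemory increase mentioned above, the arithmetic can be sketched as follows. The baseline of 2048 MB is purely hypothetical (the report never states the original DataMemory value), and the helper name `scaled_data_memory_mb` is made up for illustration.

```python
import math


def scaled_data_memory_mb(current_mb, increase_pct):
    """Return current_mb raised by increase_pct percent, rounded up to a whole MB."""
    return math.ceil(current_mb * (100 + increase_pct) / 100)


# HYPOTHETICAL baseline: the original DataMemory setting is not given in the report.
current_mb = 2048
new_mb = scaled_data_memory_mb(current_mb, 35)

# Formatted as it could appear in the data node section of config.ini.
print(f"DataMemory={new_mb}M")  # prints DataMemory=2765M
```

The new value still has to fit in physical RAM alongside IndexMemory and the other buffers, which is presumably why the hardware memory was increased at the same time.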
[14 Oct 2008 7:11]
Bernd Ocklin
I am closing this report since the issue has evidently been solved.