Bug #63300 | MySQL Cluster keeps crashing | ||
---|---|---|---|
Submitted: | 17 Nov 2011 5:53 | Modified: | 20 Dec 2011 18:38 |
Reporter: | Srikrishnan Chitoor | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | mysql-5.1.56 ndb-7.1.15 | OS: | Linux (Cent OS 5.6 - 32 Bit) |
Assigned to: | | CPU Architecture: | Any |
Tags: | 2305, crash, error |
[17 Nov 2011 5:53]
Srikrishnan Chitoor
[17 Nov 2011 6:04]
Jonas Oreland
Question 1: is this 7.1.15 or 7.1.15a? 7.1.15 contained a very serious bug. If you're using 7.1.15 (without the "a"), please retry with 7.1.15a.
Otherwise, question 2: the error report contains nothing but your config.ini (maybe it failed to retrieve the other files). We're also interested in:
ndb_*_cluster.log*
ndb_*_error.log
ndb_*_trace.log.*
/Jonas
[17 Nov 2011 6:17]
Srikrishnan Chitoor
Thanks for the prompt reply. I installed 7.1.15a. Please see the output of the "rpm -qa|grep -i mysql" command below:
** START
MySQL-Cluster-gpl-storage-7.1.15a-1.rhel5
MySQL-Cluster-gpl-server-7.1.15a-1.rhel5
MySQL-Cluster-gpl-client-7.1.15a-1.rhel5
MySQL-Cluster-gpl-shared-7.1.15a-1.rhel5
MySQL-Cluster-gpl-devel-7.1.15a-1.rhel5
** END
However, when I run ndb_mgm on the management node and do a "show", it reports mysql-5.1.56 ndb-7.1.15, Nodegroup: 0, Master.
I have also attached the full trace and cluster files here.
[17 Nov 2011 7:18]
Jonas Oreland
ndb_*_cluster.log* is still missing...
[17 Nov 2011 7:54]
Srikrishnan Chitoor
Added Cluster log from NDB Management server. The Data/MySQL nodes do not have any logs like *cluster*.log
[17 Nov 2011 8:31]
Jonas Oreland
Looking at the cluster log, you can see sporadic missed heartbeats that sometimes lead to nodes being voted out of the cluster. Sometimes the arbitrator is voted out of the cluster, which turns node failures into cluster failures.
It seems that your platform is not real-time enough, or that you run other tasks on these machines, which sometimes gives unpredictable response times to the data nodes.
I suggest you try with
HeartbeatIntervalDbDb=5000
HeartbeatIntervalDbApi=5000
This means that failure detection will be somewhat slower (if a machine is hard-rebooted, i.e. rebooted without killing the processes first), but the cluster should be much more resilient to temporary latency spikes.
/Jonas
Setting status to: Waiting on feedback
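As a minimal sketch of how the two suggested heartbeat settings might be applied: they are data node parameters, so this assumes they go in the [ndbd default] section of config.ini on the management node, where they apply to all data nodes. Both values are in milliseconds. The placement is an assumption, since the reporter's actual config.ini is not reproduced in this thread.

```ini
# config.ini on the management node (sketch only; all other settings omitted)
[ndbd default]
# Interval between heartbeats exchanged among data nodes, in milliseconds
HeartbeatIntervalDbDb=5000
# Interval between heartbeats sent from data nodes to API/SQL nodes, in milliseconds
HeartbeatIntervalDbApi=5000
```

Changes to config.ini generally take effect only once the management server has reread the file and the data nodes have been restarted, so expect to restart the cluster services after editing.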
[19 Nov 2011 2:14]
Srikrishnan Chitoor
Have changed the configuration and restarted services. So far (36 hours after change), there is no issue. Will observe for a week and give feedback.
[21 Dec 2011 7:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".