Bug #16733 | Signal 11 (Segmentation Fault) causes cluster shutdown | ||
---|---|---|---|
Submitted: | 23 Jan 2006 18:05 | Modified: | 6 Mar 2006 8:56 |
Reporter: | Tim Heath | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 5.0.16 | OS: | Linux (SuSE Linux 9.3) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[23 Jan 2006 18:05]
Tim Heath
[23 Jan 2006 18:20]
Jonas Oreland
Hi, can you upload /var/lib/mysql-cluster/ndb_3_trace.log.17 and possibly your config.ini as well?
[23 Jan 2006 19:06]
Tim Heath
Files uploaded as requested
[23 Jan 2006 19:25]
Jonas Oreland
The trace file showed that the error actually came from node 2. Can you send the trace file from that node, and the cluster log as well?
[23 Jan 2006 20:11]
Tim Heath
cluster log (arbitrator)
Attachment: cluster.log.node4.gz (application/gzip, text), 34.18 KiB.
[23 Jan 2006 20:12]
Tim Heath
Additional files uploaded
[25 Jan 2006 10:35]
Jonas Oreland
Looking at the cluster log, I see the lost heartbeats. This indicates a serious problem. It has happened to others when, e.g., a cron-scheduled OS backup or similar temporarily swaps out the ndbd process, or when the load is _too_ high. The signal 11 is of course incorrect, but the cluster was dying anyway. Can you investigate further?
[25 Jan 2006 15:53]
Tim Heath
Jonas, thanks for the information. When I looked at both cluster logs (from the two management nodes), it looked like both ndbd nodes were experiencing missed heartbeats and that all nodes in the cluster were involved-- i.e., the two ndbd nodes were missing heartbeats from each other but were also missing heartbeats from the management nodes and the mysqld/API nodes. So, if I'm reading this correctly, it appears that this was not related to a problem on a single server (?). There were no backups or other scheduled jobs running at the time of the failure. All of our cluster servers are running on a fiber gigE switch, and the switch did not log any failures during this period. I can't find anything in the syslog entries (/var/log/messages) on any of the boxes that would point to a server problem. Would you say that this points to overworked ndbd nodes? If so, could you recommend any ndbd tuning adjustments? Thanks again for your help
[6 Feb 2006 8:56]
Jonas Oreland
> Would you say that this points to overworked ndbd nodes? If so, could you
> recommend any ndbd tuning adjustments?

It's hard to say; it could be a bug as well. I would recommend keeping track of the load on the machine, using "vmstat" to see the general load on the machine and "top" to see the load of the ndbd process. If the load is constantly high, then there might be an overload problem. If not, it might be a bug, or a specific SQL query that is causing it.

/Jonas
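The "vmstat" check suggested above could be scripted along these lines. This is only a rough sketch, not something from the original report: the function names `parse_vmstat` and `swap_active` are hypothetical, the sample text is made-up illustrative output rather than data from this cluster, and the parser assumes the usual Linux procps column names (`swpd`, `si`, `so`, `id`, ...). Non-zero `si`/`so` values on a data node would hint that memory is being swapped, which is the scenario described earlier for the ndbd process.

```python
# Illustrative vmstat-style text (invented numbers, NOT from the reporter's servers).
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 812340  10240 204800    0    0     5    12  250   400  3  1 95  1
 3  1  20480  10240   2048  51200  120  340   900  1500  900  2100 40 30 10 20
"""


def parse_vmstat(text):
    """Parse vmstat text output into a list of dicts keyed by column name.

    Assumes the column header line contains the field 'swpd'
    (as in Linux procps vmstat); other lines are treated as data
    only if they consist of as many integers as the header has names.
    """
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "swpd" in fields:
            # Column-name line; vmstat may repeat it periodically.
            header = fields
            continue
        if (
            header
            and len(fields) == len(header)
            and all(f.isdigit() for f in fields)
        ):
            rows.append(dict(zip(header, (int(f) for f in fields))))
    return rows


def swap_active(rows):
    """Return the samples in which pages were swapped in or out."""
    return [r for r in rows if r["si"] > 0 or r["so"] > 0]
```

In practice the text could come from something like `vmstat 5 12` captured to a file during a period of missed heartbeats; any sample returned by `swap_active` would be worth correlating with the timestamps in the cluster log.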
[7 Mar 2006 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".