Bug #24030 NDBD receives signal 11 segmentation fault, cluster shuts down, cannot restart
Submitted: 7 Nov 2006 1:51 Modified: 8 Dec 2006 8:47
Reporter: Andy Smith
Status: No Feedback
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S1 (Critical)
Version:5.0.24 OS:Linux (Linux (RHEL4))
CPU Architecture: Any

[7 Nov 2006 1:51] Andy Smith
Description:
Twice in the last 24 hours my cluster has shut down because node 6 received a segmentation fault and took all the other nodes down with it. I am unable to restart the ndbd processes and eventually have to do an --initial start again.

Please see the attached log and config files. The current cluster status shows node 6 as master; normally node 4 would be master, but I have just performed a rolling restart.

After the error shown in ndb_6_error.log I tried to restart one data node from each node group, but the startup only got partway and reported that every node was in nodegroup 0.
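For comparison, MySQL Cluster 5.0 forms node groups implicitly: data nodes are taken in order of increasing node ID, and each consecutive run of NoOfReplicas nodes becomes one node group. The sketch below assumes that documented rule and uses a hypothetical helper, `assign_node_groups`, with made-up node IDs (not the reporter's actual config.ini). With two or more node groups configured, seeing every node in nodegroup 0 would not match this mapping.

```python
def assign_node_groups(data_node_ids, no_of_replicas):
    """Hypothetical helper: map each data node ID to its implicit node group.

    Assumes the 5.0 rule that node groups are filled in order of ascending
    node ID, NoOfReplicas nodes per group.
    """
    if no_of_replicas < 1:
        raise ValueError("NoOfReplicas must be at least 1")
    ids = sorted(data_node_ids)
    if len(ids) % no_of_replicas != 0:
        raise ValueError("number of data nodes must be a multiple of NoOfReplicas")
    # Nodes 0..R-1 (by sorted position) go to group 0, R..2R-1 to group 1, etc.
    return {node_id: index // no_of_replicas for index, node_id in enumerate(ids)}


if __name__ == "__main__":
    # Hypothetical layout: four data nodes (IDs 3-6) with NoOfReplicas=2.
    print(assign_node_groups([3, 4, 5, 6], 2))
```

With these example IDs, nodes 3 and 4 would form nodegroup 0 and nodes 5 and 6 nodegroup 1.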

Until now core dumps have been disabled; however, I have just added "ulimit -c unlimited" to the startup script of each ndbd, so hopefully we will get a core file next time.
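The same effect as "ulimit -c unlimited" can be had from a launcher process, since resource limits are inherited across exec. The sketch below is a hypothetical wrapper (the names `enable_core_dumps` and `launch` are illustrative, not part of MySQL Cluster) that raises the soft core-file limit to the hard limit and then replaces itself with ndbd. It assumes a POSIX system; an unprivileged process cannot raise the hard limit, so if the hard limit is not already unlimited, core files will still be capped at it.

```python
import os
import resource


def enable_core_dumps():
    """Raise the soft RLIMIT_CORE to the current hard limit and return the new limits."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Any process may raise its soft limit up to the hard limit;
    # RLIM_INFINITY corresponds to "unlimited" in `ulimit -c`.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def launch(argv):
    """Hypothetical launcher: enable core dumps, then exec the given command (e.g. ndbd)."""
    enable_core_dumps()
    # execvp replaces this process; the raised limit carries over to ndbd.
    os.execvp(argv[0], argv)
```

A startup script could then run `launch(["ndbd"])` instead of calling ndbd directly. Where the core file ends up still depends on the process working directory and the kernel's core_pattern setting.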

If there is any other information you require or any suggestions you have please let me know.  This is our first use of MySQL Cluster; things worked fine in testing but now on the live platform we get this problem over and over and if not resolved soon we are going to have to go back to a replication setup. :(

How to repeat:
Unknown; at the moment this is happening every 9 hours or so but it's not clear what is causing it.
[7 Nov 2006 1:53] Andy Smith
config.ini

Attachment: config.ini (application/octet-stream, text), 602 bytes.

[7 Nov 2006 1:54] Andy Smith
my.cnf

Attachment: my.cnf (application/octet-stream, text), 215 bytes.

[7 Nov 2006 1:56] Andy Smith
Output of "show" in ndb_mgm

Attachment: ndb_mgm_show.txt (text/plain), 583 bytes.

[7 Nov 2006 1:57] Andy Smith
ndb_6_error.log

Attachment: ndb_6_error.log (text/x-log), 404 bytes.

[7 Nov 2006 1:58] Andy Smith
ndb_5_error.log

Attachment: ndb_5_error.log (text/x-log), 816 bytes.

[7 Nov 2006 2:00] Andy Smith
ndb_1_cluster.log

Attachment: ndb_1_cluster.log (text/x-log), 10.39 KiB.

[7 Nov 2006 2:03] Andy Smith
Also, am I missing something about the correct procedure to get the cluster started again after such a crash?

When all nodes have disconnected from the mgmd, I have been killing them with kill -9 and trying to start them again, but this never works.
[7 Nov 2006 20:14] Andy Smith
Pekka Nousiainen mailed me to ask what binaries I am using.  My reply was:

This is a mysql-max-5.0.24a-linux-i686-icc-glibc23 downloaded from mysql.com. Is there another version you would recommend I use should it crash again? We prefer precompiled MySQL binaries.
[8 Nov 2006 8:47] Valeriy Kravchuk
Please try to repeat this with the latest version of the MySQL binaries, 5.0.27, and let us know the results.
[9 Dec 2006 0:01] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".