Bug #30935 NDB data node crashed with error 2339 with message Send signal error.
Submitted: 10 Sep 2007 15:03
Modified: 12 Oct 2009 8:45
Reporter: Anatoly Pidruchny (Candidate Quality Contributor)
Status: Duplicate
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1
OS: Linux (RH x86_64)
Assigned to: Assigned Account
CPU Architecture: Any
Tags: 5.1.20

[10 Sep 2007 15:03] Anatoly Pidruchny
Description:
NDB data nodes crashed after staying up for more than a month. First, for some reason, node 2 shut itself down with the message "Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s) (Arbitration error)." I do not think there was anything wrong with our network, but I cannot rule out this possibility 100%. About an hour later, node 3 crashed with the following information printed to the error log:

Current byte-offset of file-pointer is: 568                       

Time: Saturday 8 September 2007 - 02:19:20
Status: Temporary error, restart node
Message: Send signal error (Internal error, programming error or missing error message, please report a bug)
Error: 2339
Error data: Signal (GSN: 31, Length: 5, Rec Block No: 0)
Error object: SimulatedBlock.cpp:214
Program: ndbd
Pid: 14095
Trace: /sm/mysql/ndb_data/ndb_3_trace.log.1
Version: Version 5.1.20 (beta)
***EOM***
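
For anyone comparing entries like the one above across several nodes, here is a minimal Python sketch that splits one entry into fields. It assumes the "Key: value" layout seen in this single entry; the function names `parse_error_entry` and `parse_signal` are made up for illustration and are not part of any NDB tool.

```python
import re


def parse_error_entry(text):
    """Split the 'Key: value' lines of one error-log entry into a dict.

    Assumes the layout of the entry quoted in this report; parsing stops
    at the ***EOM*** marker.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "***EOM***":
            break
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


def parse_signal(error_data):
    """Extract the numeric parts of an 'Error data: Signal (...)' value."""
    m = re.search(r"GSN: (\d+), Length: (\d+), Rec Block No: (\d+)", error_data)
    if m is None:
        return None
    gsn, length, block = (int(g) for g in m.groups())
    return {"gsn": gsn, "length": length, "rec_block_no": block}


entry = """\
Time: Saturday 8 September 2007 - 02:19:20
Status: Temporary error, restart node
Error: 2339
Error data: Signal (GSN: 31, Length: 5, Rec Block No: 0)
Error object: SimulatedBlock.cpp:214
***EOM***
"""

fields = parse_error_entry(entry)
signal = parse_signal(fields["Error data"])
print(fields["Error"], signal)
```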

There was some activity going on when the nodes crashed. It is not easy to tell exactly which queries were running, because there were a lot of them. Please see the attached files for more information.

How to repeat:
The problem is not reproducible. If it is not possible to identify the problem without reproducing it, then please close this bug report.
[10 Sep 2007 15:04] Anatoly Pidruchny
Cluster configuration file

Attachment: config.ini (application/octet-stream, text), 1.03 KiB.

[10 Sep 2007 15:05] Anatoly Pidruchny
mysqlbug file

Attachment: mysqlbug (application/octet-stream, text), 10.66 KiB.

[10 Sep 2007 15:05] Anatoly Pidruchny
Cluster log file

Attachment: ndb_1_cluster.log (application/octet-stream, text), 203.75 KiB.

[10 Sep 2007 15:05] Anatoly Pidruchny
Node 2 error log

Attachment: ndb_2_error.log (application/octet-stream, text), 568 bytes.

[10 Sep 2007 15:06] Anatoly Pidruchny
Node 2 output log

Attachment: ndb_2_out.log (application/octet-stream, text), 7.67 KiB.

[10 Sep 2007 15:07] Anatoly Pidruchny
Node 3 error log

Attachment: ndb_3_error.log (application/octet-stream, text), 568 bytes.

[10 Sep 2007 15:07] Anatoly Pidruchny
Node 3 output log

Attachment: ndb_3_out.log (application/octet-stream, text), 42.45 KiB.

[10 Sep 2007 15:07] Hartmut Holzgraefe
We need at least the /sm/mysql/ndb_data/ndb_3_trace.log.1 file
to further analyse this, maybe even all error and trace logs
from all data nodes and the cluster log from the management node.

We recommend using the ndb_error_reporter tool to collect
all logs:

http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-utilities-ndb-error-reporter.html
[10 Sep 2007 15:09] Anatoly Pidruchny
Gzipped trace log from node 2

Attachment: ndb_2_trace.log.1.gz (application/x-gzip, text), 63.31 KiB.

[10 Sep 2007 15:10] Anatoly Pidruchny
Gzipped trace log from node 3

Attachment: ndb_3_trace.log.1.gz (application/x-gzip, text), 30.26 KiB.

[10 Sep 2007 15:13] Anatoly Pidruchny
Hi, Hartmut,

please let me know if you need any more information.

Regards,
Anatoly.
[10 Sep 2007 18:28] Jonas Oreland
- Crash of node 2 looks like some kind of overload on machine...
2007-09-08 00:56:46 [ndbd] WARNING  -- Watchdog: Warning overslept 9022 ms, expected 100 ms.

This implies that it was swapped out for 9!! seconds...more than enough to be voted out of cluster...

- Crash of node 3 is a bug.
The API_FAILREQ is received _before_ the last signal from that node...

/Jonas
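
The watchdog warning quoted above for node 2 can be searched for in an ndbd output log with a short script. This is only a sketch: the pattern is derived from the single line quoted in this report, so other versions may format the message differently, and `find_oversleeps` and its `threshold_ms` parameter are hypothetical names chosen for illustration.

```python
import re

# Pattern based on the one watchdog line quoted in this report.
WATCHDOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[ndbd\] WARNING\s+-- "
    r"Watchdog: Warning overslept (?P<overslept>\d+) ms, "
    r"expected (?P<expected>\d+) ms\."
)


def find_oversleeps(lines, threshold_ms=1000):
    """Return (timestamp, overslept_ms, expected_ms) for each watchdog
    warning whose oversleep is at least threshold_ms."""
    hits = []
    for line in lines:
        m = WATCHDOG_RE.match(line.strip())
        if m and int(m.group("overslept")) >= threshold_ms:
            hits.append(
                (m.group("ts"), int(m.group("overslept")), int(m.group("expected")))
            )
    return hits


sample = [
    "2007-09-08 00:56:46 [ndbd] WARNING  -- Watchdog: Warning overslept 9022 ms, expected 100 ms.",
]
print(find_oversleeps(sample))
```

To scan a real file, the same function can be passed an open file object, for example `find_oversleeps(open("ndb_2_out.log"))`.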
[12 Oct 2009 8:45] Jonas Oreland
The API_FAILREQ problem has been solved in http://bugs.mysql.com/bug.php?id=47039

Closing this as duplicate