Bug #23739 Node commits suicide (Error 2303)
Submitted: 27 Oct 2006 21:36 Modified: 1 Dec 2006 12:01
Reporter: Steve Wolf
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 5.0.24a
OS: Linux (CentOS 4.4 x86_64)
Assigned to:
CPU Architecture: Any
Tags: GCP stop, killed, NDBCNTR

[27 Oct 2006 21:36] Steve Wolf
Description:
A four-node cluster died. ndb_1_error.log contains:

Current byte-offset of file-pointer is: 568

Time: Friday 27 October 2006 - 19:19:20
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Node 1 killed this node because GCP stop was detected
Error object: NDBCNTR (Line: 197) 0x0000000a
Program: ndbd
Pid: 5300
Trace: /usr/local/mysql/data/ndb_1_trace.log.1
Version: Version 5.0.24
***EOM***

Interestingly, this node _is_ Node 1, so it killed itself. All the other nodes reported the same error:

Current byte-offset of file-pointer is: 568

Time: Friday 27 October 2006 - 21:59:44
Status: Temporary error, restart node
Message: Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s) (Arbitration error)
Error: 2305
Error data: Arbitrator decided to shutdown this node
Error object: QMGR (Line: 4556) 0x0000000a
Program: ndbd
Pid: 5148
Trace: /usr/local/mysql/data/ndb_2_trace.log.1
Version: Version 5.0.24
***EOM***
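Since the report hinges on comparing error logs across nodes, a minimal Python sketch of that comparison may help. It assumes each entry in an `ndb_N_error.log` file is a run of `Key: value` lines ending in a `***EOM***` line, as in the excerpts here; `parse_ndb_error_log` and `same_failure` are hypothetical helpers, not part of MySQL Cluster.

```python
from typing import Dict, List


def parse_ndb_error_log(text: str) -> List[Dict[str, str]]:
    """Split ndb error-log text into entries of 'Key: value' fields.

    Assumes (hypothetically) that each entry is a run of 'Key: value'
    lines terminated by a line reading '***EOM***'.
    """
    entries: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        if line == "***EOM***":
            entries.append(current)  # end of one error entry
            current = {}
            continue
        # Split on the first ": " only, so values such as
        # "NDBCNTR (Line: 197) 0x0000000a" stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            current[key] = value
    if current:
        entries.append(current)  # trailing entry without a terminator
    return entries


def same_failure(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """Compare two entries on the fields that identify the failure,
    ignoring per-node fields such as Time, Pid and Trace."""
    fields = ("Error", "Error data", "Error object")
    return all(a.get(f) == b.get(f) for f in fields)
```

Applied to the two excerpts above, `same_failure` would treat them as different failures (error 2303 from NDBCNTR versus 2305 from QMGR), while entries from the other nodes that differ only in Pid, time and trace file would compare as the same.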

I will attach the trace logs.

How to repeat:
Unknown

Suggested fix:
Unknown
[27 Oct 2006 21:44] Jonas Oreland
This looks like

http://bugs.mysql.com/bug.php?id=20904
[27 Oct 2006 21:50] Steve Wolf
Compressed tar of log and trace files

Attachment: bug23739_logs.tar.gz (application/x-gzip, text), 180.82 KiB.

[27 Oct 2006 21:54] Steve Wolf
I searched for "GCP stop" and didn't find the bug you reference, so thanks for providing it. This could be the same problem, but it could also be related to the cluster getting close to full: the management node log reports 90% full, after which nodes start missing heartbeats. Please have a look at the files I attached. Thanks.
[27 Oct 2006 22:25] Steve Wolf
Now the cluster won't start up.  I got this on Node 2, which was acting as Master during the startup:

Time: Friday 27 October 2006 - 23:22:02
Status: Unknown
Message: No message slogan found (please report a bug if you get this error code) (Unknown)
Error: 0
Error data: We(2) have been declared dead by 1 reason: Hearbeat failure(4)
Error object: QMGR (Line: 2840) 0x0000000a
Program: ndbd
Pid: 5398
Trace: /usr/local/mysql/data/ndb_2_trace.log.3
Version: Version 5.0.24
***EOM***

Is this related, or should I open another bug report?
[1 Nov 2006 12:01] Valeriy Kravchuk
Please try to repeat this with a newer version, 5.0.27. If you see the same problem, please open a new report and mention this one there (if the uploaded data are still relevant). This one will then be closed as a likely duplicate of bug #20904.
[2 Dec 2006 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".