Bug #40312 Node restart ends with error 2303 as copyfrag failed
Submitted: 24 Oct 2008 15:00 Modified: 16 Sep 2014 8:45
Reporter: Michael Neubert
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine  Severity: S1 (Critical)
Version: mysql-5.1-telco-6.3  OS: Linux
CPU Architecture: Any
Tags: copyfrag failed, error 2303, mysql-5.1.28 ndb-6.3.18-RC, node restart

[24 Oct 2008 15:00] Michael Neubert
Description:
Hello,

We are currently experiencing problems with a simple node restart. Our cluster setup has 4 nodes (number of replicas = 2). At the moment we cannot start a disconnected node in one node group; even an initial restart fails.

ndb_4_error.log:

Time: Friday 24 October 2008 - 04:55:27
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 4 as copyfrag failed, error: 0
Error object: NDBCNTR (Line: 249) 0x0000000e
Program: ndbd
Pid: 26609
Trace: /var/log/mysql/ndb_4_trace.log.1
Version: mysql-5.1.28 ndb-6.3.18-RC
***EOM***

See also the attached trace log.

Best wishes
Michael

How to repeat:
No specific scenario available.

Suggested fix:
No fix available. Workaround: start an empty cluster and restore from an existing backup.
[24 Oct 2008 15:03] Michael Neubert
Trace log

Attachment: ndb_4_trace.log.rar (application/octet-stream, text), 69.15 KiB.

[26 Oct 2008 22:25] Michael Neubert
Hello,

With mysql-5.1.28 ndb-6.2.16-RC the problem appears to be the same, but we get a different error code: 744 (Character string is invalid for given character set).

Time: Sunday 26 October 2008 - 22:39:37
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 5 as copyfrag failed, error: 744
Error object: NDBCNTR (Line: 247) 0x0000000a
Program: ndbd
Pid: 6574
Trace: /var/log/mysql/ndb_2_trace.log.3
Version: mysql-5.1.28 ndb-6.2.16-RC
***EOM***

See also the attached trace log.

Best wishes
Michael
[26 Oct 2008 22:36] Michael Neubert
Trace log

Attachment: ndb_2_trace.log.rar (application/octet-string, text), 93.57 KiB.

[3 Nov 2008 15:31] Frazer Clement
Hi Michael, 
  Thanks for your bug report.
  The files you sent are encoded in RAR format, and the free unrar tool I downloaded to decompress them does not work.
  Could you resend the files compressed with zip or gzip, or uncompressed?
Thanks,
Frazer
[30 Nov 2008 5:03] nelson mendaros
Logs and trace logs

Attachment: ndb4logs.zip (application/zip, text), 168.97 KiB.

[30 Nov 2008 5:06] nelson mendaros
Hi! We have the same problem restarting our node. I had already restored the data using ndb_restore, shut down the cluster after the restore completed successfully, and then restarted all the nodes.

Below are snapshots of the commands, messages, and error log.

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=3    @10.0.0.12  (mysql-5.1.27 ndb-6.3.17, starting, Nodegroup: 0, Master)
id=4    @10.0.0.5  (mysql-5.1.27 ndb-6.3.17, not started)

[ndb_mgmd(MGM)] 2 node(s)
id=1 (not connected, accepting connect from 10.0.0.23)
id=2   (mysql-5.1.27 ndb-6.3.17)

[mysqld(API)]   3 node(s)
id=5 (not connected, accepting connect from 10.0.0.25)
id=6 (not connected, accepting connect from 10.0.0.11)
id=7 (not connected, accepting connect from any host)

ndb_mgm> Node 3: Started (version 6.3.17)

ndb_mgm> 4 start
Database node 4 is being started.

ndb_mgm> Node 4: Start initiated (version 6.3.17)

ndb_mgm> Node 4: Forced node shutdown completed. Occured during startphase 5. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

=============

[root@dtodb2 data]# tail ndb_4_error.log  -n100
Current byte-offset of file-pointer is: 568

Time: Sunday 30 November 2008 - 12:37:12
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 4 as copyfrag failed, error: 1501
Error object: NDBCNTR (Line: 249) 0x0000000a
Program: ndbd
Pid: 6520
Trace: /var/lib/mysql/data/ndb_4_trace.log.1
Version: mysql-5.1.27 ndb-6.3.17-RC
***EOM***

I attached the log and trace log files in my previous comment.
[8 Dec 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[8 Dec 2008 12:06] Frazer Clement
Changing to Analyzing as alternative logs have been provided.
[8 Dec 2008 13:47] Frazer Clement
Nelson,
  Thanks for your trace file.
  The error number mentioned in the logs you sent (1501) is related to running out of Undo space for disk-based tables while performing the restart.
  It has already been noted that this problem is not well reported (Bug#30655).
  I suspect that you may need to increase the configured Undo space to successfully restart this cluster.
  I think this is separate from the original issue that prompted this bug report. Could you please open a separate bug if you wish to pursue the issue further?
Thanks,
Frazer
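When comparing the error logs posted in this thread, it can help to pull the reported node id and the copyfrag error code out of each `Error data:` line. The following is a minimal Python sketch, assuming the log uses exactly the `Error data: Killed by node N as copyfrag failed, error: M` format shown in the posts here. The names `parse_copyfrag_failures` and `KNOWN_COPYFRAG_ERRORS` are hypothetical, and the lookup table holds only the two meanings stated in this thread (744 and 1501); any other code, including 0, is reported without further detail.

```python
import re

# Hypothetical lookup table: only the meanings stated in this bug thread.
KNOWN_COPYFRAG_ERRORS = {
    744: "Character string is invalid for given character set",
    1501: "out of undo space for disk-based tables",
}

# Assumes the exact "Error data:" line format seen in the posted ndb_N_error.log files.
ERROR_DATA_RE = re.compile(
    r"Error data: Killed by node (\d+) as copyfrag failed, error: (\d+)"
)


def parse_copyfrag_failures(log_text):
    """Return a list of (node_id, error_code, meaning) tuples, one per match."""
    results = []
    for match in ERROR_DATA_RE.finditer(log_text):
        node_id = int(match.group(1))
        code = int(match.group(2))
        meaning = KNOWN_COPYFRAG_ERRORS.get(code, "no further detail in the log")
        results.append((node_id, code, meaning))
    return results


if __name__ == "__main__":
    # Excerpt copied from the ndb_4_error.log posted on 30 Nov 2008.
    sample = """Error: 2303
Error data: Killed by node 4 as copyfrag failed, error: 1501
Error object: NDBCNTR (Line: 249) 0x0000000a
"""
    for node_id, code, meaning in parse_copyfrag_failures(sample):
        print(f"node {node_id}: copyfrag error {code} ({meaning})")
```

Error code 0, as in the first report, maps to no known meaning, which matches how little that log line says on its own.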
[8 Dec 2008 13:49] Frazer Clement
Michael,
  Could you resend/repost the log files in a format other than RAR (zip, gzip, tar, etc.)?
  My RAR reader cannot decompress the files you attached.
Thanks,
Frazer
[9 Jan 2009 18:18] Michael Neubert
Trace log as a zip file

Attachment: ndb_4_trace.log.zip (application/x-zip-compressed, text), 121.07 KiB.

[12 Mar 2009 14:12] Jonathan Miller
Probably due to a resource shortage.

Can you please try the GA version 5.1.30_6.3.20?
[13 Mar 2009 14:49] Michael Neubert
Hello,

I'm sorry, but we no longer use the Cluster storage engine for the project mentioned, so no further information or tests are possible.

Best wishes
Michael
[1 Apr 2009 15:05] paul miles
Trace and log files

Attachment: ndb.zip (application/x-zip-compressed, text), 146.45 KiB.

[1 Apr 2009 15:05] paul miles
I'm running the following versions of MySQL Cluster:

MySQL-Cluster-gpl-server-6.3.20-0.rhel5
MySQL-Cluster-gpl-management-6.3.20-0.rhel5
MySQL-Cluster-gpl-tools-6.3.20-0.rhel5
MySQL-Cluster-gpl-devel-6.3.20-0.rhel5
MySQL-Cluster-gpl-storage-6.3.20-0.rhel5
MySQL-Cluster-gpl-client-6.3.20-0.rhel5

I'm experiencing what looks like an identical issue to Michael.

Please see the trace and log files attached above.

Thanks,

Paul
[20 May 2009 10:48] Nathan Thera
Hi

I am also experiencing the copyfrag 744 error. I have one disconnected node that is unable to start. Passing the --initial option to ndbd results in the same error. The cluster is a 4-node, 2-replica setup running mysql-5.1.32 ndb-7.0.5-beta.

Please see below for output from ndb_mgm and the node error log.
The trace files and a snippet of the out log are attached.

Nathan

From ndb_mgm:
Node 6: Forced node shutdown completed. Occured during startphase 5. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, rest

From the error log (ndb_6_error.log):

Time: Wednesday 20 May 2009 - 04:44:06
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 6 as copyfrag failed, error: 744
Error object: NDBCNTR (Line: 260) 0x00000008
Program: ndbd
Pid: 10113
Trace: /var/lib/mysql-cluster/ndb_6_trace.log.15
Version: mysql-5.1.32 ndb-7.0.5-beta
***EOM***
[20 May 2009 10:50] Nathan Thera
Snippet of ndb_6_out.log

Attachment: ndb_6_out.log (application/octet-stream, text), 21.66 KiB.

[20 May 2009 10:53] Nathan Thera
Zip file containing the trace log for the node

Attachment: ndb_6_trace.log.zip (application/x-zip-compressed, text), 97.41 KiB.

[2 Aug 2009 0:18] Nathan Thera
Error still happening in mysql-5.1.34 ndb-7.0.6.

Logs below. Let me know if any additional logs are needed.

Nathan

Mgm:
2009-08-01 18:13:39 [MgmSrvr] ALERT    -- Node 5: Forced node shutdown completed. Occured during startphase 5. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, rest

Error file:
Time: Saturday 1 August 2009 - 18:12:29
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 5 as copyfrag failed, error: 744
Error object: NDBCNTR (Line: 260) 0x00000008
Program: ndbd
Pid: 14476
Trace: /var/lib/mysql-cluster/ndb_5_trace.log.10
Version: mysql-5.1.34 ndb-7.0.6
***EOM***
[2 Aug 2009 0:19] Nathan Thera
Trace file for this specific error

Attachment: ndb_5_trace.log.zip (application/x-zip-compressed, text), 92.16 KiB.

[2 Aug 2009 0:20] Nathan Thera
Snippet of the node's out log

Attachment: ndb_5_out.log.snippet (text/plain), 85.23 KiB.

[23 Sep 2009 19:55] Matthew Bilek
I get this error all the time too. See Bug #46985 for configuration information.
[24 Oct 2009 12:51] Nathan Thera
Still happening in 7.0.8a. It just occurred on a new machine that was added to the existing cluster (all nodes running 7.0.8a).

Attachment: ndb_5_logs-7.0.8a.zip (application/x-zip-compressed, text), 91.78 KiB.

[16 Sep 2014 8:45] Gustaf Thorslund
This is an old, pre-GA version, so I am closing this bug. If the same error shows up with a newer version, please open a new bug.
[30 Aug 2018 14:44] James Mo
Time: Thursday 30 August 2018 - 21:31:32
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 21 as copyfrag failed, error: 0
Error object: NDBCNTR (Line: 295) 0x00000004
Program: ndbmtd
Pid: 167431 thr: 0
Version: mysql-5.7.21 ndb-7.5.9
Trace file name: ndb_21_trace.log.14
Trace file path: /app/mysql/ndbd//ndb_21_trace.log.14 [t1..t5]
***EOM***