MySQL Bugs: #40312: Node restart ends with error 2303 as copyfrag failed

Bug #40312	Node restart ends with error 2303 as copyfrag failed
Submitted:	24 Oct 2008 15:00	Modified:	16 Sep 2014 8:45
Reporter:	Michael Neubert
Status:	Can't repeat
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	mysql-5.1-telco-6.3	OS:	Linux
Assigned to:		CPU Architecture:	Any
Tags:	copyfrag failed, error 2303, mysql-5.1.28 ndb-6.3.18-RC, node restart

Description:
Hello,

we are currently experiencing problems with a simple node restart. We are using a cluster setup of 4 nodes (number of replicas = 2). At the moment we are not able to start a disconnected node of one nodegroup. Even an initial restart does not succeed.

ndb_4_error.log:

Time: Friday 24 October 2008 - 04:55:27
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 4 as copyfrag failed, error: 0
Error object: NDBCNTR (Line: 249) 0x0000000e
Program: ndbd
Pid: 26609
Trace: /var/log/mysql/ndb_4_trace.log.1
Version: mysql-5.1.28 ndb-6.3.18-RC
***EOM***

See also attached trace log.

Best wishes
Michael

How to repeat:
no special scenario available

Suggested fix:
no fix available, workaround: start an empty cluster and restore from a given backup

Trace log

Attachment: ndb_4_trace.log.rar (application/octet-stream, text), 69.15 KiB.

Hello,

with mysql-5.1.28 ndb-6.2.16-RC the problem seems to be the same, but we got a new kind of error code: 744 - Character string is invalid for given character set.

Time: Sunday 26 October 2008 - 22:39:37
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 5 as copyfrag failed, error: 744
Error object: NDBCNTR (Line: 247) 0x0000000a
Program: ndbd
Pid: 6574
Trace: /var/log/mysql/ndb_2_trace.log.3
Version: mysql-5.1.28 ndb-6.2.16-RC
***EOM***

See also attached trace log.

Best wishes
Michael

trace log

Attachment: ndb_2_trace.log.rar (application/octet-string, text), 93.57 KiB.

Hi Michael, 
  Thanks for your bug report.
  The files you sent are encoded in RAR format, and the free unrar tool I downloaded to decompress them does not work.
  Could you resend the files encoded with zip, gzip or uncompressed?
Thanks,
Frazer

logs and trace logs

Attachment: ndb4logs.zip (application/zip, text), 168.97 KiB.

Hi! We have the same problem restarting our node. I had already restored the data using ndb_restore, shut down the cluster after the restore succeeded, and then restarted all the nodes.

Below are the commands, messages and error log.

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=3    @10.0.0.12  (mysql-5.1.27 ndb-6.3.17, starting, Nodegroup: 0, Master)
id=4    @10.0.0.5  (mysql-5.1.27 ndb-6.3.17, not started)

[ndb_mgmd(MGM)] 2 node(s)
id=1 (not connected, accepting connect from 10.0.0.23)
id=2   (mysql-5.1.27 ndb-6.3.17)

[mysqld(API)]   3 node(s)
id=5 (not connected, accepting connect from 10.0.0.25)
id=6 (not connected, accepting connect from 10.0.0.11)
id=7 (not connected, accepting connect from any host)

ndb_mgm> Node 3: Started (version 6.3.17)

ndb_mgm> 4 start
Database node 4 is being started.

ndb_mgm> Node 4: Start initiated (version 6.3.17)

ndb_mgm> Node 4: Forced node shutdown completed. Occured during startphase 5. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

=============

[root@dtodb2 data]# tail ndb_4_error.log  -n100
Current byte-offset of file-pointer is: 568

Time: Sunday 30 November 2008 - 12:37:12
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 4 as copyfrag failed, error: 1501
Error object: NDBCNTR (Line: 249) 0x0000000a
Program: ndbd
Pid: 6520
Trace: /var/lib/mysql/data/ndb_4_trace.log.1
Version: mysql-5.1.27 ndb-6.3.17-RC
***EOM***

I've previously attached the log and trace log file.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Changing to Analyzing as alternative logs have been provided.

Nelson,
  Thanks for your trace file.
  The error number mentioned in the logs you sent (1501) is related to running out of Undo space for disk-based tables while performing the restart.
  It has already been noted that this problem is not well reported (Bug#30655).
  I suspect that you may need to increase the configured Undo space to successfully restart this cluster.
  I think that this is a separate issue from the original one that sparked this bug.  Could you please open a separate bug if you wish to pursue the issue further?
Thanks,
Frazer
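
For readers comparing their own logs against this thread, here is a minimal Python sketch that pulls the node id and copyfrag error code out of an "Error data:" line and attaches the explanation given in this thread, where there is one. The function name and the lookup table are hypothetical helpers, not part of any MySQL tool, and the table covers only the codes discussed here (744 and 1501); it is not an authoritative NDB error list.

```python
import re

# Descriptions taken from the comments in this thread only (assumption:
# they are sufficient for illustration; this is not the full NDB error table).
COPYFRAG_CODES = {
    744: "Character string is invalid for given character set",
    1501: "Out of undo space for disk-based tables",
}

# Matches the "Error data:" line format seen in the ndb_<id>_error.log excerpts.
ERROR_DATA_RE = re.compile(
    r"Killed by node (?P<node>\d+) as copyfrag failed, error: (?P<code>\d+)"
)


def describe_copyfrag(error_data):
    """Return (node_id, code, description), or None if the line does not match."""
    match = ERROR_DATA_RE.search(error_data)
    if match is None:
        return None
    code = int(match.group("code"))
    description = COPYFRAG_CODES.get(code, "not explained in this thread")
    return int(match.group("node")), code, description
```

Codes that are not in the table (for example 0) are reported as unexplained rather than guessed at. For a fuller lookup, the perror utility shipped with MySQL accepts an --ndb option for NDB error codes.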

Michael,
  Could you resend/repost the log files in a format other than RAR (zip, gzip, tar etc.)?
  My RAR reader cannot decompress the files you attached.
Thanks,
Frazer

trace log as zip file

Attachment: ndb_4_trace.log.zip (application/x-zip-compressed, text), 121.07 KiB.

This is probably due to a resource shortage.

Can you please try GA version 5.1.30_6.3.20?

Hello,

I'm sorry, but we no longer use the Cluster Storage Engine for the mentioned project, so no further information or tests are possible.

Best wishes
Michael

Trace and log files

Attachment: ndb.zip (application/x-zip-compressed, text), 146.45 KiB.

I'm running the following versions of MySQL Cluster:

MySQL-Cluster-gpl-server-6.3.20-0.rhel5
MySQL-Cluster-gpl-management-6.3.20-0.rhel5
MySQL-Cluster-gpl-tools-6.3.20-0.rhel5
MySQL-Cluster-gpl-devel-6.3.20-0.rhel5
MySQL-Cluster-gpl-storage-6.3.20-0.rhel5
MySQL-Cluster-gpl-client-6.3.20-0.rhel5

I'm experiencing what looks like an issue identical to Michael's.

Please see above trace and log files.

Thanks,

Paul

Hi

I am also experiencing the copyfrag 744 error. I have one disconnected node that is unable to start. Passing the --initial flag to ndbd results in the same error. The cluster is a 4 node, 2 replica setup running mysql-5.1.32 ndb-7.0.5-beta.

Please see below for output from ndb_mgm and the node error log.
The trace files and a snippet of the out log are attached.

Nathan

From ndb_mgm:
Node 6: Forced node shutdown completed. Occured during startphase 5. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, rest

From the error log (ndb_6_error.log):

Time: Wednesday 20 May 2009 - 04:44:06
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 6 as copyfrag failed, error: 744
Error object: NDBCNTR (Line: 260) 0x00000008
Program: ndbd
Pid: 10113
Trace: /var/lib/mysql-cluster/ndb_6_trace.log.15
Version: mysql-5.1.32 ndb-7.0.5-beta
***EOM***

Snippet of ndb_6_out.log

Attachment: ndb_6_out.log (application/octet-stream, text), 21.66 KiB.

zip file containing the trace log for the node

Attachment: ndb_6_trace.log.zip (application/x-zip-compressed, text), 97.41 KiB.

Error still happening in mysql-5.1.34 ndb-7.0.6.

Logs below. Let me know if any additional logs are needed.

Nathan

Mgm:
2009-08-01 18:13:39 [MgmSrvr] ALERT    -- Node 5: Forced node shutdown completed. Occured during startphase 5. Caused by error 2303: 'System error, node killed during node restart by other node(Internal error, programming error or missing error message, please report a bug). Temporary error, rest

Error file:
Time: Saturday 1 August 2009 - 18:12:29
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 5 as copyfrag failed, error: 744
Error object: NDBCNTR (Line: 260) 0x00000008
Program: ndbd
Pid: 14476
Trace: /var/lib/mysql-cluster/ndb_5_trace.log.10
Version: mysql-5.1.34 ndb-7.0.6
***EOM***

Trace file attached to specific error

Attachment: ndb_5_trace.log.zip (application/x-zip-compressed, text), 92.16 KiB.

snippet of the node out log

Attachment: ndb_5_out.log.snippet (text/plain), 85.23 KiB.

I get this error all of the time too.  See bug #46985 for configuration information.

Still present in 7.0.8a. It just happened on a new machine that was added to the existing cluster (all nodes running 7.0.8a).

Attachment: ndb_5_logs-7.0.8a.zip (application/x-zip-compressed, text), 91.78 KiB.

This was reported against an old, pre-GA version, so I am closing the bug. If the same error shows up with a newer version, please open a new bug.

Time: Thursday 30 August 2018 - 21:31:32
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Killed by node 21 as copyfrag failed, error: 0
Error object: NDBCNTR (Line: 295) 0x00000004
Program: ndbmtd
Pid: 167431 thr: 0
Version: mysql-5.7.21 ndb-7.5.9
Trace file name: ndb_21_trace.log.14
Trace file path: /app/mysql/ndbd//ndb_21_trace.log.14 [t1..t5]
***EOM***
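
The error log excerpts in this thread all share one layout: "Key: value" lines, with each entry terminated by ***EOM***. The following is a rough Python sketch for splitting such a dump into one dictionary per entry so that fields like "Error data" and "Version" can be compared across reports. The function name is hypothetical, and the layout is inferred only from the excerpts pasted here, not from any format specification.

```python
def parse_error_log(text):
    """Split an ndb_<id>_error.log dump into one dict per ***EOM***-terminated entry.

    Assumption: each field is a single "Key: value" line, as in the excerpts
    in this thread. Lines without ": " are ignored.
    """
    entries = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "***EOM***":
            # End of one entry: store it and start collecting the next one.
            if current:
                entries.append(current)
            current = {}
        elif ": " in line:
            # Split on the first ": " only, so values containing colons
            # (times, paths) are kept whole.
            key, value = line.split(": ", 1)
            current[key] = value
    return entries
```

Any "Key: value" line that precedes the first entry, such as the byte-offset header line, ends up in the first dictionary; filter it out if that matters for your use.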