Bug #79646 Mysql Cluster Data Node Shutdowns Immediately After Being Started
Submitted: 15 Dec 2015 11:10
Modified: 30 Dec 2015 13:53
Reporter: Serhat Demircan
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 7.4.6
OS: Debian (Wheezy)
Assigned to: MySQL Verification Team
CPU Architecture: Any
Tags: MySQL Cluster, ndbcluster

[15 Dec 2015 11:10] Serhat Demircan
Description:
I have a MySQL Cluster with 4 API nodes, 2 management nodes, and 8 data nodes. Today I stopped a data node for maintenance. After I started it again, the data node shut down immediately with the following errors. I could not find a way to bring this data node back online.

==> /ndbdata/ndb_5_error.log <==
Time: Tuesday 15 December 2015 - 01:58:45
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DbtcMain.cpp
Error object: DBTC (Line: 19291) 0x00000002
Program: ndbmtd
Pid: 7546 thr: 12
Version: mysql-5.6.24 ndb-7.4.6
Trace: /ndbdata/ndb_5_trace.log.5 [t1..t15]
***EOM***
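
For anyone comparing several of these records, the error log entry above is a flat list of "Key: value" lines ending in ***EOM***, so it can be turned into a dictionary with a few lines of Python. This is only a rough sketch: the function name parse_ndb_error_record is made up for illustration, and the sample text is a shortened copy of the record above.

```python
def parse_ndb_error_record(text):
    """Parse one 'Key: value' record from an ndb_<id>_error.log into a dict."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the end-of-message marker.
        if not line or line.startswith("***EOM***"):
            continue
        # Split on the first ': ' only, so values that contain colons
        # (such as 'DBTC (Line: 19291) ...') stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            record[key] = value
    return record


# Shortened copy of the record from this report.
sample = """\
Status: Temporary error, restart node
Error: 2341
Error data: DbtcMain.cpp
Error object: DBTC (Line: 19291) 0x00000002
Program: ndbmtd
***EOM***
"""

record = parse_ndb_error_record(sample)
print(record["Error"], "|", record["Error object"])
```

The "Error data" and "Error object" fields together point at the source file and line of the failed ndbrequire, which is usually the first thing to quote when reporting.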

==> /ndbdata/ndb_5_out.log <==
2015-12-15 01:58:08 [ndbd] INFO     -- Node started
2015-12-15 01:58:45 [ndbd] INFO     -- /export/home/pb2/build/sb_0-14878975-1427910955.8/mysql-cluster-gpl-7.4.6/storage/ndb/src/kernel/blocks/dbtc/DbtcMain.cpp
2015-12-15 01:58:45 [ndbd] INFO     -- DBTC (Line: 19291) 0x00000002
2015-12-15 01:58:45 [ndbd] INFO     -- Error handler shutting down system
2015-12-15 01:58:45 [ndbd] INFO     -- Error handler shutdown completed - exiting
2015-12-15 01:58:48 [ndbd] DEBUG    -- Angel got child 7546
2015-12-15 01:58:48 [ndbd] DEBUG    -- error: 2341, signal: 0, sphase: 255
2015-12-15 01:58:48 [ndbd] ALERT    -- Node 5: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

How to repeat:
On a cluster that has 8 data nodes, with my configuration:
1 - Stop a data node
2 - Then start it again
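
The two steps above could be scripted roughly as follows. This is a sketch under assumptions, not the reporter's actual procedure: the report does not say how the node was stopped or started, the connect string mgmhost:1186 is a placeholder, and the helper name restart_commands is hypothetical. It only builds the command lines; the ndbmtd command would have to be run on the data node's own host.

```python
import shlex


def restart_commands(node_id, connectstring):
    """Build the commands for a plain stop/start cycle of one data node."""
    # Step 1: ask the management server to stop the data node.
    stop = [
        "ndb_mgm",
        f"--ndb-connectstring={connectstring}",
        "-e", f"{node_id} STOP",
    ]
    # Step 2: start the data node process again (no --initial).
    start = [
        "ndbmtd",
        f"--ndb-nodeid={node_id}",
        f"--ndb-connectstring={connectstring}",
    ]
    return stop, start


# Node 5 is the node from this report; the host name is a placeholder.
stop_cmd, start_cmd = restart_commands(5, "mgmhost:1186")
print(shlex.join(stop_cmd))
print(shlex.join(start_cmd))
```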
[15 Dec 2015 11:20] Serhat Demircan
Trace Log

Attachment: ndb_5_trace.log.5 (application/octet-stream, text), 1003.90 KiB.

[15 Dec 2015 11:23] Serhat Demircan
Cluster config

Attachment: config.ini (application/octet-stream, text), 3.77 KiB.

[28 Dec 2015 9:37] Serhat Demircan
ping
[30 Dec 2015 13:53] MySQL Verification Team
Hi,
Thanks for reporting this bug.
I cannot reproduce this on a new 8-node cluster.
I assume you can't reproduce the problem either; can you confirm that? Being unable to start a data node that is already in a crashed state does not count as "reproducing" the problem.

If I understand you correctly, you have an 8-node cluster and one node is now unable to start (the other 7 nodes are running properly). To get out of this situation, start the problematic node with --initial. This clears the filesystem on the failing node, reloads all data from the replica inside the cluster, and starts the node properly. Once you have a properly running cluster, if stopping a node brings you back to the same situation (the node won't start again), then that is a "reproduction" of this issue.
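
As a rough illustration of the recovery described above, the sketch below builds the command line for an initial start of the failing node, plus a management client command to check its state afterwards. The helper names, the node ID default, and the connect string mgmhost:1186 are assumptions for the example. Note that --initial wipes the node's local NDB filesystem, so it should only be used while the rest of the node group is healthy.

```python
import shlex


def initial_start_command(node_id, connectstring, binary="ndbmtd"):
    """Command to start a data node with a clean filesystem (--initial)."""
    return [
        binary,
        "--initial",  # discard local data; it is copied back from the cluster
        f"--ndb-nodeid={node_id}",
        f"--ndb-connectstring={connectstring}",
    ]


def status_command(node_id, connectstring):
    """Management client command to check the state of one node."""
    return [
        "ndb_mgm",
        f"--ndb-connectstring={connectstring}",
        "-e", f"{node_id} STATUS",
    ]


# Node 5 is the failing node in this report; the host name is a placeholder.
init_cmd = initial_start_command(5, "mgmhost:1186")
check_cmd = status_command(5, "mgmhost:1186")
print(shlex.join(init_cmd))
print(shlex.join(check_cmd))
```

Once the status shows the node as started, a normal stop and start of that node (without --initial) is the test for whether the issue reproduces.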

all best
Bogdan Kecman