MySQL Bugs: #65256: NDBD error 2341 in NDBCNTR during startup phase 5 causes cluster shutdown

Bug #65256: NDBD error 2341 in NDBCNTR during startup phase 5 causes cluster shutdown
Submitted: 9 May 2012 13:59
Modified: 5 Oct 2016 22:55
Reporter: Przemysław Ołtarzewski
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 7.2.5
OS: Solaris (10, SPARC 64bit)
Assigned to: MySQL Verification Team
CPU Architecture: Any
Tags: NDBD NDBCNTR 2341 CREATE_TABLE_REF phase 5

Description:
1. Cluster configuration:
- 4 separate hosts
- 2 replicas
- 2 management nodes with manually assigned IDs
- 4 data nodes with manually assigned IDs and node groups
- several mysqld / api nodes

Management nodes 1 and 2 are deployed on the same machines as data nodes 11 and 21, respectively.

A custom layout is used for the cluster directories and files because older MySQL installations are present on the development machines.

The hosts used in the cluster are actually Sun Solaris 'zones' residing on one physical server; however, they may be treated as separate physical machines.

2. Error description:

After the management nodes start successfully, an attempt to start the data nodes causes an error on the first data node defined in config.ini during start phase 5:

Time: Wednesday 9 May 2012 - 14:29:59
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: CREATE_TABLE_REF
Error object: NDBCNTR (Line: 2493) 0x00000002
Program: ndbd
Pid: 25848
Version: mysql-5.5.20 ndb-7.2.5
Trace: /opt/mysql-cluster/mysqlc/ndbdata/ndb_11_trace.log.2 [t1..t1]
***EOM***

All data nodes are subsequently shut down (management console output):

ndb_mgm> Node 11: Forced node shutdown completed. Occured during startphase 5. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 12: Forced node shutdown completed. Occured during startphase 5. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
Node 21: Forced node shutdown completed. Occured during startphase 5. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
Node 22: Forced node shutdown completed. Occured during startphase 5. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.
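
In output like the above, the nodes reporting error 2308 are only reacting to a failure elsewhere, so the node to investigate is the one reporting a different error code. A minimal Python sketch of that triage follows, assuming the console output uses the message format quoted above. The function name and the regular expression are illustrative and not part of any MySQL tool.

```python
import re

# Matches ndb_mgm console lines of the form quoted in this report, e.g.
#   "Node 11: Forced node shutdown completed. Occured during startphase 5.
#    Caused by error 2341: '...'"
# The spelling "Occured" is kept because that is how the output reads.
SHUTDOWN_RE = re.compile(
    r"Node (\d+): Forced node shutdown completed\. "
    r"Occured during startphase (\d+)\. Caused by error (\d+):"
)

# 2308 = "Another node failed during system restart": a secondary error.
SECONDARY_ERROR = 2308


def find_primary_failures(console_output):
    """Return (node_id, start_phase, error_code) for every node whose
    shutdown was NOT caused by the secondary error 2308."""
    primary = []
    for match in SHUTDOWN_RE.finditer(console_output):
        node_id, phase, error = (int(group) for group in match.groups())
        if error != SECONDARY_ERROR:
            primary.append((node_id, phase, error))
    return primary


if __name__ == "__main__":
    # Message bodies are shortened here; only the prefix is parsed.
    sample = "\n".join([
        "ndb_mgm> Node 11: Forced node shutdown completed. "
        "Occured during startphase 5. Caused by error 2341: 'Internal program error'.",
        "Node 12: Forced node shutdown completed. "
        "Occured during startphase 5. Caused by error 2308: 'Another node failed'.",
    ])
    print(find_primary_failures(sample))
```

For the four lines quoted above, this would single out node 11 with error 2341, whose trace file is then the place to look.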

How to repeat:
1. Deploy MySQL Cluster 7.2.5 on Sun Solaris 10 (SPARC, 64-bit), using the layout given in the description and the configuration files in the attached error report. Because the directory structure is custom, the my.cnf file used by the cluster is provided as the second attachment.

3. Start cluster management nodes 1 and 2 (tried both with and without the --initial option):

./ndb_mgmd --defaults-file=./my.cnf --initial

4. Start management client:

./ndb_mgm --defaults-file=./my.cnf

5. Start cluster data nodes 11, 12, 21, 22 (tried both with and without the --initial option):

./ndbd --defaults-file=./my.cnf --initial

6. Tail the management node 1 log file (ndb_1_cluster.log). Data node 11 should report error 2341 right after all data nodes complete start phase 4. The ndb_mgm client should also report a forced shutdown of all data nodes.

After more trace parsing, it turns out that the root cause behind error 2341 was error 771 in DBDICT ("Given NODEGROUP doesn't exist in this cluster").

Node groups for the data nodes were arbitrarily set to 1 and 2. As it turns out, setting them to 0 and 1, respectively, solves the issue and serves as a workaround for cluster startup.
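
The numbering can be checked before the cluster is started. Below is a rough Python sketch, assuming a config.ini with one [ndbd] section per data node, an explicit NodeGroup entry in each, and '#' as the comment character. The function names are hypothetical, and the rule enforced (IDs must run 0, 1, ..., n-1) is simply the behaviour observed in this report, not a statement of how NDB itself validates the configuration.

```python
def node_groups_from_config(text):
    """Collect the NodeGroup value from every [ndbd] section of a
    config.ini text. [ndbd default] and other sections are ignored."""
    groups = []
    section = None
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "ndbd" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key.lower() == "nodegroup":
                groups.append(int(value))
    return groups


def check_node_groups(groups):
    """Return a list of problems; an empty list means the distinct
    node group IDs are exactly 0..n-1."""
    distinct = sorted(set(groups))
    expected = list(range(len(distinct)))
    if distinct == expected:
        return []
    return [f"node groups {distinct} should be numbered {expected}"]


if __name__ == "__main__":
    # Hypothetical fragment; the node-to-group assignment is assumed.
    sample = """
    [ndbd default]
    NoOfReplicas=2

    [ndbd]
    NodeId=11
    NodeGroup=1

    [ndbd]
    NodeId=21
    NodeGroup=2
    """
    print(check_node_groups(node_groups_from_config(sample)))
```

A hand-written parser is used instead of Python's configparser because config.ini repeats the [ndbd] section name once per node, which configparser either rejects or merges into a single section.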

Severity changed because a workaround was found. However, the error message should contain the root cause of the problem, and either arbitrary node group numbering within a specified range should be possible, or the documentation should at least hint that node groups must be numbered starting with 0.

I got a similar error:

Node 2: Forced node shutdown completed. Occured during startphase 5. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

ndb_mgm> Node 3: Forced node shutdown completed. Occured during startphase 5. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.

3 nodes: 154 is the mgmd; 248 and 249 are ndbd.

How do you force the node groups to be numbered 0 and 1?

Thanks, Frank, for the workaround. It helped.

Hi Mark,

You can use the 'NodeGroup' parameter to assign a node group ID.

Example:

[ndbd]
NodeGroup=1
HostName=10.95.19.62

with regards,
ch Vishnu

Not able to reproduce this on any modern MCCGE release.