Bug #71008 MySql cluster data node stopped and does not start
Submitted: 26 Nov 2013 10:44 Modified: 22 Mar 2016 12:40
Reporter: Janick Bernet
Status: Not a Bug
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 7.3.5
OS: Linux (Ubuntu 12.04)
Assigned to: MySQL Verification Team
CPU Architecture: Any

[26 Nov 2013 10:44] Janick Bernet
Description:
One of the data nodes regularly stops and is then unable to properly re-integrate into a simple cluster with 2 data nodes (plus 2 mysql nodes and 1 mgmt node). The log shows the following:

Time: Monday 25 November 2013 - 20:09:08
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DbtupRoutines.cpp
Error object: DBTUP (Line: 728) 0x00000002
Program: ndbmtd
Pid: 1930 thr: 2
Version: mysql-5.6.11 ndb-7.3.2
Trace: /data/mysqlcluster//ndb_4_trace.log.1 [t1..t4]
***EOM***

Although the error is reported as temporary, the node cannot be started again except by wiping it using --initial.

How to repeat:
I don't know the exact circumstances, but it happens regularly.
[26 Nov 2013 11:06] Janick Bernet
Ubuntu version is 12.04 LTS.
[29 Jan 2014 8:23] Janick Bernet
For those who might not see it: there is a private NDB error report attached.
[20 May 2014 10:16] Janick Bernet
Issue still persists after upgrading to cluster 7.3.5.
[11 Jun 2014 4:17] Kevin Dyer
Check your data node memory usage. In practice it is significantly greater than DataMemory + IndexMemory. A normal restart seems to use more memory than an initial restart.

Check the following Ndbd_mem_manager message, although it too appears to understate the real usage. Allow a minimum of 1GB of additional memory for the OS and ndbd on top of that "initial" value.

grep Ndbd_mem_manager /usr/local/mysql/data/ndb_1_out.log
2014-06-10 21:49:10 [ndbd] INFO     -- Ndbd_mem_manager::init(1) min: 6892Mb initial: 7020Mb
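
As a rough illustration of the arithmetic above, here is a minimal Python sketch that pulls the "initial" figure out of that log line and adds the 1GB of headroom. The function names and the regular expression are my own assumptions, based only on the single log line shown here; other versions may format the message differently.

```python
import re

# Log line copied from the ndb_1_out.log excerpt above.
LINE = ("2014-06-10 21:49:10 [ndbd] INFO     -- "
        "Ndbd_mem_manager::init(1) min: 6892Mb initial: 7020Mb")

# Assumption: 1GB (1024 MB) of headroom, the minimum suggested above.
HEADROOM_MB = 1024

# Hypothetical pattern, derived only from the one sample line.
PATTERN = re.compile(
    r"Ndbd_mem_manager::init\(\d+\)\s+min:\s*(\d+)Mb\s+initial:\s*(\d+)Mb"
)


def parse_mem_manager(line):
    """Return (min_mb, initial_mb) from an Ndbd_mem_manager::init line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def required_ram_mb(line, headroom_mb=HEADROOM_MB):
    """Return the 'initial' value plus headroom, in MB."""
    parsed = parse_mem_manager(line)
    if parsed is None:
        raise ValueError("not an Ndbd_mem_manager::init line")
    _, initial_mb = parsed
    return initial_mb + headroom_mb


if __name__ == "__main__":
    print(required_ram_mb(LINE))  # 7020 + 1024 = 8044
```

For the line above this gives 8044 MB as a lower bound for the RAM the host should have available. Since the message appears to be understated, treat the result as a floor, not a target.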

Use the following setting in config.ini to lock the data node's memory so it cannot be swapped out. It may force you to increase RAM or decrease DataMemory to avoid a signal 9 from the oom-killer.

LockPagesInMainMemory=1
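
To compare what is configured against what the host actually has, something like the following Python sketch may help. It reads a config.ini style fragment and reports DataMemory + IndexMemory in MB and whether page locking is enabled. The sample values are hypothetical and not taken from this report, and the helper names are my own. It assumes the settings live in a single [ndbd default] section and that sizes are plain integers with an optional K, M or G suffix.

```python
import configparser

# Hypothetical config.ini fragment; the sizes are illustrative only.
SAMPLE = """
[ndbd default]
NoOfReplicas=2
DataMemory=4G
IndexMemory=512M
LockPagesInMainMemory=1
"""

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(value):
    """Convert a size such as '512M' or '4G' (or a plain integer) to bytes."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)


def summarize(text):
    """Return (DataMemory + IndexMemory in MB, page locking enabled?)."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    section = cfg["ndbd default"]
    configured = to_bytes(section["DataMemory"]) + to_bytes(section["IndexMemory"])
    locked = section.get("LockPagesInMainMemory", "0") != "0"
    return configured // 1024 ** 2, locked


if __name__ == "__main__":
    total_mb, locked = summarize(SAMPLE)
    print(total_mb, locked)
```

Keep in mind that the real usage of the data node is significantly higher than this sum, so the figure is only useful as a starting point when sizing RAM with locking enabled.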
[22 Mar 2016 12:40] MySQL Verification Team
Hi,
Sorry for the very late reply, but it looks like your cluster is simply poorly configured for your hardware. If you are still having the problem, let us know. It is best to contact MySQL Cluster support for proper configuration of your system, but if you have new crash logs you can upload them here.

Kind regards,
Bogdan Kecman