MySQL Bugs: #80867: MySQL Cluster Crashed

Bug #80867	MySQL Cluster Crashed
Submitted:	28 Mar 2016 12:17	Modified:	4 May 2016 5:01
Reporter:	Serhat Demircan
Status:	Not a Bug
Category:	MySQL Cluster: Disk Data	Severity:	S1 (Critical)
Version:	7.4.10	OS:	Debian (Wheezy)
Assigned to:	MySQL Verification Team	CPU Architecture:	Any
Tags:	ndbcluster

Description:
Two nodes in the same node group stopped with the following error, then the whole cluster went down.

2016-03-28 14:30:17 [ndbd] INFO     -- Node 12 has completed node fail handling
2016-03-28 14:32:05 [ndbd] INFO     -- /export/home/pb2/build/sb_0-17731890-1453887053.27/mysql-cluster-gpl-7.4.10/storage/ndb/src/kernel/blocks/dbtc/DbtcMain.cpp
2016-03-28 14:32:05 [ndbd] INFO     -- DBTC (Line: 19385) 0x00000002
2016-03-28 14:32:05 [ndbd] INFO     -- Error handler shutting down system
2016-03-28 14:32:05 [ndbd] INFO     -- Error handler shutdown completed - exiting
2016-03-28 14:32:08 [ndbd] ALERT    -- Node 5: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

Time: Monday 28 March 2016 - 14:32:05
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DbtcMain.cpp
Error object: DBTC (Line: 19385) 0x00000002
Program: ndbmtd
Pid: 3014 thr: 10
Version: mysql-5.6.28 ndb-7.4.10
Trace: /ndbdata/ndb_5_trace.log.9 [t1..t15]
***EOM***
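When comparing several crashes, it can help to pull the fields of an error log entry like the one above into a structure. The following is a minimal sketch, assuming entries follow the `Key: value` layout shown in this report; the function names are illustrative and not part of any MySQL tool.

```python
import re

# Sample entry copied from this report.
SAMPLE = """\
Time: Monday 28 March 2016 - 14:32:05
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DbtcMain.cpp
Error object: DBTC (Line: 19385) 0x00000002
Program: ndbmtd
Pid: 3014 thr: 10
Version: mysql-5.6.28 ndb-7.4.10
Trace: /ndbdata/ndb_5_trace.log.9 [t1..t15]
***EOM***
"""


def parse_error_entry(text):
    """Split each 'Key: value' line of one entry into a dict."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "***EOM***":
            continue
        # Split on the first ': ' only, so values may contain colons.
        key, sep, value = line.partition(": ")
        if sep:
            entry[key] = value
    return entry


def parse_error_object(error_object):
    """Return (block name, source line) from an 'Error object' value."""
    match = re.match(r"(\w+) \(Line: (\d+)\)", error_object)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    entry = parse_error_entry(SAMPLE)
    print(entry["Error"], parse_error_object(entry["Error object"]))
```

Grouping entries by error code and by block/line makes it easier to see whether repeated crashes fail at the same `ndbrequire`.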

How to repeat:
Can't repeat.

- Stop data node X
- Start data node X

This error occurred after data node X reached the started state, but it does not happen every time. Most of the time I can restart all nodes for maintenance without a problem.
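The report does not say how the node was stopped and started. As a sketch only, assuming the management client `ndb_mgm` is used to stop the node and `ndbmtd` is started by hand on the data node host, the two steps could be scripted as below. The connect string `mgmhost:1186` is a hypothetical placeholder, and node ID 5 is taken from the log above.

```python
import shlex


def restart_commands(node_id, connect_string):
    """Build the commands for one stop/start cycle of a data node.

    The commands are only built, not executed. The start command has to be
    run on the data node's own host; the other two can be run from any host
    that can reach the management server.
    """
    mgm = ["ndb_mgm", f"--ndb-connectstring={connect_string}", "-e"]
    stop = mgm + [f"{node_id} STOP"]
    start = [
        "ndbmtd",
        f"--ndb-connectstring={connect_string}",
        f"--ndb-nodeid={node_id}",
    ]
    status = mgm + [f"{node_id} STATUS"]
    return [stop, start, status]


if __name__ == "__main__":
    # 'mgmhost:1186' is a made-up connect string for illustration.
    for command in restart_commands(5, "mgmhost:1186"):
        print(shlex.join(command))
```

Repeating the cycle while recording the node status after each start would show how often the failure actually follows a restart.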

Hi Serhat,

You said you can't reproduce it, and then you said "it does not happen every time". Can you please clarify: did you see this problem only once, or do you see it "often but not always" when you start your cluster?

Can you please upload the whole ndb_error_reporter result with all the logs? Traces alone don't show the whole picture.

kind regards
Bogdan Kecman

Hi Bogdan,

Not only once. I see this problem often, but not always, when restarting data nodes. For example, it happened again after I reported this one.

Uploaded ndb_out.log, ndb_error.log and ndb_trace.log.

Hi Serhat,

It would be better if you uploaded the whole ndb_error_reporter collection of logs and not only those for the single crash. Anyhow, looking at the logs you provided, I'd say that this is not a bug but a misconfigured system. It looks like your system is overloaded and crashing. What is your CPU usage on all data nodes when you encounter this problem? Also, what is your I/O usage on all data nodes when you encounter this problem?
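One rough way to put numbers on the CPU and I/O questions above is to sample `/proc/stat` on each data node around a restart. This is a minimal Linux-only sketch, not an NDB tool; it assumes the standard field order of the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal), and the function names are illustrative.

```python
import time


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line, which is the first line of the file."""
    with open(path) as stat_file:
        return stat_file.readline()


def cpu_fractions(before, after):
    """Return (busy, iowait) as fractions of elapsed CPU time.

    Only the first eight counters are summed, because guest time is
    already included in the user and nice counters.
    """
    start = [int(value) for value in before.split()[1:9]]
    end = [int(value) for value in after.split()[1:9]]
    deltas = [e - s for s, e in zip(start, end)]
    total = sum(deltas)
    if total <= 0:
        return 0.0, 0.0
    idle, iowait = deltas[3], deltas[4]
    busy = (total - idle - iowait) / total
    return busy, iowait / total


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(1.0)  # sampling interval; adjust as needed
    second = read_cpu_line()
    busy, iowait = cpu_fractions(first, second)
    print(f"busy={busy:.1%} iowait={iowait:.1%}")
```

A persistently high busy or iowait fraction during node restarts would be consistent with the overload explanation, though tools such as `top` or `iostat` give a more complete picture.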

I cannot reproduce the crash at the place where it appears in the log you provided, but you can see the watchdog complaining.

kind regards
Bogdan