MySQL Bugs: #81510: Data Node crashes when attempting to restart with error 2341

Bug #81510	Data Node crashes when attempting to restart with error 2341
Submitted: 19 May 2016 15:03
Modified: 14 Nov 2016 22:44
Reporter: Andrew Blackmore
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 7.4.11
OS: Ubuntu (14.04)
Assigned to: MySQL Verification Team
CPU Architecture: Any

Description:
I have a MySQL Cluster with 2 data nodes. I stopped one of the data nodes to perform some system upgrades. When I tried to restart it, the data node got through most of the restart sequence and then crashed when it was almost done.

I have tried restarting several times, including with the --initial option. The data nodes run ndbmtd. The failing data node writes the following to its error log:

Time: Thursday 19 May 2016 - 09:08:50
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DbtcMain.cpp
Error object: DBTC (Line: 19392) 0x00000002
Program: ndbmtd
Pid: 1853 thr: 8
Version: mysql-5.6.29 ndb-7.4.11
Trace: /usr/local/mysql/data/ndb_2_trace.log.7 [t1..t11]

How to repeat:
N/A

The line number in the error points to the following section of code in DbtcMain.cpp:

void
Dbtc::executeFKChildTrigger(Signal* signal,
                            TcDefinedTriggerData* definedTriggerData,
                            TcFiredTriggerData* firedTriggerData,
                            ApiConnectRecordPtr* transPtr,
                            TcConnectRecordPtr* opPtr)
{
  Ptr<TcFKData> fkPtr;
  // TODO make it a pool.getPtr() instead
  // by also adding fk_ptr_i to definedTriggerData
  ndbrequire(c_fk_hash.find(fkPtr, definedTriggerData->fkId));  // <<< failing ndbrequire (DBTC line 19392)

  switch (firedTriggerData->triggerEvent) {
  case(TriggerEvent::TE_INSERT):
    jam();
    /**
     * Check that after values exists in parent table
     */
    fk_readFromParentTable(signal, firedTriggerData, transPtr, opPtr, fkPtr.p);
    break;
  case(TriggerEvent::TE_UPDATE):
    jam();
    /**
     * Check that after values exists in parent table
     */
    fk_readFromParentTable(signal, firedTriggerData, transPtr, opPtr, fkPtr.p);
    break;
  default:
    ndbrequire(false);
  }
}
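To make the failure mode concrete, here is a simplified C++ model of the failing check. It is not NDB code. In the real kernel, a false ndbrequire() condition forces the data node to shut down, and the log above reports that as error 2341. In the sketch, `FkRegistry`, `require_or_shutdown`, `NodeShutdown`, `make_demo_registry` and `shuts_down_on_lookup` are hypothetical names, and a failed check throws an exception instead of stopping a process. The point it illustrates: if the foreign-key id stored with the trigger definition was never registered in the hash on the restarted node, the lookup cannot succeed and the node goes down.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Simplified model, not NDB code. The error log above reports code 2341 for a
// failed ndbrequire(); here a failed check throws instead of stopping a node.
constexpr int kFailedNdbrequire = 2341;

struct NodeShutdown : std::runtime_error {
  int code;
  explicit NodeShutdown(int c)
      : std::runtime_error("failed ndbrequire, error " + std::to_string(c)),
        code(c) {}
};

// Hypothetical stand-in for ndbrequire(cond).
inline void require_or_shutdown(bool cond) {
  if (!cond) throw NodeShutdown(kFailedNdbrequire);
}

// Hypothetical stand-in for the foreign-key record kept by DBTC.
struct FkData {
  std::uint32_t parentTableId;
};

// Hypothetical stand-in for c_fk_hash: foreign-key id -> foreign-key record.
class FkRegistry {
 public:
  void add(std::uint32_t fkId, FkData data) { fks_[fkId] = data; }

  // Mirrors the failing line: the fkId stored with the trigger definition
  // must already be in the hash, otherwise the check fails.
  const FkData& lookup(std::uint32_t fkId) const {
    auto it = fks_.find(fkId);
    require_or_shutdown(it != fks_.end());
    return it->second;
  }

 private:
  std::unordered_map<std::uint32_t, FkData> fks_;
};

// Demo registry in which only foreign key 7 (parent table 42) is known.
FkRegistry make_demo_registry() {
  FkRegistry tc;
  tc.add(7, FkData{42});
  return tc;
}

// True if looking up fkId trips the simulated ndbrequire with error 2341.
bool shuts_down_on_lookup(const FkRegistry& tc, std::uint32_t fkId) {
  try {
    tc.lookup(fkId);
  } catch (const NodeShutdown& e) {
    return e.code == kFailedNdbrequire;
  }
  return false;
}
```

For example, `shuts_down_on_lookup(make_demo_registry(), 9)` returns `true` because id 9 was never added, while id 7 resolves to parent table 42.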

This appears to be a known bug. The fix is documented in the NDB 7.3.14, 7.4.12, and 7.5.2 changelogs as follows:
 
    During a node restart, re-creation of internal triggers used to
    verify the referential integrity of foreign keys was not
    reliable, due to the fact that not all distributed TC and LDM
    instances agreed on trigger identities. To fix this problem, an
    extra step is added to the node restart sequence, during which
    the trigger identities are determined from the current master
    node.
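The changelog entry can be modelled in a few lines. This is an illustration under simplifying assumptions, not the NDB implementation: each TC or LDM instance is reduced to a map from foreign-key name to trigger id, and `assign_locally`, `adopt_from_master` and `instances_agree` are hypothetical names. In the modelled pre-fix behaviour, instances that hand out trigger ids locally can disagree when they process the foreign keys in different orders. In the modelled post-fix behaviour, every instance takes its ids from the master node's table, so they agree by construction.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Simplified model of the changelog note; all names are hypothetical.
// One TC or LDM instance is reduced to a table: foreign key -> trigger id.
using FkName = std::string;
using TriggerIds = std::map<FkName, std::uint32_t>;

// Modelled pre-fix behaviour: while re-creating the foreign-key triggers
// during a node restart, an instance hands out ids from its own counter in
// the order in which it happens to process the foreign keys.
TriggerIds assign_locally(const std::vector<FkName>& creationOrder,
                          std::uint32_t firstId) {
  TriggerIds ids;
  std::uint32_t next = firstId;
  for (const auto& fk : creationOrder) ids[fk] = next++;
  return ids;
}

// Modelled post-fix behaviour: the extra restart step takes each trigger id
// from the current master node's table, so processing order no longer matters.
// std::map::at throws std::out_of_range if the master does not know a key.
TriggerIds adopt_from_master(const std::vector<FkName>& creationOrder,
                             const TriggerIds& master) {
  TriggerIds ids;
  for (const auto& fk : creationOrder) ids[fk] = master.at(fk);
  return ids;
}

// True when all instances map every foreign key to the same trigger id.
bool instances_agree(const std::vector<TriggerIds>& instances) {
  for (const auto& inst : instances)
    if (inst != instances.front()) return false;
  return true;
}
```

With two foreign keys created in the order `fk_a, fk_b` on one instance and `fk_b, fk_a` on another, `assign_locally(..., 10)` gives `fk_a` the id 10 on the first instance and 11 on the second, so a trigger id sent by one instance names a different foreign key on the other.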

Hi,

I'm running MySQL Cluster: MySQL-Cluster-server-gpl-7.4.8-1.el7.x86_64

I have the same error, but it happens 5 minutes after the node has started.
Please see the log below:

2016-09-09 14:14:29 [ndbd] INFO     -- Start phase 101 completed
2016-09-09 14:14:29 [ndbd] INFO     -- Phase 101 was used by SUMA to take over responsibility for sending some of the asynchronous ch
2016-09-09 14:14:29 [ndbd] INFO     -- Node started
2016-09-09 14:20:15 [ndbd] INFO     -- /export/home2/pb2/build/sb_0-16730888-1444652131.27/rpm/BUILD/mysql-cluster-gpl-7.4.8/mysql-cluster-gpl-7.4.8/storage/ndb/src/kernel/blocks/dbtc/DbtcMain.cpp
2016-09-09 14:20:15 [ndbd] INFO     -- DBTC (Line: 19292) 0x00000002
2016-09-09 14:20:15 [ndbd] INFO     -- Error handler shutting down system
2016-09-09 14:20:15 [ndbd] INFO     -- Error handler shutdown completed - exiting
2016-09-09 14:20:15 [ndbd] DEBUG    -- Angel got child 45083
2016-09-09 14:20:15 [ndbd] DEBUG    -- error: 2341, signal: 0, sphase: 255
2016-09-09 14:20:15 [ndbd] ALERT    -- Node 4: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

Filip,

I would recommend upgrading to the newest version of MySQL Cluster. As Jonathon stated, the issue I encountered in 7.4.11 was fixed in 7.4.12.
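To check whether a given installation already contains the fix, compare its version with the first fixed release of its series listed in the changelog entry quoted above (7.3.14, 7.4.12, 7.5.2). This is a minimal sketch. `NdbVersion`, `parse_version` and `includes_fk_trigger_fix` are hypothetical names, and the function deliberately returns `std::nullopt` for series the changelog excerpt does not mention instead of guessing.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// An NDB version number such as "7.4.11" (major.minor.patch).
struct NdbVersion {
  int major_no = 0;
  int minor_no = 0;
  int patch_no = 0;
};

// Parses "X.Y.Z"; anything else yields std::nullopt.
std::optional<NdbVersion> parse_version(const std::string& text) {
  std::istringstream in(text);
  NdbVersion v;
  char dot1 = 0, dot2 = 0, extra = 0;
  if (!(in >> v.major_no >> dot1 >> v.minor_no >> dot2 >> v.patch_no) ||
      dot1 != '.' || dot2 != '.')
    return std::nullopt;
  if (in >> extra) return std::nullopt;  // reject trailing characters
  return v;
}

// First fixed releases per series, as listed in the changelog entry:
// 7.3.14, 7.4.12 and 7.5.2. Other series are not judged here.
std::optional<bool> includes_fk_trigger_fix(const std::string& text) {
  std::optional<NdbVersion> v = parse_version(text);
  if (!v || v->major_no != 7) return std::nullopt;
  int firstFixedPatch = 0;
  switch (v->minor_no) {
    case 3: firstFixedPatch = 14; break;
    case 4: firstFixedPatch = 12; break;
    case 5: firstFixedPatch = 2; break;
    default: return std::nullopt;
  }
  return v->patch_no >= firstFixedPatch;
}
```

Both versions reported in this thread predate the fix: `includes_fk_trigger_fix("7.4.11")` and `includes_fk_trigger_fix("7.4.8")` both yield `false`, while `"7.4.12"` yields `true`.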

Thank you for your bug report. This issue has already been fixed in the latest released version of that product, which you can download at

  http://www.mysql.com/downloads/