Bug #69201 ndbmtd fails to start with Error 2341 after ThreadConfig changed
Submitted: 10 May 2013 20:57
Modified: 19 May 2016 12:06
Reporter: Justin Ryan
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 7.2.12
OS: Linux (CentOS 6.4 x64)
Assigned to: MySQL Verification Team
CPU Architecture: Any
Tags: 2341, ndbrequire, threadconfig

[10 May 2013 20:57] Justin Ryan
Description:
Starting ndbmtd with --initial and new ThreadConfig and NoOfFragmentLogParts settings fails with:

Forced node shutdown completed. Occured during startphase 5. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

I have tried the following workarounds:

 - upgraded the whole cluster from 7.2.10 to 7.2.12
 - restarted different data nodes
 - started ndbmtd as root instead of as the mysql user
 - started ndbmtd with and without `numactl --interleave=all`
 - stopped the data node through ndb_mgm, then restarted it from the data node
 - shut down the cluster fully, restarted with the original config, then changed the config and attempted a rolling restart
 - have NOT tried a full shutdown, initial restart, and restore from backup

How to repeat:
1. Cluster running with MaxNoOfExecutionThreads = 8 and default NoOfFragmentLogParts (4)

2. Edit config.ini with the following changes, then restart the mgmd services:

-MaxNoOfExecutionThreads = 8
+ThreadConfig = ldm={count=12,cpubind=0,1,2,8,9,10,16,17,18,24,25,26},tc={count=7,cpubind=3,4,11,12,19,20,21},send={count=3,cpubind=5,13,22},recv={count=3,cpubind=6,14,23},main={cpubind=27},io={cpubind=27},rep={cpubind=28}
+NoOfFragmentLogParts = 12

3. Restart one data node with `14 restart -i`:

2013-05-10 16:24:05 [ndbd] INFO     -- /pb2/build/sb_0-8660699-1363118778.75/rpm/BUILD/mysql-cluster-gpl-7.2.12/mysql-cluster-gpl-7.2.12/storage/ndb/src/kernel/blocks/dblqh/DblqhMain.cpp
2013-05-10 16:24:05 [ndbd] INFO     -- DBLQH (Line: 8960) 0x00000006
2013-05-10 16:24:05 [ndbd] INFO     -- Error handler shutting down system
2013-05-10 16:24:05 [ndbd] INFO     -- Error handler shutdown completed - exiting
2013-05-10 16:24:19 [ndbd] ALERT    -- Node 14: Forced node shutdown completed. Occured during startphase 5. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
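
The ThreadConfig value in step 2 is long enough that a miscounted cpubind list is easy to miss. Below is a minimal Python sketch that parses the string and runs two sanity checks. The helper names (`parse_thread_config`, `check_config`) are hypothetical, and the checks themselves are assumptions made for illustration: they are not rules taken from this report or confirmed as what ndbmtd enforces. CPU ranges such as `0-3` are not handled.

```python
import re

# Values copied from the config.ini change in step 2.
THREAD_CONFIG = (
    "ldm={count=12,cpubind=0,1,2,8,9,10,16,17,18,24,25,26},"
    "tc={count=7,cpubind=3,4,11,12,19,20,21},"
    "send={count=3,cpubind=5,13,22},"
    "recv={count=3,cpubind=6,14,23},"
    "main={cpubind=27},io={cpubind=27},rep={cpubind=28}"
)
NO_OF_FRAGMENT_LOG_PARTS = 12


def parse_thread_config(value):
    """Return {thread_type: {"count": int, "cpubind": [int, ...]}}."""
    parsed = {}
    for thread_type, body in re.findall(r"(\w+)=\{([^}]*)\}", value):
        params = {}
        key = None
        for token in body.split(","):
            token = token.strip()
            if "=" in token:
                # "count=12" or "cpubind=0" starts a new parameter.
                key, first = token.split("=", 1)
                params[key] = [first]
            elif key is not None:
                # A bare number continues the previous list (cpubind).
                params[key].append(token)
        parsed[thread_type] = {
            # Assumption: a thread type without count= means one thread.
            "count": int(params.get("count", ["1"])[0]),
            "cpubind": [int(cpu) for cpu in params.get("cpubind", [])],
        }
    return parsed


def check_config(parsed, log_parts):
    """Hypothetical sanity checks; not rules confirmed for ndbmtd."""
    problems = []
    for thread_type, entry in parsed.items():
        bound = len(entry["cpubind"])
        if bound and bound != entry["count"]:
            problems.append(
                f"{thread_type}: count={entry['count']} but {bound} CPUs bound"
            )
    ldm_count = parsed.get("ldm", {}).get("count")
    if ldm_count is not None and ldm_count != log_parts:
        problems.append(
            f"ldm: count={ldm_count} differs from NoOfFragmentLogParts={log_parts}"
        )
    return problems


if __name__ == "__main__":
    config = parse_thread_config(THREAD_CONFIG)
    for line in check_config(config, NO_OF_FRAGMENT_LOG_PARTS):
        print(line)
```

For the values in this report both checks pass (the returned list is empty), so a count mismatch of this kind does not explain the failure.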

Suggested fix:
Unknown

[10 May 2013 21:00] Justin Ryan
error, out, mgmd, and trace logs.

Attachment: ndb_logs-69201.tgz (application/octet-stream, text), 869.61 KiB.

[10 May 2013 21:02] Justin Ryan
The attachment also contains config.ini.

[11 May 2013 13:53] Justin Ryan
DblqhMain.cpp:8960

  ndbrequire(tcPtr->activeCreat == Fragrecord::AC_NORMAL);

[19 May 2016 12:06] MySQL Verification Team
Reproduced with the provided config on 7.2.12; cannot reproduce on up-to-date versions of MySQL Cluster.