MySQL Bugs: #66295: mysql cluster environment (mysql-5.1.56 ndb-7.1.15) date node often shutdown and

Bug #66295	mysql cluster environment (mysql-5.1.56 ndb-7.1.15) date node often shutdown and
Submitted:	10 Aug 2012 2:41	Modified:	14 Aug 2019 19:23
Reporter:	Ananth Narayanamoorthy
Status:	Closed
Category:	MySQL Cluster: NDB API	Severity:	S1 (Critical)
Version:	mysql-5.1.56 ndb-7.1.15	OS:	Linux (2.6.16.60-0.21-smp - x86_64 x86_64 x86_64 GNU/Linux)
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
Hi

We are having an issue with our MySQL Cluster environment: the data nodes frequently crash and shut down, and the following errors appear in the ndb_out logs:

2012-08-09 18:41:34 [ndbd] INFO     -- Received signal 11. Running error handler.
2012-08-09 18:41:34 [ndbd] INFO     -- Signal 11 received; Segmentation fault
2012-08-09 18:41:34 [ndbd] INFO     -- ndbd.cpp
2012-08-09 18:41:34 [ndbd] INFO     -- Error handler signal restarting system
2012-08-09 18:41:34 [ndbd] INFO     -- Error handler shutdown completed - exiting
2012-08-09 18:41:34 [ndbd] ALERT    -- Node 2: Forced node shutdown completed, restarting. Initiated by signal 11. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2012-08-09 18:41:34 [ndbd] INFO     -- Ndb has terminated (pid 21995) restarting
2012-08-09 18:41:34 [ndbd] INFO     -- Angel reconnected to 'x.x.1.20:1186'

We see a similar error on Node 3:

2012-08-09 18:41:23 [ndbd] INFO     -- dbtup/DbtupRoutines.cpp
2012-08-09 18:41:23 [ndbd] INFO     -- DBTUP (Line: 669) 0x00000000
2012-08-09 18:41:23 [ndbd] INFO     -- Error handler restarting system
2012-08-09 18:41:23 [ndbd] INFO     -- Error handler shutdown completed - exiting
2012-08-09 18:41:23 [ndbd] ALERT    -- Node 3: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2012-08-09 18:41:23 [ndbd] INFO     -- Ndb has terminated (pid 16939) restarting
2012-08-09 18:41:23 [ndbd] INFO     -- Angel reconnected to 'x.x.1.20:1186'
2012-08-09 18:41:35 [ndbd] INFO     -- Angel reallocated nodeid: 3
2012-08-09 18:41:35 [ndbd] INFO     -- Angel pid: 16938 started child: 28194
2012-08-09 18:41:35 [ndbd] INFO     -- Configuration fetched from 'x.x.1.20:1186', generation: 1
NDBMT: non-mt
2012-08-09 18:41:35 [ndbd] INFO     -- NDB Cluster -- DB node 3
2012-08-09 18:41:35 [ndbd] INFO     -- mysql-5.1.56 ndb-7.1.15 --
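For anyone triaging logs like the two excerpts above, a minimal Python sketch for pulling the forced-shutdown events (node ID and NDB error code) out of the ndbd output may help. The regular expression is an assumption based only on the ALERT lines quoted in this report; other NDB versions may format these lines differently, and the function name is hypothetical.

```python
import re

# Pattern inferred from the ALERT lines quoted in this report (an assumption,
# not an official log format specification).
ALERT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[ndbd\] ALERT\s+-- "
    r"Node (?P<node>\d+): Forced node shutdown completed.*?"
    r"Caused by error (?P<code>\d+): '(?P<msg>[^(']+)"
)


def find_forced_shutdowns(lines):
    """Return (timestamp, node_id, error_code, message) for each ALERT line."""
    events = []
    for line in lines:
        m = ALERT_RE.match(line)
        if m:
            events.append(
                (m["ts"], int(m["node"]), int(m["code"]), m["msg"].strip())
            )
    return events


# Lines copied from the report above; the INFO line is skipped.
SAMPLE = [
    "2012-08-09 18:41:34 [ndbd] INFO     -- Signal 11 received; Segmentation fault",
    "2012-08-09 18:41:34 [ndbd] ALERT    -- Node 2: Forced node shutdown completed, "
    "restarting. Initiated by signal 11. Caused by error 6000: 'Error OS signal "
    "received(Internal error, programming error or missing error message, please "
    "report a bug). Temporary error, restart node'.",
    "2012-08-09 18:41:23 [ndbd] ALERT    -- Node 3: Forced node shutdown completed. "
    "Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal "
    "error, programming error or missing error message, please report a bug). "
    "Temporary error, restart node'.",
]

events = find_forced_shutdowns(SAMPLE)
for ts, node, code, msg in events:
    print(f"{ts} node={node} error={code} {msg}")
```

Grouping the events by error code makes it easier to see that the two nodes failed for different reported reasons (6000 on Node 2, 2341 on Node 3) within seconds of each other.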

In the management console we see:

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2 (not connected, accepting connect from x.x.2.11)
id=3 (not connected, accepting connect from x.x.2.12)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @x.x.1.20  (mysql-5.1.56 ndb-7.1.15)

[mysqld(API)]   2 node(s)
id=4 (not connected, accepting connect from x.x.2.11)
id=5 (not connected, accepting connect from x.x.2.12)
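The `show` output above can be checked mechanically, for example from a monitoring script. The following is a rough Python sketch that assumes the output layout is exactly as pasted here (section headers in square brackets, one `id=` line per node); the function name is hypothetical, and only the literal text "not connected" is treated as a down node.

```python
import re

NODE_RE = re.compile(r"^id=(?P<id>\d+)\s*(?P<rest>.*)$")


def parse_show(text):
    """Map node ID -> (section name, connected?) from ndb_mgm `show` output."""
    status = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # "[ndbd(NDB)]     2 node(s)" -> "ndbd(NDB)"
            section = line.split("]")[0].lstrip("[")
            continue
        m = NODE_RE.match(line)
        if m and section:
            status[int(m["id"])] = (section, "not connected" not in m["rest"])
    return status


SHOW_OUTPUT = """\
[ndbd(NDB)]     2 node(s)
id=2 (not connected, accepting connect from x.x.2.11)
id=3 (not connected, accepting connect from x.x.2.12)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @x.x.1.20  (mysql-5.1.56 ndb-7.1.15)

[mysqld(API)]   2 node(s)
id=4 (not connected, accepting connect from x.x.2.11)
id=5 (not connected, accepting connect from x.x.2.12)
"""

status = parse_show(SHOW_OUTPUT)
down = sorted(node for node, (_, connected) in status.items() if not connected)
print("not connected:", down)
```

In the situation reported here, every node except the management node (id=1) is listed as not connected.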

Our configuration in config.ini is:

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=2G
IndexMemory=200M
UndoIndexBuffer=64M
RedoBuffer=256M
TimeBetweenLocalCheckpoints=6
NoOfFragmentLogFiles=256
FragmentLogFileSize=16M

[MYSQLD DEFAULT]

[NDB_MGMD DEFAULT]

[TCP DEFAULT]

# Section for the cluster management node
[NDB_MGMD]
# IP address of the management node (CI server - this one)
HostName=x.x.1.20
NodeId=1

# Section for the storage nodes
[NDBD]
# IP address of the first data node (APP01 server holding ndb1)
HostName=x.x.2.11
DataDir=/usr/local/mysqlc/cluster/ndb_data
NodeId=2
TransactionDeadlockDetectionTimeout=10000
StopOnError =false

[NDBD]
# IP address of the second storage node (APP02 server holding ndb2)
HostName=x.x.2.12
DataDir=/usr/local/mysqlc/cluster/ndb_data
NodeId=3
TransactionDeadlockDetectionTimeout=10000
StopOnError =false

# one [MYSQLD] per storage node
[MYSQLD]
HostName=x.x.2.11
NodeId=4
[MYSQLD]
HostName=x.x.2.12
NodeId=5
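When comparing a config.ini like the one above against what the management console reports, it can be useful to list the configured nodes programmatically. Python's `configparser` in its default strict mode rejects repeated section names such as the two [NDBD] blocks, so this sketch uses a small hand-written reader instead. It is only an illustration under simple assumptions (`#` comments, `key=value` lines); the function name is hypothetical and it does not implement the full NDB configuration syntax.

```python
def parse_cluster_config(text):
    """Return a list of (SECTION_NAME, {key: value}) in file order.

    Repeated sections such as [NDBD] are kept as separate entries.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip().upper(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


# Abbreviated copy of the configuration shown in the report.
CONFIG = """\
[NDBD DEFAULT]
NoOfReplicas=2

[NDB_MGMD]
HostName=x.x.1.20
NodeId=1

[NDBD]
# first data node
HostName=x.x.2.11
NodeId=2
StopOnError =false

[NDBD]
HostName=x.x.2.12
NodeId=3
StopOnError =false

[MYSQLD]
HostName=x.x.2.11
NodeId=4
[MYSQLD]
HostName=x.x.2.12
NodeId=5
"""

sections = parse_cluster_config(CONFIG)
nodes = {
    int(params["NodeId"]): (name, params["HostName"])
    for name, params in sections
    if "NodeId" in params
}
for node_id in sorted(nodes):
    print(node_id, *nodes[node_id])
```

The resulting node IDs (1 through 5) match the IDs listed by `show` in the management console.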

Please let me know if you need more information.

Thanks and Regards
Ananth

How to repeat:
The crash is reproduced when we try to insert data.

Suggested fix:
As a workaround, we have to stop the data node and restart it with the initial option (ndbd --initial).

We then have to reload the database from a dump.
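The workaround above can be sketched as a small Python helper that only builds the restart command line and prints it for review before anyone runs it. It assumes the `ndbd` binary is on the PATH and uses the management server address from the configuration in this report; the helper name is hypothetical. Note that `--initial` removes the recovery files in the data node's file system, so the node has to copy its data back from the other replica, or the data has to be reloaded if no replica is available.

```python
import shlex


def ndbd_initial_restart_cmd(connectstring, binary="ndbd"):
    """Build (but do not run) the command for an initial data node restart."""
    return [binary, "--initial", f"--ndb-connectstring={connectstring}"]


cmd = ndbd_initial_restart_cmd("x.x.1.20:1186")
# Print the command so an operator can review it; running it is left to the
# operator, for example via subprocess.run(cmd, check=True) on the data node.
print(shlex.join(cmd))
```

Keeping the command construction separate from execution makes it harder to wipe a data node by accident, which matters when both replicas are already down.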

Can anyone please provide feedback on this bug?

Thanks 
Ananth

Can anyone please provide feedback on this bug?

This is affecting our production environment.

Regards
Ananth

I had the same (or very similar) issue occur in my lab today.  Restarting the failed ndb node did not work.  I had to restart the ndb node with 'initial' and wait for it to sync before it would join the cluster.  The ndb_4_error.log did not offer any additional error messages.

Thank you for your bug report. This issue has already been fixed in the latest released version of that product, which you can download at

  http://www.mysql.com/downloads/