Description:
Hi,
We are having an issue with our MySQL Cluster environment: the data nodes frequently crash and shut down, and the following errors appear in the ndb_out.log files:
2012-08-09 18:41:34 [ndbd] INFO -- Received signal 11. Running error handler.
2012-08-09 18:41:34 [ndbd] INFO -- Signal 11 received; Segmentation fault
2012-08-09 18:41:34 [ndbd] INFO -- ndbd.cpp
2012-08-09 18:41:34 [ndbd] INFO -- Error handler signal restarting system
2012-08-09 18:41:34 [ndbd] INFO -- Error handler shutdown completed - exiting
2012-08-09 18:41:34 [ndbd] ALERT -- Node 2: Forced node shutdown completed, restarting. Initiated by signal 11. Caused by error 6000: 'Error OS signal received(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2012-08-09 18:41:34 [ndbd] INFO -- Ndb has terminated (pid 21995) restarting
2012-08-09 18:41:34 [ndbd] INFO -- Angel reconnected to 'x.x.1.20:1186'
We see a similar error on Node 3:
2012-08-09 18:41:23 [ndbd] INFO -- dbtup/DbtupRoutines.cpp
2012-08-09 18:41:23 [ndbd] INFO -- DBTUP (Line: 669) 0x00000000
2012-08-09 18:41:23 [ndbd] INFO -- Error handler restarting system
2012-08-09 18:41:23 [ndbd] INFO -- Error handler shutdown completed - exiting
2012-08-09 18:41:23 [ndbd] ALERT -- Node 3: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2012-08-09 18:41:23 [ndbd] INFO -- Ndb has terminated (pid 16939) restarting
2012-08-09 18:41:23 [ndbd] INFO -- Angel reconnected to 'x.x.1.20:1186'
2012-08-09 18:41:35 [ndbd] INFO -- Angel reallocated nodeid: 3
2012-08-09 18:41:35 [ndbd] INFO -- Angel pid: 16938 started child: 28194
2012-08-09 18:41:35 [ndbd] INFO -- Configuration fetched from 'x.x.1.20:1186', generation: 1
NDBMT: non-mt
2012-08-09 18:41:35 [ndbd] INFO -- NDB Cluster -- DB node 3
2012-08-09 18:41:35 [ndbd] INFO -- mysql-5.1.56 ndb-7.1.15 --
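To compare crashes across nodes, the ALERT lines above can be pulled out of the logs with a short script. The following is only a minimal Python sketch: the function name and the regular expression are illustrative assumptions based on the log lines quoted in this report, not part of any MySQL Cluster tooling, and other versions may format these lines differently.

```python
import re

# Pattern for the "Forced node shutdown" ALERT line as it appears in this report.
# Assumption: timestamp, "[ndbd] ALERT", node id, then "Caused by error <code>".
ALERT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[ndbd\] ALERT\s+--\s+"
    r"Node (?P<node>\d+): Forced node shutdown completed.*?"
    r"Caused by error (?P<code>\d+):"
)


def find_forced_shutdowns(lines):
    """Return (timestamp, node_id, error_code) for each forced-shutdown ALERT line."""
    events = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match:
            events.append(
                (match.group("ts"), int(match.group("node")), int(match.group("code")))
            )
    return events


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python find_shutdowns.py /path/to/ndb_out.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for ts, node, code in find_forced_shutdowns(log_file):
            print(f"{ts} node={node} error={code}")
```

Run against the two excerpts above, this should report error 6000 for Node 2 and error 2341 for Node 3.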
In the management console we see:
ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 (not connected, accepting connect from x.x.2.11)
id=3 (not connected, accepting connect from x.x.2.12)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @x.x.1.20 (mysql-5.1.56 ndb-7.1.15)
[mysqld(API)] 2 node(s)
id=4 (not connected, accepting connect from x.x.2.11)
id=5 (not connected, accepting connect from x.x.2.12)
Our configuration in config.ini is:
[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=2G
IndexMemory=200M
UndoIndexBuffer=64M
RedoBuffer=256M
TimeBetweenLocalCheckpoints=6
NoOfFragmentLogFiles=256
FragmentLogFileSize=16M
[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]
# Section for the cluster management node
[NDB_MGMD]
# IP address of the management node (CI server - this one)
HostName=x.x.1.20
NodeId=1
# Section for the storage nodes
[NDBD]
# IP address of the first data node (APP01 server holding ndb1)
HostName=x.x.2.11
DataDir=/usr/local/mysqlc/cluster/ndb_data
NodeId=2
TransactionDeadlockDetectionTimeout=10000
StopOnError =false
[NDBD]
# IP address of the second storage node (APP02 server holding ndb2)
HostName=x.x.2.12
DataDir=/usr/local/mysqlc/cluster/ndb_data
NodeId=3
TransactionDeadlockDetectionTimeout=10000
StopOnError =false
# one [MYSQLD] per storage node
[MYSQLD]
HostName=x.x.2.11
NodeId=4
[MYSQLD]
HostName=x.x.2.12
NodeId=5
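For reference, the redo log space implied by the [NDBD DEFAULT] values above can be worked out as below. This is a rough sketch that assumes the documented sizing rule of NoOfFragmentLogFiles sets of 4 files, each FragmentLogFileSize in size; the helper names are hypothetical and only the K/M/G suffixes used in this config are handled.

```python
def parse_size(value):
    """Convert a config.ini size such as '16M' or '2G' to bytes (K/M/G suffixes only)."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)


def redo_log_bytes(no_of_fragment_log_files, fragment_log_file_size):
    """Total redo log size per data node: sets of 4 files, each of the given size."""
    return no_of_fragment_log_files * 4 * parse_size(fragment_log_file_size)


if __name__ == "__main__":
    # Values taken from the [NDBD DEFAULT] section in this report.
    total = redo_log_bytes(256, "16M")
    print(f"Redo log per data node: {total / 1024 ** 3:.0f} GiB")
```

With the values from this report (256 and 16M), the result is 16 GiB of redo log per data node, which has to fit on the filesystem holding DataDir.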
Please let me know if you need more information.
Thanks and regards,
Ananth
How to repeat:
The crash recurs whenever we try to insert data.
Suggested fix:
As a workaround, we have to stop the data node, restart it with the --initial option, and then reload the database from a dump.