Description:
Hello,
I had a node crash this morning. It had the following output on node:
2019-10-23 08:01:04 [ndbd] INFO -- dataLen: 87, op.maxRecordSize = 19, header: 65623
2019-10-23 08:01:04 [ndbd] INFO -- /export/home/pb2/build/sb_0-29707995-1532148374.74/rpm/BUILD/mysql-cluster-gpl-7.6.7/mysql-cluster-gpl-7.6.7/storage/ndb/src/kernel/blocks/backup/Backup.cpp
2019-10-23 08:01:04 [ndbd] INFO -- BACKUP (Line: 7928) 0x00000004 Check false failed
2019-10-23 08:01:04 [ndbd] INFO -- Error handler restarting system
2019-10-23 08:01:04 [ndbd] INFO -- Error handler shutdown completed - exiting
2019-10-23 08:01:05 [ndbd] ALERT -- Node 12: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2019-10-23 08:01:05 [ndbd] INFO -- Ndb has terminated (pid 48933) restarting
The 2 Data nodes were directly connected (lacp trunk) until yesterday but I moved them to a switch (HP) with lacp correctly set. So the faulty node was resynced yesterday.
The reason I changed from directly connected to switch is because I wish to add 2 mode data nodes this week. This was the first step of work.
The last log before the problem is the following:
2019-10-22 23:49:29 [ndbd] INFO -- Start phase 101 completed
2019-10-22 23:49:29 [ndbd] INFO -- Phase 101 was used by SUMA to take over responsibility for sending some of the asynchronous change events
2019-10-22 23:49:29 [ndbd] INFO -- Node started
2019-10-22 23:49:29 [ndbd] INFO -- GCP Monitor: Computed max GCP_COMMIT lag to 47 seconds
2019-10-22 23:49:29 [ndbd] INFO -- GCP Monitor: Computed max GCP_SAVE lag to 137 seconds
2019-10-23 01:24:46 [ndbd] INFO -- LCP Scan have stalled for 0 seconds, lastlast seen on line 13406, check_lcp_stop_count: 0
2019-10-23 01:32:26 [ndbd] INFO -- DROP_TAB_REQ: tab: 587, tabLcpStatus: 3
2019-10-23 01:32:26 [ndbd] INFO -- DROP_TAB_REQ: tab: 586, tabLcpStatus: 3
2019-10-23 01:32:50 [ndbd] INFO -- DROP_TAB_REQ: tab: 259, tabLcpStatus: 3
2019-10-23 01:32:50 [ndbd] INFO -- DROP_TAB_REQ: tab: 369, tabLcpStatus: 3
2019-10-23 01:32:50 [ndbd] INFO -- DROP_TAB_REQ: tab: 329, tabLcpStatus: 3
2019-10-23 01:32:50 [ndbd] INFO -- DROP_TAB_REQ: tab: 254, tabLcpStatus: 3
2019-10-23 01:32:50 [ndbd] INFO -- DROP_TAB_REQ: tab: 351, tabLcpStatus: 3
2019-10-23 01:32:50 [ndbd] INFO -- DROP_TAB_REQ: tab: 248, tabLcpStatus: 3
2019-10-23 02:05:17 [ndbd] INFO -- LCP Scan have stalled for 0 seconds, lastlast seen on line 13406, check_lcp_stop_count: 0
I hope anyone can help.
Best Regards
How to repeat:
I do not have an "how to repeat" procedure but my config is as follow:
[NDB_MGMD DEFAULT]
Portnumber=1186
[NDB_MGMD]
NodeId=21
HostName=172.XX.11.21
DataDir=/home/cluster/
Portnumber=1186
[TCP DEFAULT]
SendBufferMemory=4M
ReceiveBufferMemory=4M
[NDBD DEFAULT]
LogLevelStartup=15
LogLevelShutdown=15
LogLevelCheckpoint=8
LogLevelNodeRestart=15
DataMemory=80G
#IndexMemory=20G
MaxNoOfTables=4096
MaxNoOfTriggers=3500
NoOfReplicas=2
StringMemory=25
DiskPageBufferMemory=64M
SharedGlobalMemory=500M
LongMessageBuffer=32M
MaxNoOfConcurrentTransactions=16384
MaxNoOfConcurrentOperations=64000
BatchSizePerLocalScan=512
FragmentLogFileSize=256M
NoOfFragmentLogFiles=455
RedoBuffer=32M
MaxNoOfExecutionThreads=12
StopOnError=false
LockPagesInMainMemory=1
TimeBetweenEpochsTimeout=32000
TimeBetweenWatchdogCheckInitial=60000
TransactionInactiveTimeout=60000
HeartbeatIntervalDbDb=1500
HeartbeatIntervalDbApi=1500
MaxNoOfAttributes=5000
StartPartialTimeout=60
StartPartitionedTimeout=60
MaxNoOfOrderedIndexes=5000
MaxNoOfUniqueHashIndexes=128
TransactionDeadlockDetectionTimeout=2000
[NDBD]
NodeId=11
HostName=172.XX.11.11
DataDir=/home/cluster/
ServerPort=50501
[NDBD]
NodeId=12
HostName=172.XX.11.12
DataDir=/home/cluster/
ServerPort=50502
[MYSQLD DEFAULT]
[MYSQLD]
NodeId=31
HostName=172.XX.11.31
[MYSQLD]
NodeId=32
HostName=172.XX.11.32
[MYSQLD]
NodeId=33
HostName=172.XX.11.33
[MYSQLD]
NodeId=34
HostName=172.XX.11.34
# TCP/IP Connections
[tcp]
NodeId1=11
NodeId2=12
HostName1=10.10.10.1
HostName2=10.10.10.2