MySQL Bugs: #67349: ndb shutdown problem. cause "LCP Frag watchdog .."

Bug #67349	ndb shutdown problem. cause "LCP Frag watchdog .."
Submitted:	24 Oct 2012 3:45	Modified:	22 Feb 2013 15:36
Reporter:	Geunho Jang
Status:	No Feedback
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	MySQL Cluster 7.2.7	OS:	Linux (ubuntu 12.04)
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	LCP Frag watchdog;Event buffer status;

Description:
All NDB data nodes shut down at the same moment.
I don't know the reason for the shutdown.
The log file contains many info messages like these:

2012-10-24 01:32:14 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/0
2012-10-24 01:32:14 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/1
2012-10-24 01:32:14 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/2
2012-10-24 01:32:14 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/3
2012-10-24 01:32:15 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/4
2012-10-24 01:32:15 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/5
2012-10-24 01:32:15 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/6
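
For anyone triaging a log like this, a minimal sketch of pulling the fields out of these "Event buffer status" lines is shown here. The regular expression is modeled only on the lines quoted above, and the names `EVENT_BUFFER`, `parse_epoch`, and `parse_event_buffer` are made up for illustration; the two numbers in each epoch are kept as a plain pair without assuming what they mean.

```python
import re

# Pattern modeled on the INFO lines quoted in this report (an assumption;
# other NDB versions may format this message differently).
EVENT_BUFFER = re.compile(
    r"Node (?P<node>\d+): Event buffer status: "
    r"used=(?P<used>\S+) alloc=(?P<alloc>\S+) max=(?P<max>\S+) "
    r"apply_epoch=(?P<apply>\d+/\d+) latest_epoch=(?P<latest>\d+/\d+)"
)


def parse_epoch(text):
    """Split an 'a/b' epoch string into a pair of integers."""
    first, second = text.split("/")
    return int(first), int(second)


def parse_event_buffer(line):
    """Return the fields of one event buffer status line, or None if no match."""
    m = EVENT_BUFFER.search(line)
    if m is None:
        return None
    return {
        "node": int(m["node"]),
        "used": m["used"],
        "alloc": m["alloc"],
        "max": m["max"],
        "apply_epoch": parse_epoch(m["apply"]),
        "latest_epoch": parse_epoch(m["latest"]),
    }


# First and last of the lines quoted above.
SAMPLE = [
    "2012-10-24 01:32:14 [MgmtSrvr] INFO     -- Node 30: Event buffer status: "
    "used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/0",
    "2012-10-24 01:32:15 [MgmtSrvr] INFO     -- Node 30: Event buffer status: "
    "used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/6",
]

if __name__ == "__main__":
    for line in SAMPLE:
        print(parse_event_buffer(line))
```

On the quoted lines, the parsed `apply_epoch` stays the same while `latest_epoch` changes from line to line.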

and then the following warnings appear, which I think are the main problem:

2012-10-24 03:36:50 [MgmtSrvr] WARNING  -- Node 5: LCP Frag watchdog : No progress on table 68, frag 0 for 20 s.  839488 rows completed
2012-10-24 03:36:57 [MgmtSrvr] WARNING  -- Node 8: LCP Frag watchdog : No progress on table 68, frag 5 for 20 s.  839504 rows completed
2012-10-24 03:37:00 [MgmtSrvr] WARNING  -- Node 5: LCP Frag watchdog : No progress on table 68, frag 0 for 30 s.  839488 rows completed
2012-10-24 03:37:00 [MgmtSrvr] WARNING  -- Node 7: LCP Frag watchdog : No progress on table 68, frag 5 for 20 s.  839504 rows completed
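
To see which node and fragment the watchdog complained about for the longest time, the warnings can be summarized with a small script. This is only a sketch: the pattern is modeled on the four lines quoted above, and `LCP_WATCHDOG` and `longest_stalls` are hypothetical names, not part of any MySQL Cluster tool.

```python
import re
from collections import defaultdict

# Pattern modeled on the WARNING lines quoted in this report (an assumption;
# the wording may differ in other NDB versions).
LCP_WATCHDOG = re.compile(
    r"Node (?P<node>\d+): LCP Frag watchdog : "
    r"No progress on table (?P<table>\d+), frag (?P<frag>\d+) "
    r"for (?P<secs>\d+) s\.\s+(?P<rows>\d+) rows completed"
)


def longest_stalls(lines):
    """Map (node, table, frag) to the longest stall reported, in seconds."""
    stalls = defaultdict(int)
    for line in lines:
        m = LCP_WATCHDOG.search(line)
        if m is None:
            continue  # not a watchdog warning
        key = (int(m["node"]), int(m["table"]), int(m["frag"]))
        stalls[key] = max(stalls[key], int(m["secs"]))
    return dict(stalls)


# The four warnings quoted above.
SAMPLE = [
    "2012-10-24 03:36:50 [MgmtSrvr] WARNING  -- Node 5: LCP Frag watchdog : "
    "No progress on table 68, frag 0 for 20 s.  839488 rows completed",
    "2012-10-24 03:36:57 [MgmtSrvr] WARNING  -- Node 8: LCP Frag watchdog : "
    "No progress on table 68, frag 5 for 20 s.  839504 rows completed",
    "2012-10-24 03:37:00 [MgmtSrvr] WARNING  -- Node 5: LCP Frag watchdog : "
    "No progress on table 68, frag 0 for 30 s.  839488 rows completed",
    "2012-10-24 03:37:00 [MgmtSrvr] WARNING  -- Node 7: LCP Frag watchdog : "
    "No progress on table 68, frag 5 for 20 s.  839504 rows completed",
]

if __name__ == "__main__":
    for (node, table, frag), secs in sorted(longest_stalls(SAMPLE).items()):
        print(f"node {node}: table {table} frag {frag}, no progress for {secs} s")
```

Running the same function over the full attached log would show whether the stall kept growing on table 68 before the nodes went down.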

I use 2 management nodes, 4 NDB data nodes, and 3 API nodes.
I attached the log file from the second management server.

I set up config.ini as follows:

[NDBD DEFAULT]
NoOfReplicas: 2
DataDir: /mnt/data
BackupDataDir: /mnt
FileSystemPath: /mnt/mysql-cluster

# Data Memory, Index Memory, and String Memory #
DataMemory: 9000M
IndexMemory: 2048M
StringMemory: 10

# Transaction Parameters #
MaxNoOfConcurrentTransactions: 10000
MaxNoOfConcurrentOperations: 170000
MaxNoOfLocalOperations: 190000

# Transaction Temporary Storage #
MaxNoOfConcurrentIndexOperations: 10000
MaxNoOfFiredTriggers: 4000
TransactionBufferMemory: 160M

# Scans and buffering #
MaxNoOfConcurrentScans: 300
MaxNoOfLocalScans: 10000
BatchSizePerLocalScan: 256
LongMessageBuffer: 128M
MaxParallelScansPerFragment: 1024

# Memory Allocation
MaxAllocate: 1G

# Logging and Checkpointing #
NoOfFragmentLogFiles: 300
FragmentLogFileSize: 16M
MaxLCPStartDelay: 20

# Metadata Objects #
MaxNoOfAttributes: 1500
MaxNoOfTables: 200
MaxNoOfOrderedIndexes: 1000
MaxNoOfUniqueHashIndexes: 1000
MaxNoOfTriggers: 770

# Boolean Parameters #
CompressedLCP: true

# Controlling Timeouts, Intervals, and Disk Paging #
TimeBetweenWatchDogCheck: 6000
TimeBetweenWatchDogCheckInitial: 6000
StartPartialTimeout: 30000
StartPartitionedTimeout: 60000
StartFailureTimeout: 1000000
HeartbeatIntervalDbDb: 6000
HeartbeatIntervalDbApi: 5000
TimeBetweenLocalCheckpoints: 24

# Buffering and Logging #
UndoIndexBuffer: 80M
UndoDataBuffer: 160M
RedoBuffer: 256M
LogLevelStartup: 10
LogLevelShutdown: 3
LogLevelStatistic: 0
LogLevelCheckpoint: 0
LogLevelNodeRestart: 0
LogLevelConnection: 0
LogLevelError: 15
LogLevelCongestion: 0
LogLevelInfo: 1
MemReportFrequency: 0

# Backup Parameters #
BackupDataBufferSize: 64M
BackupLogBufferSize: 64M
BackupMemory: 128M
BackupWriteSize: 80M
BackupMaxWriteSize: 160M

# MySQL Cluster Realtime Performance Parameters
MaxNoOfExecutionThreads: 8
NoOfFragmentLogparts: 8
DiskPageBufferMemory: 256M
SharedGlobalMemory: 128M
ExtraSendBufferMemory: 160M
TotalSendBufferMemory: 256M

[MGM DEFAULT]
PortNumber: *****
DataDir: /dir
TotalSendBufferMemory: 1024M

[TCP DEFAULT]
PortNumber: *****
SendBufferMemory: 160M
ReceiveBufferMemory: 160M

How to repeat:
The warning message below appears, and then all NDB nodes shut down:

2012-10-24 03:36:50 [MgmtSrvr] WARNING  -- Node 5: LCP Frag watchdog : No progress on table 68, frag 0 for 20 s.  839488 rows completed

ndb_2_cluster log file

Attachment: ndb_2_cluster.log (application/octet-stream, text), 39.21 KiB.

This is reported in internal Oracle bugs #14495364 and #14075825.

This behavior is also documented in the manual; see:

http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-news-5-5-22-ndb-7-2-6.html

http://dev.mysql.com/doc/ndbapi/en/ndbd-error-codes-lqh.html

There was a bug in the LCP Frag scan watchdog, which has been fixed already. The fix was pushed to 7.0.37, 7.1.26 and 7.2.10.
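
As a quick check against the release numbers above, here is a minimal sketch that compares a cluster version string with the first fixed release in its series. The helper names `FIXED_IN`, `parse_version`, and `has_watchdog_fix` are hypothetical, and the table contains only the three releases named in this reply; any other series is reported as unknown.

```python
# First release in each series that carries the fix, as stated in this reply.
FIXED_IN = {
    (7, 0): (7, 0, 37),
    (7, 1): (7, 1, 26),
    (7, 2): (7, 2, 10),
}


def parse_version(text):
    """Turn a dotted version such as '7.2.7' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def has_watchdog_fix(version_text):
    """Return True/False for a known series, or None if the series is not listed."""
    version = parse_version(version_text)
    fixed = FIXED_IN.get(version[:2])
    if fixed is None:
        return None  # series not mentioned in this reply
    return version >= fixed


if __name__ == "__main__":
    for v in ("7.2.7", "7.2.10", "7.1.25"):
        print(v, has_watchdog_fix(v))
```

For the version in this report, 7.2.7, the check returns False, which is consistent with the advice to upgrade.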

Please upgrade your cluster and let us know if you still see the same issue.

Thanks.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".