Bug #67349 ndb shutdown problem. cause "LCP Frag watchdog .."
Submitted: 24 Oct 2012 3:45 Modified: 22 Feb 2013 15:36
Reporter: Geunho Jang
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine  Severity: S1 (Critical)
Version: MySQL Cluster 7.2.7  OS: Linux (Ubuntu 12.04)
Assigned to: Assigned Account  CPU Architecture: Any
Tags: LCP Frag watchdog;Event buffer status;

[24 Oct 2012 3:45] Geunho Jang
Description:
All NDB data nodes shut down at the same moment.
I don't know the reason for the shutdown.
Many INFO messages like the following are written to the log file:

2012-10-24 01:32:14 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/0
2012-10-24 01:32:14 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/1
2012-10-24 01:32:14 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/2
2012-10-24 01:32:14 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/3
2012-10-24 01:32:15 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/4
2012-10-24 01:32:15 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/5
2012-10-24 01:32:15 [MgmtSrvr] INFO     -- Node 30: Event buffer status: used=288B(0%) alloc=445KB(0%) max=0B apply_epoch=265667/0 latest_epoch=265674/6
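These "Event buffer status" lines are informational, but they can be used to check whether the event buffer on node 30 was falling behind. The sketch below parses one line and reports how far latest_epoch is ahead of apply_epoch. It assumes the line layout shown in the excerpt above and assumes the number before the slash is the epoch's global checkpoint part; the regex and the helper name epoch_lag are hypothetical, not part of any MySQL Cluster tool.

```python
import re

# Hypothetical pattern for the "Event buffer status" INFO line, built from the
# excerpt in this report; it is not an official log grammar.
EVENT_BUFFER_RE = re.compile(
    r"Node (?P<node>\d+): Event buffer status: "
    r"used=(?P<used>\S+) alloc=(?P<alloc>\S+) max=(?P<max>\S+) "
    r"apply_epoch=(?P<apply_hi>\d+)/(?P<apply_lo>\d+) "
    r"latest_epoch=(?P<latest_hi>\d+)/(?P<latest_lo>\d+)"
)


def epoch_lag(line):
    """Return (node, lag) for a matching line, else None.

    lag is latest_epoch minus apply_epoch, comparing only the part before
    the slash (assumed to be the global checkpoint number).
    """
    m = EVENT_BUFFER_RE.search(line)
    if m is None:
        return None
    lag = int(m.group("latest_hi")) - int(m.group("apply_hi"))
    return int(m.group("node")), lag


sample = (
    "2012-10-24 01:32:15 [MgmtSrvr] INFO     -- Node 30: Event buffer status: "
    "used=288B(0%) alloc=445KB(0%) max=0B "
    "apply_epoch=265667/0 latest_epoch=265674/6"
)
print(epoch_lag(sample))  # (30, 7)
```

In the excerpt the lag stays at 7 while used stays at 0%, so these lines on their own do not show the buffer filling up.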

Then the following warnings appear, which I think point to the main problem:

2012-10-24 03:36:50 [MgmtSrvr] WARNING  -- Node 5: LCP Frag watchdog : No progress on table 68, frag 0 for 20 s.  839488 rows completed
2012-10-24 03:36:57 [MgmtSrvr] WARNING  -- Node 8: LCP Frag watchdog : No progress on table 68, frag 5 for 20 s.  839504 rows completed
2012-10-24 03:37:00 [MgmtSrvr] WARNING  -- Node 5: LCP Frag watchdog : No progress on table 68, frag 0 for 30 s.  839488 rows completed
2012-10-24 03:37:00 [MgmtSrvr] WARNING  -- Node 7: LCP Frag watchdog : No progress on table 68, frag 5 for 20 s.  839504 rows completed
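All four warnings concern table 68, and the "rows completed" count does not change between the 20 s and 30 s reports for node 5. A small sketch like the one below can group such warnings from a cluster log and show the longest reported stall per node, table and fragment. The regex follows the warning text quoted above; the helper name longest_stalls is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pattern for the "LCP Frag watchdog" warning as it appears in
# this report's excerpt.
WATCHDOG_RE = re.compile(
    r"Node (?P<node>\d+): LCP Frag watchdog : No progress on table "
    r"(?P<table>\d+), frag (?P<frag>\d+) for (?P<secs>\d+) s\.\s+"
    r"(?P<rows>\d+) rows completed"
)


def longest_stalls(lines):
    """Map (node, table, frag) to the longest stall, in seconds, reported."""
    stalls = defaultdict(int)
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m is None:
            continue  # skip lines that are not watchdog warnings
        key = (int(m.group("node")), int(m.group("table")), int(m.group("frag")))
        stalls[key] = max(stalls[key], int(m.group("secs")))
    return dict(stalls)


log = [
    "2012-10-24 03:36:50 [MgmtSrvr] WARNING  -- Node 5: LCP Frag watchdog : No progress on table 68, frag 0 for 20 s.  839488 rows completed",
    "2012-10-24 03:36:57 [MgmtSrvr] WARNING  -- Node 8: LCP Frag watchdog : No progress on table 68, frag 5 for 20 s.  839504 rows completed",
    "2012-10-24 03:37:00 [MgmtSrvr] WARNING  -- Node 5: LCP Frag watchdog : No progress on table 68, frag 0 for 30 s.  839488 rows completed",
    "2012-10-24 03:37:00 [MgmtSrvr] WARNING  -- Node 7: LCP Frag watchdog : No progress on table 68, frag 5 for 20 s.  839504 rows completed",
]
for key, secs in sorted(longest_stalls(log).items()):
    print(key, secs)
```

Taking the maximum per key keeps only the latest escalation of each stall, which is usually what matters when reading a long cluster log.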

I use 2 management nodes, 4 data (ndbd) nodes and 3 API nodes.
I have attached the log file from the second management server.

My config.ini is set up as follows:

[NDBD DEFAULT]
NoOfReplicas: 2
DataDir: /mnt/data
BackupDataDir: /mnt
FileSystemPath: /mnt/mysql-cluster

# Data Memory, Index Memory, and String Memory #
DataMemory: 9000M
IndexMemory: 2048M
StringMemory: 10

# Transaction Parameters #
MaxNoOfConcurrentTransactions: 10000
MaxNoOfConcurrentOperations: 170000
MaxNoOfLocalOperations: 190000

# Transaction Temporary Storage #
MaxNoOfConcurrentIndexOperations: 10000
MaxNoOfFiredTriggers: 4000
TransactionBufferMemory: 160M

# Scans and buffering #
MaxNoOfConcurrentScans: 300
MaxNoOfLocalScans: 10000
BatchSizePerLocalScan: 256
LongMessageBuffer: 128M
MaxParallelScansPerFragment: 1024

# Memory Allocation
MaxAllocate: 1G

# Logging and Checkpointing #
NoOfFragmentLogFiles: 300
FragmentLogFileSize: 16M
MaxLCPStartDelay: 20

# Metadata Objects #
MaxNoOfAttributes: 1500
MaxNoOfTables: 200
MaxNoOfOrderedIndexes: 1000
MaxNoOfUniqueHashIndexes: 1000
MaxNoOfTriggers: 770

# Boolean Parameters #
CompressedLCP: true

# Controlling Timeouts, Intervals, and Disk Paging #
TimeBetweenWatchDogCheck: 6000
TimeBetweenWatchDogCheckInitial: 6000
StartPartialTimeout: 30000
StartPartitionedTimeout: 60000
StartFailureTimeout: 1000000
HeartbeatIntervalDbDb: 6000
HeartbeatIntervalDbApi: 5000
TimeBetweenLocalCheckpoints: 24

# Buffering and Logging #
UndoIndexBuffer: 80M
UndoDataBuffer: 160M
RedoBuffer: 256M
LogLevelStartup: 10
LogLevelShutdown: 3
LogLevelStatistic: 0
LogLevelCheckpoint: 0
LogLevelNodeRestart: 0
LogLevelConnection: 0
LogLevelError: 15
LogLevelCongestion: 0
LogLevelInfo: 1
MemReportFrequency: 0

# Backup Parameters #
BackupDataBufferSize: 64M
BackupLogBufferSize: 64M
BackupMemory: 128M
BackupWriteSize: 80M
BackupMaxWriteSize: 160M

# MySQL Cluster Realtime Performance Parameters
MaxNoOfExecutionThreads: 8
NoOfFragmentLogparts: 8
DiskPageBufferMemory: 256M
SharedGlobalMemory: 128M
ExtraSendBufferMemory: 160M
TotalSendBufferMemory: 256M

[MGM DEFAULT]
PortNumber: *****
DataDir: /dir
TotalSendBufferMemory: 1024M

[TCP DEFAULT]
PortNumber: *****
SendBufferMemory: 160M
ReceiveBufferMemory: 160M
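One setting above that interacts with local checkpoints is TimeBetweenLocalCheckpoints. It is not a time: the manual describes it as the base-2 logarithm of a number of 4-byte words of write activity, so the default of 20 means 4 MB (4 × 2^20 bytes). The sketch below works out what the value 24 used here means. Parsing the excerpt with Python's configparser is only a convenience for this example; it happens to accept these "key: value" lines and full-line "#" comments, but it is not how the management server reads config.ini, and the helper name lcp_write_threshold_bytes is hypothetical.

```python
import configparser

# Excerpt of the [NDBD DEFAULT] section from the report.
CONFIG_EXCERPT = """
[NDBD DEFAULT]
# Controlling Timeouts, Intervals, and Disk Paging #
TimeBetweenLocalCheckpoints: 24
"""


def lcp_write_threshold_bytes(exponent):
    """Convert TimeBetweenLocalCheckpoints (log2 of 4-byte words) to bytes."""
    return 4 * 2 ** exponent


parser = configparser.ConfigParser()
parser.read_string(CONFIG_EXCERPT)
exponent = parser.getint("NDBD DEFAULT", "TimeBetweenLocalCheckpoints")
threshold = lcp_write_threshold_bytes(exponent)
print(exponent, threshold, threshold // 2**20)  # 24 67108864 64
```

So with this config a new LCP is triggered after roughly 64 MB of writes, versus 4 MB with the default.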

How to repeat:
The warning message below appears, and then all NDB data nodes shut down:

2012-10-24 03:36:50 [MgmtSrvr] WARNING  -- Node 5: LCP Frag watchdog : No progress on table 68, frag 0 for 20 s.  839488 rows completed
[24 Oct 2012 4:05] Geunho Jang
ndb_2_cluster log file

Attachment: ndb_2_cluster.log (application/octet-stream, text), 39.21 KiB.

[4 Jan 2013 12:55] MySQL Verification Team
This has been reported as internal Oracle bugs #14495364 and #14075825.

This behavior is also documented in the following manual pages:

See http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-news-5-5-22-ndb-7-2-6.html

http://dev.mysql.com/doc/ndbapi/en/ndbd-error-codes-lqh.html
[22 Jan 2013 15:35] Shahryar Ghazi
There was a bug in the LCP Frag scan watchdog, which has been fixed already. The fix was pushed to 7.0.37, 7.1.26 and 7.2.10.

Please upgrade your cluster and let us know if you still see the same issue.

Thanks.
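Whether a given installation already has this watchdog fix can be checked by comparing its version with the fix version for the same release series (7.0.37, 7.1.26 or 7.2.10, as stated above). The sketch below does that with integer tuples; the helper names are hypothetical, and a series not listed in the comment returns None rather than a guess.

```python
# Fix versions quoted in the comment above, keyed by release series.
FIXED_IN = {(7, 0): (7, 0, 37), (7, 1): (7, 1, 26), (7, 2): (7, 2, 10)}


def parse_version(text):
    """Turn a version string such as '7.2.7' into (7, 2, 7)."""
    return tuple(int(part) for part in text.split("."))


def has_watchdog_fix(version_text):
    """True/False for a listed series, None if the series is not listed."""
    version = parse_version(version_text)
    fixed = FIXED_IN.get(version[:2])
    if fixed is None:
        return None
    return version >= fixed


print(has_watchdog_fix("7.2.7"))   # False (the reporter's version)
print(has_watchdog_fix("7.2.10"))  # True
```

Comparing integer tuples matters here: as plain strings, "7.2.7" sorts after "7.2.10", which would wrongly report the fix as present.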
[23 Feb 2013 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".