MySQL Bugs: #21743: Error 6050: WatchDog terminate, internal error or massive overload on the mac...

Bug #21743	Error 6050: WatchDog terminate, internal error or massive overload on the mac...
Submitted:	21 Aug 2006 0:16	Modified:	21 Sep 2006 7:43
Reporter:	David Rusenko
Status:	No Feedback
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	5.0.18	OS:	Linux (SuSE 9.1)
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	cluster, crash

Description:
The cluster consists of 2 data nodes, 1 management node, and 2 client nodes, set up
across 3 machines: two machines each run 1 data node and 1 client node, and a third
runs the management node. Each data node has 1GB of memory, sufficient disk space, and a 2.4GHz P4 CPU.

At 22:10 on various days over the following month and a half, one NDB data node crashed with the message:

Time: Tuesday 25 July 2006 - 22:10:22
Status: Temporary error, restart node
Message: WatchDog terminate, internal error or massive overload on the machine running this node (Internal error, programming error or missing error message, please report a bug)
Error: 6050
Error data: Job Handling
Error object: WatchDog.cpp
Program: ndbd
Pid: 22673
Trace: /usr/local/mysql/data/ndb_3_trace.log.3
Version: Version 5.0.18
***EOM***

The cron.daily operations can be ruled out, as they took place at 4:15am. The single operation taking place at that time was:

10 22 * * * /usr/local/bin/simplebackup /usr/local/etc/simplebackup.conf

That cron job ran a shell script which called "mysqldump" on 7 databases in a row (none of them particularly large). The server load at the time of a crash was sometimes high (10) and sometimes very low (0.1).

Additional log information:

Time: Thursday 27 July 2006 - 22:10:21
Status: Temporary error, restart node
Message: WatchDog terminate, internal error or massive overload on the machine running this node (Internal error, programming error or missing error message, please report a bug)
Error: 6050
Error data: Job Handling
Error object: WatchDog.cpp
Program: ndbd
Pid: 14186
Trace: /usr/local/mysql/data/ndb_3_trace.log.4
Version: Version 5.0.18
***EOM***

Time: Saturday 5 August 2006 - 22:10:25
Status: Temporary error, restart node
Message: WatchDog terminate, internal error or massive overload on the machine running this node (Internal error, programming error or missing error message, please report a bug)
Error: 6050
Error data: Job Handling
Error object: WatchDog.cpp
Program: ndbd
Pid: 12316
Trace: /usr/local/mysql/data/ndb_3_trace.log.5
Version: Version 5.0.18
***EOM***

Time: Thursday 10 August 2006 - 22:10:21
Status: Temporary error, restart node
Message: WatchDog terminate, internal error or massive overload on the machine running this node (Internal error, programming error or missing error message, please report a bug)
Error: 6050
Error data: Job Handling
Error object: WatchDog.cpp
Program: ndbd
Pid: 12447
Trace: /usr/local/mysql/data/ndb_3_trace.log.6
Version: Version 5.0.18
***EOM***

Time: Sunday 13 August 2006 - 22:10:21
Status: Temporary error, restart node
Message: WatchDog terminate, internal error or massive overload on the machine running this node (Internal error, programming error or missing error message, please report a bug)
Error: 6050
Error data: Job Handling
Error object: WatchDog.cpp
Program: ndbd
Pid: 14157
Trace: /usr/local/mysql/data/ndb_3_trace.log.7
Version: Version 5.0.18
***EOM***

Time: Wednesday 16 August 2006 - 22:10:25
Status: Temporary error, restart node
Message: WatchDog terminate, internal error or massive overload on the machine running this node (Internal error, programming error or missing error message, please report a bug)
Error: 6050
Error data: Job Handling
Error object: WatchDog.cpp
Program: ndbd
Pid: 7101
Trace: /usr/local/mysql/data/ndb_3_trace.log.8
Version: Version 5.0.18
***EOM***

Time: Thursday 17 August 2006 - 08:44:49
Status: Temporary error, restart node
Message: WatchDog terminate, internal error or massive overload on the machine running this node (Internal error, programming error or missing error message, please report a bug)
Error: 6050
Error data: Job Handling
Error object: WatchDog.cpp
Program: ndbd
Pid: 24768
Trace: /usr/local/mysql/data/ndb_3_trace.log.9
Version: Version 5.0.18
***EOM***

The last occurrence is also puzzling, as it happened at a time out of character for the series (8:45am vs 10:10pm).
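
To make the timing pattern easier to check, here is a minimal sketch that pulls the "Time:" lines out of an ndbd error log and flags which crashes fall shortly after the 22:10 cron job. The function names (`crash_times`, `near_schedule`) and the 5-minute window are assumptions chosen for illustration, not part of any MySQL tooling.

```python
import re
from datetime import datetime

# Matches the "Time:" line of an error log entry, e.g.
# "Time: Tuesday 25 July 2006 - 22:10:22". The weekday is ignored.
TIME_RE = re.compile(r"^Time:\s+\w+\s+(\d{1,2} \w+ \d{4}) - (\d{2}:\d{2}:\d{2})$")


def crash_times(log_text):
    """Return a datetime for every 'Time:' line found in the log text."""
    times = []
    for line in log_text.splitlines():
        m = TIME_RE.match(line.strip())
        if m:
            stamp = f"{m.group(1)} {m.group(2)}"
            # %B expects English month names (default C locale).
            times.append(datetime.strptime(stamp, "%d %B %Y %H:%M:%S"))
    return times


def near_schedule(ts, hour=22, minute=10, window_minutes=5):
    """True if ts falls within window_minutes after hour:minute on the same day."""
    scheduled = ts.replace(hour=hour, minute=minute, second=0, microsecond=0)
    delta = (ts - scheduled).total_seconds()
    return 0 <= delta <= window_minutes * 60


# Two of the entries from this report, reduced to the relevant lines.
sample = """\
Time: Tuesday 25 July 2006 - 22:10:22
Error: 6050
Time: Thursday 17 August 2006 - 08:44:49
Error: 6050
"""

for ts in crash_times(sample):
    label = "near backup job" if near_schedule(ts) else "outside backup window"
    print(ts.isoformat(), label)
```

Run against the full error log, this would single out the 17 August entry as the only one not following the backup schedule.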

I will also attach the trace logs. Please let me know if there is any other information which may be helpful.

Thanks,

-David

How to repeat:
- Let cluster run -- eventually, one node crashes with "internal error or massive overload".
- May be caused by performing 7 consecutive "mysqldump" commands on small databases (<10MB).

Uploaded the trace files:

ftp.mysql.com/pub/mysql/upload/ndb_3_trace.log-21743.tar.gz

Hi,

When you refer to client-nodes, you mean mysqld's, right?

I guess that this is a memory/swapping problem,
  i.e. your mysqldump script causes mysqld to allocate
  lots of memory, causing ndbd to be swapped out.
And ndbd being swapped out is an almost certain way of causing the WatchDog to "fire".

Could this be correct?
Can you also upload your config.ini

/Jonas
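
One way to test the swapping hypothesis above might be to sample how much of the ndbd process is swapped out while the backup runs. The sketch below reads the `VmSwap` field from `/proc/<pid>/status`. This is an assumption about the environment: only newer Linux kernels expose that field, and the SuSE 9.1 kernel from the original report may well not. The helper names are hypothetical.

```python
import re


def vm_swap_kb(status_text):
    """Return the VmSwap value in kB from /proc/<pid>/status text, or None if absent."""
    m = re.search(r"^VmSwap:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def read_swap_for_pid(pid):
    """Read /proc/<pid>/status on a Linux host and return swapped-out kB (or None)."""
    with open(f"/proc/{pid}/status") as f:
        return vm_swap_kb(f.read())


# Illustrative status text, not taken from the report.
sample = "Name:\tndbd\nVmRSS:\t  812344 kB\nVmSwap:\t  153600 kB\n"
print(vm_swap_kb(sample))
```

Calling `read_swap_for_pid` with the ndbd pid every few seconds around 22:10 and logging the result would show whether the value grows while mysqldump runs. A value that stays at 0 would argue against the swapping explanation.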

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Dear Jonas,

Today I also ran into error 6050, shown below. The config.ini you asked for is included below. I am using NDB 7.5.7 on Windows Server 2012:

--config.ini--
[NDBD DEFAULT]
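# 1: there is only one copy of the data, split into n fragments stored on the n data nodes.
# 2: the data is split into n/2 fragments, each with 2 copies, so that even if any one node fails, the system keeps running normally as long as its replica node is healthy.
NoOfReplicas=2
# The parameters below can all be tuned further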
DataDir=D:\Optimized\ndbData
DataMemory=8000M
IndexMemory=1000M
#BackupMemory: 64M

# Added by ljx
LockPagesInMainMemory=0
#LockExecuteThreadToCPU=0
#LockMaintThreadsToCPU=1
RealtimeScheduler=1
SchedulerExecutionTimer=10
SchedulerSpinTimer=100
#CompressedLCP=1
#CompressedBackup=1
#Enabling CompressedLCP and CompressedBackup causes, respectively, local

# Transaction Parameters #
#old1 MaxNoOfConcurrentTransactions: 8518
MaxNoOfConcurrentTransactions: 151798
#old1 MaxNoOfConcurrentOperations: 1000000
MaxNoOfConcurrentOperations: 1517980
#old1 MaxNoOfLocalOperations: 110000
MaxNoOfLocalOperations: 1517980

MaxNoOfTables = 1024
MaxNoOfAttributes = 100000
MaxNoOfOrderedIndexes = 10000

[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]
# Added by ljx
#SendBufferMemory=2M
#ReceiveBufferMemory=2M

[NDB_MGMD]
Nodeid=32
# Management node server
HostName=192.168.70.15
PortNumber=8518
# Storage Engines
DataDir=D:\Optimized\mgmdata

[NDBD]
Nodeid=33
# IP address of MySQL Cluster db1
HostName=192.168.70.16

[NDBD]
Nodeid=34
# IP address of MySQL Cluster db2
HostName=192.168.70.17

[MYSQLD]
Nodeid=35
HostName=192.168.70.16

[MYSQLD]
Nodeid=36
HostName=192.168.70.17
[MYSQLD]

--below is the error log:--
Current byte-offset of file-pointer is: 1083                      

Time: Monday 19 November 2018 - 11:53:40
Status: Temporary error, restart node
Message: WatchDog terminate, internal error or massive overload on the machine running this node (Internal error,
 programming error or missing error message, please report a bug)
Error: 6050
Error data: Allocating memory
Error object: g:\ade\build\sb_0-23963488-1498231248.21\mysql-cluster-gpl-7.5.7\storage\ndb\src\kernel\vm\watchdog.cpp
Program: ndbd
Pid: 3296
Version: mysql-5.7.19 ndb-7.5.7
Trace file name: ndb_34_trace.log.8
Trace file path: D:\Optimized\ndbData\ndb_34_trace.log.8 [t1..t1]
***EOM***
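
Since this second report shows "Error data: Allocating memory", it may be worth comparing the memory configured in config.ini with the physical RAM of each data node host. The sketch below is a hand-rolled parser (the function names are hypothetical) that reads the `[NDBD DEFAULT]` section and sums `DataMemory` and `IndexMemory`. It assumes the K/M/G suffixes mean powers of 1024, and the sum is only a lower bound, since ndbd allocates further buffers on top of these two.

```python
import re

# Assumption: size suffixes are binary multiples.
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a value such as '8000M' into a number of bytes."""
    m = re.fullmatch(r"(\d+)([KMG]?)", value.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(m.group(1)) * UNITS[m.group(2).upper()]


def ndbd_default_params(config_text):
    """Collect key/value pairs from the [NDBD DEFAULT] section only."""
    params = {}
    section = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().upper()
            continue
        if section == "NDBD DEFAULT":
            # The file above mixes "key=value" and "key: value".
            m = re.match(r"([^=:]+)[=:](.*)", line)
            if m:
                params[m.group(1).strip()] = m.group(2).strip()
    return params


# Excerpt of the config.ini posted above.
sample = """\
[NDBD DEFAULT]
NoOfReplicas=2
DataDir=D:\\Optimized\\ndbData
DataMemory=8000M
IndexMemory=1000M
LockPagesInMainMemory=0
MaxNoOfConcurrentOperations: 1517980

[NDBD]
Nodeid=33
"""

params = ndbd_default_params(sample)
configured = parse_size(params["DataMemory"]) + parse_size(params["IndexMemory"])
configured_mib = configured // UNITS["M"]
print(f"DataMemory + IndexMemory = {configured_mib} MiB per data node")
```

If that figure is close to or above the RAM available on 192.168.70.16 and 192.168.70.17, the allocation at node start could plausibly take long enough to trip the watchdog, but that is a guess to verify, not a diagnosis.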