Bug #32955 Node repeatedly crashes with error 6050
Submitted: 4 Dec 2007 12:10 Modified: 13 Mar 2009 9:54
Reporter: James Graham Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:5.0.45 OS:Linux (Debian etch)
Assigned to: Jonas Oreland CPU Architecture:Any

[4 Dec 2007 12:10] James Graham
Description:
Hi

Since restarting node 2 with --initial, the node crashes shortly after coming up. Once restarted, it crashes again with the same error (detailed below) roughly an hour later, of which 25-35 minutes is spent starting.

Node 2: Forced node shutdown completed, restarting. Caused by error 6050: 'WatchDog terminate, internal error or massive overload on the machine running this node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'. - Unknown error code: Unknown result: Unknown error code

ndb_mgm SHOW output:

[ndbd(NDB)]	2 node(s)
id=2	@89.200.x.x  (Version: 5.0.45, Nodegroup: 0)
id=3	@89.200.x.x  (Version: 5.0.45, Nodegroup: 0, Master)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@89.200.x.x  (Version: 5.0.45)

[mysqld(API)]	3 node(s)
id=4	@89.200.x.x  (Version: 5.0.45)
id=5	@89.200.x.x  (Version: 5.0.45)
id=6	@89.200.x.x  (Version: 5.0.45)

We have checked the hard disk for integrity and the RAM for ECC errors; both are fine. The node crashes regardless of traffic levels.

Each server is specced with 4 GB of RAM and dual 2.13 GHz Xeons.
Load averages: 3.39 (1 min) 3.55 (5 mins) 3.04 (15 mins)
RAM utilization is about 50% across the board.

Any help appreciated. Thanks

How to repeat:
Start ndbd and wait an hour or so...

Suggested fix:
No idea
[4 Dec 2007 12:19] Hartmut Holzgraefe
To look into this any further we need to know:

- CPU usage of the ndbd process when it runs into this
- the cluster configuration file
- the general cluster log and the error log and trace logs from this node
- information on the number of objects (tables, indexes, ...) and the amount of data in the cluster, and whether TEXT or BLOB columns are used, would help too
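For reference, the requested data could be gathered with something like the following sketch. The `ndb_<nodeid>_*` file names follow the standard NDB naming convention and the DataDir matches this cluster's config.ini; the config.ini path on the management host is an assumption.

```shell
# 1. CPU usage of the ndbd process while the problem occurs
top -b -n 1 -p "$(pgrep -x ndbd | head -1)"

# 2. Bundle the error log, trace files, and out-file from data node 2
tar czf ndb_node2_logs.tar.gz \
    /var/lib/mysql-cluster/ndb_2_error.log \
    /var/lib/mysql-cluster/ndb_2_trace.log.* \
    /var/lib/mysql-cluster/ndb_2_out.log

# 3. On the management host, grab the cluster log and the config file
#    (config.ini location here is a guess -- adjust to your install)
tar czf ndb_mgm_logs.tar.gz \
    /var/lib/mysql-cluster/ndb_1_cluster.log \
    /etc/mysql-cluster/config.ini
```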
[4 Dec 2007 12:36] James Graham
ndbd hovers around 60% CPU but can range from 40% to 80%.

config.ini:

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=2048MB
IndexMemory=512MB
MaxNoOfAttributes=10000
MaxNoOfConcurrentOperations=131072
MaxNoOfOrderedIndexes=1024
MaxNoOfTables=256
TimeBetweenLocalCheckpoints=12
NoOfFragmentLogFiles=64
#MaxNoOfUniqueHashIndexes=768
TransactionDeadlockDetectionTimeout=12000
MaxNoOfConcurrentTransactions=20000

[MYSQLD DEFAULT]

[NDB_MGMD DEFAULT]

[TCP DEFAULT]

[NDB_MGMD]
Id=1                            # the management server (this one)
HostName=89.200.x.x          

[NDBD]
Id=2                            # the first storage node
HostName=89.200.x.x
DataDir= /var/lib/mysql-cluster
StopOnError=false
MaxNoOfConcurrentOperations=100000

[NDBD]
Id=3                            # the second storage node
HostName=89.200.x.x
DataDir=/var/lib/mysql-cluster
StopOnError=false
MaxNoOfConcurrentOperations=100000

#[NDBD]
#Id=7                            # the third storage node 
#HostName=89.200.x.x
#DataDir=/var/lib/mysql-cluster
#StopOnError=false
#MaxNoOfConcurrentOperations=100000

[MYSQLD]
Id=4
HostName=89.200.x.x

[MYSQLD]
Id=5
HostName=89.200.x.x

[MYSQLD]
Id=6
HostName=89.200.x.x
[4 Dec 2007 13:04] James Graham
To summarize: 140 tables, with 300 or so indexes across them.
We use TEXT columns sparingly.
No BLOBs as far as I know.

We are working through the schema, replacing VARCHAR columns with more appropriate types where possible.

Thanks
[13 Mar 2009 8:42] Jonas Oreland
Is this bug still active?
[13 Mar 2009 9:17] James Graham
The bug was persistent, and subsequently we moved away from MySQL Cluster to a replicated platform.
[13 Mar 2009 9:54] Jonas Oreland
Looking at the traces and out-files,
it looks like swapping,
but this is only a guess.

I'm closing this as "can't repeat".
Sorry for the insufficient support
for you and your application.
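Swap activity of the kind suspected here can be confirmed with standard Linux tools. These commands are illustrative, not from the original report:

```shell
# Sustained nonzero values in vmstat's si/so columns indicate pages
# actively moving to/from swap.
vmstat 1 3

# Current swap totals from the kernel
grep -E '^(SwapTotal|SwapFree)' /proc/meminfo
```

If swapping is confirmed, reducing DataMemory/IndexMemory so the data node fits comfortably in the 4 GB of physical RAM should help; the ndbd LockPagesInMemory parameter, if available in the release in use, can also pin the data node's memory so it cannot be swapped out.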
[13 Mar 2009 9:55] Jonas Oreland
One extra note: if you had still been interested,
the first thing to do would have been to upgrade to a 6.3.x release.