Bug #32955 Node repeatedly crashes with error 6050
Submitted: 4 Dec 2007 12:10 Modified: 13 Mar 2009 9:54
Reporter: James Graham Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:5.0.45 OS:Linux (Debian etch)
Assigned to: Jonas Oreland CPU Architecture:Any

[4 Dec 2007 12:10] James Graham
Description:
Hi

Since restarting node 2 with --initial, the node crashes shortly after coming up. Once restarted, it crashes again with the same error (detailed below) roughly an hour later, of which 25-35 minutes is spent starting.

Node 2: Forced node shutdown completed, restarting. Caused by error 6050: 'WatchDog terminate, internal error or massive overload on the machine running this node(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'. - Unknown error code: Unknown result: Unknown error code

ndb_mgm SHOW output:

[ndbd(NDB)]	2 node(s)
id=2	@89.200.x.x  (Version: 5.0.45, Nodegroup: 0)
id=3	@89.200.x.x  (Version: 5.0.45, Nodegroup: 0, Master)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@89.200.x.x  (Version: 5.0.45)

[mysqld(API)]	3 node(s)
id=4	@89.200.x.x  (Version: 5.0.45)
id=5	@89.200.x.x  (Version: 5.0.45)
id=6	@89.200.x.x  (Version: 5.0.45)

We have checked the hard disk for integrity and the RAM for ECC errors; both are fine. The node crashes regardless of traffic levels.

Each server is specced with 4 GB of RAM and dual 2.13 GHz Xeons.
Load averages: 3.39 (1 min) 3.55 (5 mins) 3.04 (15 mins)
RAM utilization is about 50% across the board.

Any help appreciated. Thanks

How to repeat:
Start ndbd and wait an hour or so...

Suggested fix:
No idea
[4 Dec 2007 12:19] Hartmut Holzgraefe
To look into this any further we need to know:

- CPU usage of the ndbd process when it runs into this
- the cluster configuration file
- the general cluster log and the error log and trace logs from this node
- information on the number of objects (tables, indexes, ...) and the amount of data in the cluster, and whether TEXT or BLOB columns are used, would help too
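For reference, the requested data could be gathered with something like the following sketch. The `ndb_<nodeid>_*` file names follow the standard NDB naming convention and the DataDir matches this cluster's config.ini; the config.ini path on the management host is an assumption.

```shell
# 1. CPU usage of the ndbd process while the problem occurs
top -b -n 1 -p "$(pgrep -x ndbd | head -1)"

# 2. Bundle the error log, trace files, and out-file from data node 2
tar czf ndb_node2_logs.tar.gz \
    /var/lib/mysql-cluster/ndb_2_error.log \
    /var/lib/mysql-cluster/ndb_2_trace.log.* \
    /var/lib/mysql-cluster/ndb_2_out.log

# 3. On the management host, grab the cluster log and the config file
#    (config.ini location here is a guess -- adjust to your install)
tar czf ndb_mgm_logs.tar.gz \
    /var/lib/mysql-cluster/ndb_1_cluster.log \
    /etc/mysql-cluster/config.ini
```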
[4 Dec 2007 12:36] James Graham
ndbd hovers around 60% CPU but can range from 40% to 80%.

config.ini:

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=2048MB
IndexMemory=512MB
MaxNoOfAttributes=10000
MaxNoOfConcurrentOperations=131072
MaxNoOfOrderedIndexes=1024
MaxNoOfTables=256
TimeBetweenLocalCheckpoints=12
NoOfFragmentLogFiles=64
#MaxNoOfUniqueHashIndexes=768
TransactionDeadlockDetectionTimeout=12000
MaxNoOfConcurrentTransactions=20000

[MYSQLD DEFAULT]

[NDB_MGMD DEFAULT]

[TCP DEFAULT]

[NDB_MGMD]
Id=1                            # the management server (this one)
HostName=89.200.x.x          

[NDBD]
Id=2                            # the first storage node
HostName=89.200.x.x
DataDir= /var/lib/mysql-cluster
StopOnError=false
MaxNoOfConcurrentOperations=100000

[NDBD]
Id=3                            # the second storage node
HostName=89.200.x.x
DataDir=/var/lib/mysql-cluster
StopOnError=false
MaxNoOfConcurrentOperations=100000

#[NDBD]
#Id=7                            # the third storage node 
#HostName=89.200.x.x
#DataDir=/var/lib/mysql-cluster
#StopOnError=false
#MaxNoOfConcurrentOperations=100000

[MYSQLD]
Id=4
HostName=89.200.x.x

[MYSQLD]
Id=5
HostName=89.200.x.x

[MYSQLD]
Id=6
HostName=89.200.x.x
[4 Dec 2007 13:04] James Graham
To summarize: 140 tables, with 300 or so indexes across them.
We use TEXT columns sparingly.
No BLOBs as far as I know.

We are working through the schema, replacing VARCHAR columns with more appropriate types where possible.

Thanks
[13 Mar 2009 8:42] Jonas Oreland
Is this bug still active?
[13 Mar 2009 9:17] James Graham
The bug was persistent, and subsequently we moved away from MySQL Cluster to a replicated platform.
[13 Mar 2009 9:54] Jonas Oreland
Looking at the traces and out-files,
it looks like swapping,
but this is only a guess.

I'm closing this as "can't repeat".
Sorry for the insufficient support
for you and your application.
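Swap activity of the kind suspected here can be confirmed with standard Linux tools. These commands are illustrative, not from the original report:

```shell
# Sustained nonzero values in vmstat's si/so columns indicate pages
# actively moving to/from swap.
vmstat 1 3

# Current swap totals from the kernel
grep -E '^(SwapTotal|SwapFree)' /proc/meminfo
```

If swapping is confirmed, reducing DataMemory/IndexMemory so the data node fits comfortably in the 4 GB of physical RAM should help; the ndbd LockPagesInMemory parameter, if available in the release in use, can also pin the data node's memory so it cannot be swapped out.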
[13 Mar 2009 9:55] Jonas Oreland
One extra note: if you had still been interested,
the first thing to do would have been to upgrade to a 6.3.x release.