MySQL Bugs: #48087: ndbd crash with error 2341

Bug #48087	ndbd crash with error 2341
Submitted:	15 Oct 2009 14:53	Modified:	30 Dec 2009 20:26
Reporter:	raid fifa
Status:	Duplicate
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	mysql-5.1-telco-7.0	OS:	Linux (SuSE EL SP2 x86_64)
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	5.1.35-ndb-7.0.7 commercial

Description:
Environment:
4 machines:
2 IBM x3850m2 (4 × 4 cores, 8 GB RAM) running one mgmd and two mysqld nodes;
2 IBM x3950m2 (4 × 8 cores, 32 GB RAM) running four ndbd nodes.
OS:
SuSE Enterprise Linux SP2 for x86_64
MySQL Cluster:
mysql-com-5.1.35-ndb-7.0.7 for Linux x86_64

Node 2 and node 3 form one node group; node 4 and node 5 form the other.
Our MySQL Cluster system has been running for about one month, but it is not reliable. Node 2 crashed several times with error 2341, then node 3 crashed with a GCP stop, and then the whole cluster stopped; we had to restart ndbd with the --initial option.
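The failure sequence above follows from how node groups work: each node group holds its own share of the data, and the cluster can keep running only while every node group still has at least one live data node. Losing node 2 and then node 3 empties the first node group, so the whole cluster shuts down. The Python sketch below models that rule in simplified form (it ignores arbitration and network-partition handling); `cluster_survives` is a hypothetical helper, not part of any MySQL tool.

```python
from __future__ import annotations

# Layout from the report: one node group = nodes 2 and 3, the other = nodes 4 and 5.
NODE_GROUPS: dict[int, list[int]] = {0: [2, 3], 1: [4, 5]}


def cluster_survives(node_groups: dict[int, list[int]], alive: set[int]) -> bool:
    """Simplified availability rule: every node group must keep at least one
    running data node, otherwise part of the data has no live replica.
    Arbitration and network partitions are deliberately ignored."""
    return all(
        any(node in alive for node in members)
        for members in node_groups.values()
    )


print(cluster_survives(NODE_GROUPS, {3, 4, 5}))  # node 2 down -> True
print(cluster_survives(NODE_GROUPS, {4, 5}))     # nodes 2 and 3 down -> False
```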

Currently there are 144 tables holding about 8 GB of data. Some tables are in-memory; some big tables (more than 3 million rows) are disk-based tables.
Our application automatically checks data (SELECT ... WHERE ...), transfers it into another table (INSERT INTO ... SELECT ... WHERE ...), and then deletes the original rows (DELETE FROM ... WHERE ...).
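To experiment with this check/transfer/delete cycle outside the cluster, the statement sequence can be sketched as follows. This is only a stand-in: it uses Python's built-in `sqlite3` rather than NDB, and the `src`/`archive` tables and the `id >= ?` condition are hypothetical placeholders for the elided `WHERE ...` clauses. It will not reproduce the NDB crash; it only shows the order of statements, with the INSERT and DELETE committed together.

```python
import sqlite3


def move_rows(conn: sqlite3.Connection, min_id: int) -> int:
    """Check, copy, then delete rows with id >= min_id (hypothetical schema)."""
    # 1. check: SELECT ... WHERE ...
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM src WHERE id >= ?", (min_id,)
    ).fetchone()
    # 2. transfer and 3. delete: committed together; an exception rolls both back
    with conn:
        conn.execute("INSERT INTO archive SELECT * FROM src WHERE id >= ?", (min_id,))
        conn.execute("DELETE FROM src WHERE id >= ?", (min_id,))
    return count


# Demo on an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE archive (id INTEGER PRIMARY KEY, payload TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO src VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 11)]
    )
moved = move_rows(conn, 6)  # moves ids 6..10
```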

DataMemory and IndexMemory usage was about 75% to 80%.
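Usage figures like these can be read from the management client, assuming your version supports `ndb_mgm -e "ALL REPORT MEMORYUSAGE"`. The sketch below assumes the 7.0-era line format `Node N: Data usage is P%(U 32K pages of total T)` (check it against your version's actual output) and recomputes the percentage from the page counts, which is more precise than the rounded figure printed by the client. `memory_usage` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Assumed line format, e.g.:
#   Node 2: Data usage is 75%(2400 32K pages of total 3200)
#   Node 2: Index usage is 80%(800 8K pages of total 1000)
USAGE_RE = re.compile(
    r"Node (\d+): (Data|Index) usage is \d+%\((\d+) \d+K pages of total (\d+)\)"
)


def memory_usage(report: str) -> dict[tuple[int, str], float]:
    """Return {(node_id, 'Data' or 'Index'): percent used}, computed from pages."""
    usage: dict[tuple[int, str], float] = {}
    for m in USAGE_RE.finditer(report):
        node, kind, used, total = int(m[1]), m[2], int(m[3]), int(m[4])
        usage[(node, kind)] = 100.0 * used / total
    return usage
```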

How to repeat:
It is hard to reproduce this at a predictable time, and I have not found which DML/DDL statement triggers it.

Suggested fix:
I can understand error 2341 (maybe there is a bug), but why did node 3 also crash with a GCP stop error? This is very serious!

configfile_tracelog_outlog

Attachment: 20091015.rar (application/octet-stream, text), 66.53 KiB.

Raid Fifa,

Looks like a duplicate of bug#37227, bug#41292, bug#39498, and bug#43069.

The full output of ndb_error_reporter would be useful.

I see you have LockPagesInMainMemory=0. This could have resulted in swapping, but we need more logs to confirm that.
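`LockPagesInMainMemory=0` leaves data node memory unlocked, so the operating system is free to swap it out. A quick way to see where (and whether) the parameter is set is a small config.ini scan like the one below; `find_param` is a hypothetical helper that assumes simple `key=value` lines, `[section]` headers, and `#` comments.

```python
from __future__ import annotations


def find_param(config_text: str, name: str) -> list[tuple[str, str]]:
    """Return (section, value) for every occurrence of `name` in a
    config.ini-style text; parameter names are compared case-insensitively."""
    section = ""
    hits: list[tuple[str, str]] = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments and whitespace
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
        elif "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key.lower() == name.lower():
                hits.append((section, value))
    return hits
```

For example, `find_param(open("config.ini").read(), "LockPagesInMainMemory")` lists every section that sets it; an empty list means the default applies.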

/Gustaf

Thanks!
Which logs do you need to confirm whether the kernel was swapping?

We're interested in all cluster log, error, and trace files. You can collect these
using the ndb_error_reporter utility:

   http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-programs-ndb-error-reporter.html

or manually from the data directories of all management and data nodes.
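For the manual route, a sketch like the following gathers the relevant files from each directory into one archive. The file-name patterns (`ndb_<nodeid>_cluster.log`, `ndb_<nodeid>_out.log`, `ndb_<nodeid>_error.log`, `ndb_<nodeid>_trace.log.<n>`) are assumed to follow the usual NDB naming; `collect_ndb_logs` is a hypothetical helper and would need to run on each host whose directories you want.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# Assumed NDB file-name patterns (node id in place of the first *).
LOG_PATTERNS = (
    "ndb_*_cluster.log*",  # cluster log written by the management node
    "ndb_*_out.log",       # node stdout/stderr log
    "ndb_*_error.log",     # error log
    "ndb_*_trace.log*",    # trace files referenced by the error log
)


def collect_ndb_logs(data_dirs: list[str], archive: str) -> list[str]:
    """Zip every matching log from each data directory; return archive names."""
    names: list[str] = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for d in map(Path, data_dirs):
            for pattern in LOG_PATTERNS:
                for path in sorted(d.glob(pattern)):
                    arcname = f"{d.name}/{path.name}"  # keep directories apart
                    zf.write(path, arcname)
                    names.append(arcname)
    return names
```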

Whether your system was swapping is not really visible from these logs; for that you need to actively monitor your system's behavior using tools such as vmstat.
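vmstat reports swap-in and swap-out activity in its `si` and `so` columns; sustained non-zero values while ndbd is running point to swapping. A minimal parser for captured output (for example from `vmstat 5`) might look like this. It assumes the standard Linux procps column header, and `parse_vmstat`/`is_swapping` are hypothetical helpers, a sketch rather than a monitoring tool.

```python
from __future__ import annotations


def parse_vmstat(text: str) -> list[dict[str, int]]:
    """Parse `vmstat` output into one {column: value} dict per sample row."""
    columns: list[str] | None = None
    samples: list[dict[str, int]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "si" in fields and "so" in fields:
            columns = fields  # column-name header (may repeat in long runs)
            continue
        if columns is None or not all(f.isdigit() for f in fields):
            continue  # group header such as "procs ---memory---"
        samples.append(dict(zip(columns, map(int, fields))))
    return samples


def is_swapping(samples: list[dict[str, int]]) -> bool:
    """True if any sample shows pages swapped in (si) or out (so)."""
    return any(s.get("si", 0) > 0 or s.get("so", 0) > 0 for s in samples)
```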

I have transferred the logs (bug-data-48087.zip), covering Sep 9 to Oct 15, to your FTP server.
I hope this helps.
Thanks!

The 2341 crash is a duplicate of bug #48852. Basically, you need to increase SendBufferMemory because it has been exhausted; the fix for bug #48852 simply changes the error message to reflect this.
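Since the remedy is a configuration change, the edit can be scripted. The sketch below assumes `SendBufferMemory` is set in the `[tcp default]` section of config.ini, as a TCP transporter parameter; `set_param` is a hypothetical helper, and the `8M` value in the usage line is purely illustrative, not a tuned recommendation. Applying the new setting still requires the restart procedure for your cluster version.

```python
from __future__ import annotations


def set_param(config_text: str, section: str, name: str, value: str) -> str:
    """Set `name=value` inside `[section]` of a config.ini-style text.

    Replaces an existing assignment in that section, otherwise adds one at
    the end of the section, or appends the section if it is missing.
    """
    lines = config_text.splitlines()
    header = f"[{section.lower()}]"
    start = next((i for i, l in enumerate(lines) if l.strip().lower() == header), None)
    if start is None:
        return "\n".join(lines + ["", f"[{section}]", f"{name}={value}"]) + "\n"
    end = start + 1
    while end < len(lines) and not lines[end].strip().startswith("["):
        if "=" in lines[end] and lines[end].split("=", 1)[0].strip().lower() == name.lower():
            lines[end] = f"{name}={value}"  # overwrite the existing setting
            return "\n".join(lines) + "\n"
        end += 1
    while end > start + 1 and not lines[end - 1].strip():
        end -= 1  # insert before trailing blank lines of the section
    lines.insert(end, f"{name}={value}")
    return "\n".join(lines) + "\n"
```

For example, `set_param(open("config.ini").read(), "tcp default", "SendBufferMemory", "8M")` returns the updated text, which can be reviewed before writing it back.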

Closing this bug as a duplicate of bug #48852.