Bug #48087 ndbd crash with error 2341
Submitted: 15 Oct 2009 14:53
Modified: 30 Dec 2009 20:26
Reporter: raid fifa
Status: Duplicate
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: mysql-5.1-telco-7.0
OS: Linux (SuSE EL SP2 x86_64)
Assigned to: Assigned Account
CPU Architecture: Any
Tags: 5.1.35-ndb-7.0.7 commercial

[15 Oct 2009 14:53] raid fifa
Description:
Environment:
4 machines: 
2 IBM x3850m2 (4 cores * 4, 8GB mem) running one mgmd and two mysqlds; 2 IBM x3950m2 (4 cores * 8, 32GB mem) running four ndbd nodes.
OS:
SuSE EnterpriseLinux SP2 for x86_64
MySQL Cluster:
mysql-com-5.1.35-ndb-7.0.7 for Linux x86_64

Node2 and node3 form one node group; node4 and node5 form another.
Our MySQL Cluster system was built about one month ago, but it is not reliable. Node2 crashed several times with error 2341, then node3 crashed with a GCP stop, and then the whole cluster stopped; each time we need to start ndbd with the --initial parameter.

Currently there are 144 tables holding about 8GB of data. Some tables are in-memory; some big tables (> 3 million rows) are disk-based tables.
Our application checks data (SELECT ... WHERE ...), transfers it into another table (INSERT INTO ... SELECT ... WHERE ...), and then deletes the original table data (DELETE FROM ... WHERE ...) automatically.

DataMemory and IndexMemory usage were about 75%-80%.

How to repeat:
It is hard to reproduce this on demand, and I have not identified which DML/DDL statement could trigger it.

Suggested fix:
I can understand error 2341, since there may be a bug; but why did node3 also crash with a GCP stop error? This is very serious!
[15 Oct 2009 14:56] raid fifa
configfile_tracelog_outlog

Attachment: 20091015.rar (application/octet-stream, text), 66.53 KiB.

[16 Oct 2009 8:48] Gustaf Thorslund
Raid Fifa,

Looks like a duplicate of bug#37227, bug#41292, bug#39498, and bug#43069.

The full output of ndb_error_reporter would be useful.

I see you have LockPagesInMainMemory=0. This could have resulted in swapping, but we need more logs to confirm that.

/Gustaf
[19 Oct 2009 1:34] raid fifa
Thanks!
Which logs do you need in order to confirm whether there was kernel swapping?
[19 Oct 2009 12:18] Hartmut Holzgraefe
We're interested in all cluster log, error, and trace files. You can collect these
using the ndb_error_reporter utility:

   http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-programs-ndb-error-reporter.html

or manually, by checking the data directories of all management and data nodes.

Whether your system was swapping is not really visible from these logs. For this you need to actively monitor your system's behavior using tools such as vmstat.
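
As a rough illustration of the vmstat monitoring suggested here, the sketch below parses captured vmstat output and flags samples where the swap-in (si) or swap-out (so) columns are non-zero. The function names and the sample text are hypothetical and the numbers are made up for illustration; they are not taken from the reporter's machines.

```python
# Minimal sketch: flag swapping in captured `vmstat <interval>` output.
# SAMPLE is invented illustration data, not output from the systems in this report.

SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812345  10240 204800    0    0     5    12  100  200  3  1 96  0  0
 2  1  51200  20480   8192 102400  120  340   800   950  900 1500 20 10 50 20  0
"""


def parse_vmstat(text):
    """Return one dict per data row, keyed by the vmstat column names."""
    samples = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column header row contains both "si" and "so"; vmstat repeats it
        # periodically, so it is re-read every time it appears.
        if "si" in fields and "so" in fields:
            columns = fields
            continue
        # Data rows are all-numeric and have one value per column.
        if columns and len(fields) == len(columns) and all(f.isdigit() for f in fields):
            samples.append(dict(zip(columns, map(int, fields))))
    return samples


def swapping_samples(samples):
    """Keep only the samples that show swap-in or swap-out activity."""
    return [s for s in samples if s["si"] > 0 or s["so"] > 0]


if __name__ == "__main__":
    for sample in swapping_samples(parse_vmstat(SAMPLE)):
        print("swapping detected: si=%d so=%d" % (sample["si"], sample["so"]))
```

Sustained non-zero si/so values on a data node host while ndbd is running would support the swapping theory; a single isolated spike is weaker evidence.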
[20 Oct 2009 3:23] raid fifa
I have transferred the logs (bug-data-48087.zip), which cover Sep 9 to Oct 15, to your FTP server.
I hope this helps.
Thanks!
[30 Dec 2009 20:26] Andrew Hutchings
The 2341 crash is a duplicate of bug #48852. Basically, you need to increase SendBufferMemory, as it has been exhausted. The fix for bug #48852 simply changes the error message to reflect this.

Closing this bug as a duplicate of bug #48852.
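
For anyone checking their own setup against the two settings discussed in this report, here is a minimal sketch that reads SendBufferMemory and LockPagesInMainMemory out of a cluster config.ini. It assumes SendBufferMemory is set in a [tcp default] section and LockPagesInMainMemory in [ndbd default]; the helper names and the sample file are hypothetical, and the 2M value is only a placeholder, not a recommendation and not the reporter's actual setting.

```python
# Minimal sketch: look up two settings in a MySQL Cluster config.ini.
# SAMPLE_CONFIG is an invented example; only LockPagesInMainMemory=0 and the
# node ids 2 and 3 are mentioned in the report itself.
import configparser

SAMPLE_CONFIG = """\
[ndbd default]
LockPagesInMainMemory=0

[tcp default]
SendBufferMemory=2M

[ndbd]
NodeId=2

[ndbd]
NodeId=3
"""

SIZE_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a value such as '256K' or '2M' to a number of bytes."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in SIZE_SUFFIXES:
        return int(value[:-1]) * SIZE_SUFFIXES[suffix]
    return int(value)


def read_setting(config_text, section, option):
    """Return the raw value of `option` in `section`, or None if it is not set."""
    # strict=False lets the parser accept repeated sections such as [ndbd].
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_text)
    for name in parser.sections():
        # Match section names case-insensitively; option names are already
        # lower-cased by configparser.
        if name.strip().lower() == section.lower() and parser.has_option(name, option):
            return parser.get(name, option)
    return None


if __name__ == "__main__":
    send_buffer = read_setting(SAMPLE_CONFIG, "tcp default", "SendBufferMemory")
    lock_pages = read_setting(SAMPLE_CONFIG, "ndbd default", "LockPagesInMainMemory")
    if send_buffer is None:
        print("SendBufferMemory not set explicitly; the built-in default applies")
    else:
        print("SendBufferMemory = %d bytes" % parse_size(send_buffer))
    print("LockPagesInMainMemory =", lock_pages)
```

A value of None only means the option is absent from that section; the effective value then comes from the server's built-in default for the installed version.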