Bug #48087 ndbd crash with error 2341
Submitted: 15 Oct 16:53 Modified: 20 Oct 5:23
Reporter: raid fifa
Status: Open
Category:Server: Cluster Severity:S1 (Critical)
Version:mysql-5.1-telco-7.0 OS:Linux (SuSE EL SP2 x86_64)
Assigned to: Gustaf Thorslund Target Version:
Tags: 5.1.35-ndb-7.0.7 commercial
Triage: Triaged: D2 (Serious) / R6 (Needs Assessment) / E6 (Needs Assessment)

[15 Oct 16:53] raid fifa
Description:
Environment:
4 machines: 
2 IBM x3850m2 (4 cores * 4, 8GB mem) as one mgmd and two mysqlds; 2 IBM x3950m2
(4 cores * 8, 32GB mem) as four ndbd nodes.
OS:
SuSE EnterpriseLinux SP2 for x86_64
MySQL Cluster:
mysql-com-5.1.35-ndb-7.0.7 for Linux x86_64

node2 and node3 are one group, node4 and node5 are another group.
Our MySQL Cluster system was built about one month ago, but it is not reliable. Node2
crashed several times with error 2341, then node3 crashed with a GCP stop, and then the
whole cluster stopped; we then need to start ndbd with the --initial parameter.

Currently, there are 144 tables and about 8GB of data. Some tables are in-memory; some big
tables (> 3 million rows) are disk-based tables.
Our application checks data (SELECT ... WHERE ...), transfers it into another
table (INSERT INTO ... SELECT ... WHERE ...), then deletes the original table data
(DELETE FROM ... WHERE ...) automatically.
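
For readers unfamiliar with this kind of workload, the check/transfer/delete cycle described
above might look roughly like the sketch below. It uses Python's built-in sqlite3 module
purely as a stand-in for a MySQL Cluster connection. The table names (live_data,
archive_data), the id column, and the move_rows helper are all hypothetical; the real schema
and statements are not shown in this report.

```python
import sqlite3


def move_rows(conn, src, dst, max_id):
    """Copy rows with id <= max_id from src to dst, then delete them from src.

    src and dst are hypothetical, trusted table names (never user input),
    which is why they are interpolated directly into the SQL text.
    """
    with conn:  # commit on success, roll back if either statement fails
        conn.execute(
            f"INSERT INTO {dst} SELECT * FROM {src} WHERE id <= ?", (max_id,)
        )
        deleted = conn.execute(
            f"DELETE FROM {src} WHERE id <= ?", (max_id,)
        ).rowcount
    return deleted


def demo():
    """Build two small in-memory tables and move part of the data across."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE live_data (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE archive_data (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO live_data VALUES (?, ?)",
        [(i, f"row{i}") for i in range(1, 6)],
    )
    conn.commit()

    moved = move_rows(conn, "live_data", "archive_data", 3)
    left = conn.execute("SELECT COUNT(*) FROM live_data").fetchone()[0]
    archived = conn.execute("SELECT COUNT(*) FROM archive_data").fetchone()[0]
    return moved, left, archived
```

Running the copy and the delete inside one transaction is only one possible design; whether
the application in this report does so is not stated.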

DataMemory and IndexMemory usage were about 75%-80%.

How to repeat:
It is hard to repeat this at an exact time, and I have not identified which DML/DDL could
trigger this.

Suggested fix:
I can understand error 2341, maybe there is a bug; but why did node3 also crash with a GCP
stop error? This is very terrible!
[15 Oct 16:56] raid fifa
configfile_tracelog_outlog

Attachment: 20091015.rar (application/octet-stream, text), 66.53 KiB.

[16 Oct 10:48] Gustaf Thorslund
Raid Fifa,

Looks like a duplicate of bug#37227, bug#41292, bug#39498, and bug#43069.

The full output of ndb_error_reporter would be useful.

I see you have LockPagesInMainMemory=0. This could have resulted in swapping, but I need
more logs to confirm that.

/Gustaf
[19 Oct 3:34] raid fifa
thanks!
which log do you need in order to confirm whether there is kernel swapping?
[19 Oct 14:18] Hartmut Holzgraefe
We're interested in all cluster log, error and trace files; you can collect these
using the ndb_error_reporter utility:

   http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-programs-ndb-error-reporter.html

or manually by checking the data directories of all management and data nodes.

Whether your system was swapping is not really visible from these logs; for this you need
to actively monitor your system's behavior using tools such as vmstat.
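
As a rough illustration of that kind of monitoring, the sketch below runs vmstat at a fixed
interval and pulls out the swap-in (si) and swap-out (so) columns. It assumes the usual
Linux procps vmstat layout with a header line containing "si" and "so". The function names,
the interval and the sample count are arbitrary choices for this example, not anything
prescribed by MySQL Cluster.

```python
import subprocess


def parse_vmstat_swap(text):
    """Return a list of (si, so) integer pairs found in vmstat output text."""
    samples = []
    si_idx = so_idx = None
    for line in text.splitlines():
        fields = line.split()
        # The column header names the swap columns; remember their positions.
        if "si" in fields and "so" in fields:
            si_idx, so_idx = fields.index("si"), fields.index("so")
            continue
        # Skip anything before the first header and any too-short banner line.
        if si_idx is None or len(fields) <= max(si_idx, so_idx):
            continue
        try:
            samples.append((int(fields[si_idx]), int(fields[so_idx])))
        except ValueError:
            continue  # not a data row
    return samples


def swapping_detected(samples):
    """True if any sample shows pages being swapped in or out."""
    return any(si > 0 or so > 0 for si, so in samples)


def sample_vmstat(interval=5, count=12):
    """Run `vmstat <interval> <count>` and return the parsed (si, so) pairs."""
    result = subprocess.run(
        ["vmstat", str(interval), str(count)],
        capture_output=True, text=True, check=True,
    )
    return parse_vmstat_swap(result.stdout)
```

Calling swapping_detected(sample_vmstat()) on each data node host during normal load gives a
first indication. Note that the first row vmstat prints reports averages since boot, so
non-zero values there do not necessarily mean the host is swapping right now.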
[20 Oct 5:23] raid fifa
I have transferred the logs (bug-data-48087.zip), covering Sep 9 to Oct 15, to your ftp
server. I hope this helps.
Thanks!