Bug #64967 Under load conditions, all the four NDB Cluster DATA NODES went down
Submitted: 13 Apr 2012 8:23
Modified: 1 Jul 2012 19:29
Reporter: Swarup Sengupta
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: mysql-5.1.56 ndb-7.1.18
OS: Linux
Assigned to:
CPU Architecture: Any

[13 Apr 2012 8:23] Swarup Sengupta
Description:
PROBLEM:
We have a MySQL Cluster deployment (mysql-5.1.56 ndb-7.1.18) consisting of 2 servers (24 cores, 2.93 GHz, 64 GB of RAM, 1 Gb/s Ethernet) with 4 data nodes and 2 management server nodes. The network bandwidth is 1 Gb/s.

Under load, all four NDB Cluster data nodes went down. IP traffic under that load was 120 Mb/s for both servers.

Error Summary

[MgmtServer (Node 1 and 2)] : Forced node shutdown completed. Caused by error 2315: 'Node declared dead. See error log for details(Arbitration error). Temporary error, restart node'

[Node 3 - Nodegroup - 0] Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other

[Node 4 - Nodegroup - 1] Forced node shutdown completed. Caused by error 2315: 'Node declared dead. See error log for details(Arbitration error). Temporary error, restart node'.

[Node 5 - Nodegroup - 0] Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.

[Node 6 - Nodegroup - 1] Forced node shutdown completed. Caused by error 2315: 'Node declared dead. See error log for details(Arbitration error). Temporary error, restart node'.

Please review these errors and explain why this happened and how it can be prevented.

How to repeat:
The error occurred under the load conditions described above.

[13 Apr 2012 8:23] Swarup Sengupta
Management Server 1 log

Attachment: ndb_1_cluster.log (application/octet-stream, text), 28.52 KiB.

[13 Apr 2012 8:24] Swarup Sengupta
Mgmt server 1 out file

Attachment: ndb_1_out.log (application/octet-stream, text), 1.17 KiB.

[13 Apr 2012 8:27] Swarup Sengupta
Management Server 2 log

Attachment: ndb_2_cluster.log (application/octet-stream, text), 52.14 KiB.

[13 Apr 2012 8:27] Swarup Sengupta
Mgmt server 2 out file

Attachment: ndb_2_out.log (application/octet-stream, text), 884 bytes.

[13 Apr 2012 8:28] Swarup Sengupta
config.ini

Attachment: config.ini (application/octet-stream, text), 3.13 KiB.

[13 Apr 2012 8:31] Swarup Sengupta
Logs for Data Nodes

Attachment: log nodewise.txt (text/plain), 4.93 KiB.

[1 Jun 2012 19:29] Sveta Smirnova
Thank you for the report.

Please send us all trace and error logs from all nodes, that is, the archive that ndb_error_reporter creates.

Please also check whether any OS limits on network connections or RAM usage apply to each node.
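
One way to carry out the OS limit check is to print the per-process resource limits on each host. The following is a minimal sketch in Python, assuming a Linux host and assuming it is run as the same user (and from the same kind of shell or init environment) that starts the data node processes, since limits are inherited from the launching environment. The selection of limits and the helper names `LIMITS`, `fmt` and `read_limits` are illustrative assumptions, not part of any MySQL tooling.

```python
import resource

# Limits that plausibly matter for a data node (an assumed selection,
# not an official list): sockets count against the open-file limit,
# and memory limits cap how much RAM the process may use or lock.
LIMITS = {
    "open files (includes sockets)": resource.RLIMIT_NOFILE,
    "address space (virtual memory)": resource.RLIMIT_AS,
    "locked memory": resource.RLIMIT_MEMLOCK,
    "processes/threads": resource.RLIMIT_NPROC,
}


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def read_limits():
    """Return {label: (soft, hard)} for the current process."""
    return {label: resource.getrlimit(res) for label, res in LIMITS.items()}


if __name__ == "__main__":
    for label, (soft, hard) in read_limits().items():
        print(f"{label}: soft={fmt(soft)} hard={fmt(hard)}")
```

Any value that is not "unlimited" and looks small relative to the configured memory or connection count would be worth including in the reply, together with the ndb_error_reporter archive.
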
[2 Jul 2012 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".