Bug #48033 GCP stop under virtually no load, followed by full cluster crash
Submitted: 14 Oct 10:05 Modified: 16 Oct 14:10
Reporter: Daniel Herlitz
Status: Open
Category:Server: Cluster Severity:S2 (Serious)
Version:mysql-5.1-telco-7.0 OS:Linux
Assigned to: Gustaf Thorslund Target Version:
Tags: 7.0.7
Triage: Triaged: D1 (Critical) / R6 (Needs Assessment) / E6 (Needs Assessment)

[14 Oct 10:05] Daniel Herlitz
Description:
After running fine for a week under very low load, both NDB nodes suddenly crashed this
morning.

Node 3 killed this node because GCP stop was detected

Forced node shutdown completed. Caused by error 2303

I will attach all relevant log parts, ndb_error_report and config.ini

How to repeat:
No idea

Suggested fix:
No idea
[14 Oct 10:05] Daniel Herlitz
Logs, config files

Attachment: 2009-10-14.tar.gz (application/x-gzip, text), 174.72 KiB.

[14 Oct 10:37] Daniel Herlitz
I don't know if this is related, but when we try to start NDB, both nodes seem to lock up
in a strange way, printing about 100 lines per second of this message to the log files:

delay: reqs=348
[14 Oct 12:42] Jonas Oreland
Swap?
Other processes on the machines?

Setting LockPagesInMemory (requires root) or RealtimeScheduler (requires root) can make
ndbd more robust when competing for hardware resources.
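A minimal config.ini sketch of the two settings suggested above, assuming they go in the [ndbd default] section so that every data node picks them up; the section placement and the chosen values are assumptions and should be checked against the data node parameter documentation for the 7.0 release in use. As noted above, ndbd has to run as root for either setting to take effect.

```ini
# Hypothetical fragment for config.ini (values assumed, verify for your release).
[ndbd default]
# Lock the ndbd process memory so it cannot be swapped out (requires root).
LockPagesInMemory=1
# Run ndbd threads under real-time scheduling (requires root).
RealtimeScheduler=1
```

After changing config.ini, the new values only reach the data nodes once the management server has reloaded the configuration and the data nodes have been restarted.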
[14 Oct 16:55] Daniel Herlitz
The three machines are dedicated (though virtual) to running the MySQL cluster; no other
services run on them. They are part of a staging environment that no one was using at the
time of the crash (more or less zero load).
[16 Oct 11:18] Gustaf Thorslund
Daniel,

Running cluster on virtual machines isn't a very good idea. But do the virtual machines
have dedicated RAM, or do they share it with other virtual machines? In the latter case
your "RAM" might have ended up on disk anyway. What software do you use for virtualization?

/Gustaf
[16 Oct 14:10] Daniel Herlitz
Those machines have 3 GB of dedicated memory (no overallocation). We use Oracle VM (Xen)
for virtualization.