Bug #56026 ndb backup will frequently bring cluster down after upgrade to 7.1.5
Submitted: 16 Aug 2010 18:36 Modified: 19 Sep 2010 19:42
Reporter: Troy A
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: mysql-5.1-telco-7.1
OS: Linux
Assigned to:
CPU Architecture: Any
Tags: Backup, backup.cpp, mysql-cluster-gpl-7.1.5, ndbrequire

[16 Aug 2010 18:36] Troy A
Description:
After upgrading from 7.1.4, we frequently have a data node fail upon issuing a backup command.  Sometimes, I believe this failure happens on multiple hosts, which brings the entire cluster down.

How to repeat:
Unknown; it does not happen every time.

Suggested fix:
We are about to try restarting the nodes with --initial. If that does not solve the problem, we will try backing out to 7.1.4.
[16 Aug 2010 18:44] Troy A
error report uploaded anonymously as /pub/mysql/upload/bug-data-56026.zip
[16 Aug 2010 20:58] Andrew Hutchings
Hello Troy,

Your upload seems to be missing the config.ini file itself and the ndb_x_out.log files for the data nodes. Can you please provide these as well?
[16 Aug 2010 22:50] Troy A
*out.log, config.ini

Attachment: 56026.tar.bz2 (application/x-bzip2, text), 182.64 KiB.

[16 Aug 2010 22:53] Troy A
Sorry about that.  I had blindly removed files older than 30 days from the archive, thinking those would be irrelevant, in order to reduce its size when initially attempting to get it under 500kB.  Let me know if you need anything else.

Thanks

-Troy
[19 Aug 2010 19:16] Troy A
I mysqldumped and restored all databases with NDB tables on Tuesday August 17, and a data node crash has not happened since.  This is by far the longest we've gone on 7.1.5 without a crash.
[19 Aug 2010 19:42] Andrew Hutchings
Hello Troy,

It looks like something is stopping logging on the data nodes. The logs you provided are very old, and judging by their content we would expect at least daily messages in them.  Without full and current logs we will not be able to diagnose this.

We are not yet sure of the exact cause of your problem, but we suspect it is most likely due to a script you are running that repeatedly tries to make a backup every few seconds and fails because the backup already exists.
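
If a scheduled script does turn out to be the trigger, one way to avoid requesting a backup that already exists is to check the backup directory before issuing the command. The following is only a minimal sketch in Python, not something taken from this report: it assumes the script can read a data node's BackupDataDir/BACKUP directory, where backups are stored as BACKUP-<id> subdirectories, and the function names and the backup_root path are hypothetical. The command it builds uses the management client's START BACKUP statement with an explicit backup ID.

```python
import re
from pathlib import Path

# Backups are stored as BACKUP-<id> directories under BackupDataDir/BACKUP.
BACKUP_DIR_PATTERN = re.compile(r"^BACKUP-(\d+)$")


def existing_backup_ids(backup_root):
    """Return the set of backup IDs already present under backup_root.

    backup_root is a hypothetical path to a data node's
    BackupDataDir/BACKUP directory, assumed to be readable from here.
    """
    root = Path(backup_root)
    if not root.is_dir():
        return set()
    ids = set()
    for entry in root.iterdir():
        match = BACKUP_DIR_PATTERN.match(entry.name)
        if entry.is_dir() and match:
            ids.add(int(match.group(1)))
    return ids


def next_free_backup_id(backup_root):
    """Pick an ID one higher than the largest existing one (1 if none exist)."""
    return max(existing_backup_ids(backup_root), default=0) + 1


def build_backup_command(backup_id, backup_root):
    """Build the ndb_mgm argument list, refusing to reuse an existing ID."""
    if backup_id in existing_backup_ids(backup_root):
        raise ValueError(f"backup {backup_id} already exists in {backup_root}")
    return ["ndb_mgm", "-e", f"START BACKUP {backup_id}"]
```

The returned list can be passed to subprocess.run once per scheduled run. A script along these lines would also need to make sure that a previous backup has finished before starting another, which this sketch does not attempt.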
[19 Sep 2010 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".