Bug #75607 node crash during backup
Submitted: 23 Jan 2015 18:33 Modified: 12 Oct 2015 10:31
Reporter: Jonathan Lowsley
Status: Duplicate
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:mysql-5.1.72 ndb-7.1.29 OS:Linux
Assigned to: MySQL Verification Team CPU Architecture:Any

[23 Jan 2015 18:33] Jonathan Lowsley
Description:
Roughly 2% of the time during an NDB backup (invoked with /usr/bin/ndb_mgm -e "START BACKUP"), I get a node shutdown about 10 minutes into the backup.

I think this is a regression introduced somewhere between 7.1.17 and 7.1.28. I ran 7.1.17 for years without the error, then started getting it in 7.1.28, and now in 7.1.29.

Only one of the 4 ndbd nodes crashes, and it's not the same one every time.

Time: Friday 23 January 2015 - 07:22:35
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: Backup.cpp
Error object: BACKUP (Line: 718) 0x00000002
Program: ndbd
Pid: 12169
Version: mysql-5.1.72 ndb-7.1.29
Trace: /var/lib/mysql-cluster/ndb_11_trace.log.15 [t1..t1]

How to repeat:
Run backups every 4 hours for a few weeks.
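Why the repro takes weeks can be checked with a little arithmetic. The sketch below assumes the reported ~2% is a fixed, independent per-backup crash probability (an assumption; the report only gives a rough figure) and uses three weeks as an illustrative stand-in for "a few weeks". The function names are hypothetical. With one backup every 4 hours, 21 days gives 126 backups and roughly a 92% chance of at least one crash.

```cpp
#include <cmath>

// Probability of at least one node crash across `backups` runs,
// ASSUMING each backup independently crashes a node with probability `p`.
double probAtLeastOneCrash(double p, int backups) {
    return 1.0 - std::pow(1.0 - p, backups);
}

// Number of backups taken in `days` days when one runs every `hoursBetween` hours.
int backupsInDays(int days, int hoursBetween) {
    return days * 24 / hoursBetween;
}
```

For example, `probAtLeastOneCrash(0.02, backupsInDays(21, 4))` evaluates to about 0.92, while a single backup (`probAtLeastOneCrash(0.02, 1)`) is only about 0.02, which is why a short test run is unlikely to hit the crash.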
[23 Jan 2015 18:34] Jonathan Lowsley
ndb_error_report

Attachment: ndb_error_report_20150123112110.tar.bz2 (application/x-bzip, text), 2.79 MiB.

[11 Feb 2015 16:47] Jonathan Lowsley
Another node crash due to this issue:

2015-02-10 19:15:42 [ndbd] INFO     -- backup/Backup.cpp
2015-02-10 19:15:42 [ndbd] INFO     -- BACKUP (Line: 718) 0x00000002
2015-02-10 19:15:42 [ndbd] INFO     -- Error handler shutting down system
2015-02-10 19:15:42 [ndbd] INFO     -- Error handler shutdown completed - exiting
2015-02-10 19:15:45 [ndbd] ALERT    -- Node 13: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

Time: Tuesday 10 February 2015 - 19:15:42
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: Backup.cpp
Error object: BACKUP (Line: 718) 0x00000002
Program: ndbd
Pid: 2227
Version: mysql-5.1.72 ndb-7.1.29
Trace: /var/lib/mysql-cluster/ndb_13_trace.log.17 [t1..t1]
***EOM***
[12 Oct 2015 7:09] MySQL Verification Team
Hi,

it's crashing on
ndbrequire(c_backupPool.getSize() == c_backupPool.getNoOfFree() + 1);
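Read literally, the failing condition says that exactly one record in `c_backupPool` is in use at that point: the pool's size equals its free count plus one. The C++ sketch below models only that arithmetic with a hypothetical `RecordPool` class (not the real NDB pool type; only the `getSize()`/`getNoOfFree()` accessor names come from the quoted check). It illustrates the invariant, not a diagnosis of this bug: the check fails if no record is seized, if a second record is in use, or if an earlier record was never released.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fixed-size record pool. Only the two
// accessors used by the failing ndbrequire are modeled.
class RecordPool {
public:
    explicit RecordPool(std::size_t size) : size_(size), free_(size) {}
    std::size_t getSize() const { return size_; }
    std::size_t getNoOfFree() const { return free_; }
    // Take one record out of the pool; returns false if none are free.
    bool seize() {
        if (free_ == 0) return false;
        --free_;
        return true;
    }
    // Return one record to the pool.
    void release() {
        assert(free_ < size_);
        ++free_;
    }
private:
    std::size_t size_;
    std::size_t free_;
};

// The condition the crashing ndbrequire asserts: exactly one record in use.
bool exactlyOneInUse(const RecordPool& pool) {
    return pool.getSize() == pool.getNoOfFree() + 1;
}
```

In this model, a record left seized by an earlier operation (a leak) makes `getNoOfFree()` one lower than expected, so the equality no longer holds when the next operation seizes its own record.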

I can't see from the logs why we have a problem here; the only cause I can think of is a lack of RAM on that node. I can't make it crash on my test setup, even when I play with RAM a lot.

There are a few updates to backup and ndb_restore in later 7.1 releases, so I suggest you upgrade to the latest 7.1. None of them explicitly addresses the situation you have, but upgrading will certainly fix a number of other issues. It would be helpful if you could try to reproduce the problem with the latest 7.1 too.

all best
Bogdan Kecman
[12 Oct 2015 10:13] MySQL Verification Team
This bug is fixed in 

5.6.25-ndb-7.4.8
5.6.25-ndb-7.3.11
5.5.44-ndb-7.2.22

Unfortunately, no fix is (or will be) available for the 7.1 tree.

best regards
Bogdan Kecman