Bug #15425 Small window for NF during backup failing without error
Submitted: 2 Dec 2005 8:26 Modified: 9 Dec 2005 19:53
Reporter: Stewart Smith
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S2 (Serious)
Version: 4.1.17, 5.0.17, 5.1 OS:
Assigned to: Stewart Smith CPU Architecture: Any

[2 Dec 2005 8:26] Stewart Smith
Description:
10018 is a crash in FSAPPENDCONF, i.e. the file write did not succeed.

If crash 10018 is inserted on a two-node cluster with a fast CPU and a slower
disk, all nodes can respond with BACKUP_FRAGMENT_CONF for all fragments before
the error in FSAPPENDCONF is hit.

In that case no error code is set for the backup, even though the backup is
incomplete: not all I/O had been written to disk before the node crashed.
Nothing is reported to the user.

So the backup appears to succeed when it actually failed.

The window for this is rather small though.
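
The ordering problem can be illustrated with a small toy model. This is only a sketch and not NDB code: the class, the method names, the explicit "flush confirmation" step and the `check_flush_on_failure` switch are all hypothetical, and the committed patch may well work differently. It assumes the coordinator only records a node-failure error while fragment confirmations are still outstanding, which is the behaviour implied above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical error label, taken from the wording of the suggested fix.
NODE_FAILURE = "backup aborted due to node failure"


@dataclass
class BackupMaster:
    """Toy model of a backup coordinator's bookkeeping (not NDB code)."""
    pending_fragments: set        # fragments not yet confirmed
    unflushed_nodes: set          # nodes whose file writes are still in flight
    check_flush_on_failure: bool  # False models the behaviour described above
    error: Optional[str] = None

    def on_fragment_conf(self, fragment):
        # Stands in for BACKUP_FRAGMENT_CONF: the fragment has been handled.
        self.pending_fragments.discard(fragment)

    def on_flush_conf(self, node):
        # Hypothetical signal: the node's backup file writes reached disk.
        self.unflushed_nodes.discard(node)

    def on_node_failure(self, node):
        if self.pending_fragments:
            # Failure while fragments are outstanding is always noticed.
            self.error = NODE_FAILURE
        elif self.check_flush_on_failure and node in self.unflushed_nodes:
            # Failure after all fragment confirmations but before the
            # node's writes were flushed: this is the small window.
            self.error = NODE_FAILURE

    def result(self):
        return self.error or "backup completed"


def run(check_flush_on_failure):
    """Replay the ordering from the report on a two-node cluster."""
    master = BackupMaster({"f0", "f1"}, {1, 2}, check_flush_on_failure)
    master.on_fragment_conf("f0")
    master.on_fragment_conf("f1")  # all fragments confirmed
    master.on_flush_conf(1)        # node 1 finished its writes
    master.on_node_failure(2)      # node 2 crashes before its write completes
    return master.result()


if __name__ == "__main__":
    print("without flush check:", run(False))
    print("with flush check:   ", run(True))
```

In this model, `run(False)` reports a completed backup even though node 2 never finished writing, while `run(True)` reports the node-failure error.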

How to repeat:
testBackup -n NFSlave

Reproduced on my laptop (2.13 GHz Pentium M). 2 nodes, 2 replicas.

Suggested fix:
Display a "backup aborted due to node failure" error.
[2 Dec 2005 8:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/32949
[5 Dec 2005 13:17] Stewart Smith
Pushed to 4.1.17 and 5.0.17.
[9 Dec 2005 19:53] Paul Dubois
Noted in 4.1.17, 5.0.17 changelog.
[18 Jan 2006 8:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/33355