MySQL Bugs: #68853: ABORT BACKUP delays

Bug #68853	ABORT BACKUP delays
Submitted:	3 Apr 2013 9:51	Modified:	26 Nov 2013 12:49
Reporter:	Hartmut Holzgraefe
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	mysql-cluster-7.2.10	OS:	Linux
Assigned to:		CPU Architecture:	Any

Description:
After ABORT BACKUP is issued, the data nodes take a long time to report that the backup has been aborted: about as long as a successful backup would take to complete. The backup files written are also about the same size as those of a completed backup, despite the ABORT. ndb_restore from such a backup does not work, though.

So some sort of abort has actually happened, but since it saved neither time nor space, the ABORT action is more or less useless.

How to repeat:
I created a two-node cluster with 4GB of data memory per node, all running locally on a single 16GB machine. I pointed BackupDataDir at a file system on an SSD card attached via a USB2 reader, so that backups run slowly enough to leave time to issue an ABORT.

The data nodes' data memory is about 88% full. A successful backup takes about 3 minutes and produces a backup directory of about 2GB. Issuing START BACKUP WAIT STARTED followed immediately by ABORT BACKUP also takes about 3 minutes before the backup-aborted error comes back from the data nodes, and the new backup directory is also about 2GB in size. ndb_restore from this backup is not possible, so the ABORT does seem to have kicked in eventually, but only after most of the backup was already done. ABORT is therefore more or less useless, as one still has to wait about as long as a successful backup would have taken. The file size can be seen growing slowly throughout the time the backup takes, so this is not just a matter of waiting for an SSD file system flush.
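The steps above could be scripted roughly as follows. This is only a sketch under assumptions: it assumes ndb_mgm is on the PATH and can reach the management server with its default connection settings, that ABORT BACKUP may be sent from a separate client invocation, and that the caller knows the backup directory path. The helper names (dir_size_bytes, ndb_mgm_argv, start_then_abort) are hypothetical and not part of MySQL Cluster.

```python
import os
import subprocess
import time


def dir_size_bytes(path):
    """Total size in bytes of all regular files below path (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file disappeared between listing and stat
    return total


def ndb_mgm_argv(command, ndb_mgm="ndb_mgm"):
    """Build the argv for running one management client command via -e."""
    return [ndb_mgm, "-e", command]


def start_then_abort(backup_id, backup_dir, interval=5.0, samples=40):
    """Start a backup, abort it right away, then sample the directory size.

    Returns a list of (seconds_since_abort, size_in_bytes) tuples.
    """
    subprocess.run(ndb_mgm_argv(f"START BACKUP {backup_id} WAIT STARTED"), check=True)
    subprocess.run(ndb_mgm_argv(f"ABORT BACKUP {backup_id}"), check=True)
    t0 = time.monotonic()
    history = []
    for _ in range(samples):
        history.append((time.monotonic() - t0, dir_size_bytes(backup_dir)))
        time.sleep(interval)
    return history
```

If the abort took effect promptly, the sizes returned by start_then_abort would be expected to stop growing shortly after the ABORT; in the behavior reported here they keep growing for roughly the full backup duration.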

Suggested fix:
Respond to an ABORT request promptly, not just near the end of the backup operation.

Hello Hartmut,

Thank you for the report.
Verified as described.

Thanks,
Umesh

Also, I noticed that the aborted backup files were not removed.
Per the documentation: "The Backup backup_id started from node management_node_id has been aborted messages mean that the backup has been terminated and that all files relating to this backup have been removed from the cluster file system".

More: http://dev.mysql.com/doc/mysql-cluster-excerpt/5.5/en/mysql-cluster-backup-using-managemen...

ndb_mgm> START BACKUP 100 WAIT STARTED
Waiting for started, this may take several minutes
ABORT BACKUP 10Node 4: Backup 100 started from node 1
ndb_mgm> ABORT BACKUP 100
Abort of backup 100 ordered
ndb_mgm> Node 4: Backup 100 started from 1 has been aborted. Error: 1321

ndb_mgm>
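
When timing the abort from a script, the report line in the session above can be picked out of captured client output. The sketch below is an assumption-laden illustration: the pattern is derived only from the single line shown in this report, so other versions may word the message differently, and parse_abort_report is a hypothetical helper name.

```python
import re

# Pattern inferred from the one report line shown in this bug report;
# the optional "node " allows for the wording used in the "started" message.
ABORT_REPORT = re.compile(
    r"Node (\d+): Backup (\d+) started from (?:node )?(\d+) "
    r"has been aborted\. Error: (\d+)"
)


def parse_abort_report(line):
    """Return (reporting_node, backup_id, started_from, error_code) or None."""
    match = ABORT_REPORT.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())
```

Recording a timestamp when "Abort of backup ... ordered" appears and another when parse_abort_report first returns a result gives the delay this report is about.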

[ushastry@cluster-repo mysql-cluster-com-7_2_10]$ ls -l cluster-data/BACKUP/BACKUP-100/
total 81332
-rw-rw-r-- 1 ushastry ushastry 41563360 Apr 13 15:37 BACKUP-100-0.4.Data
-rw-rw-r-- 1 ushastry ushastry 41640464 Apr 13 15:37 BACKUP-100-0.5.Data
-rw-rw-r-- 1 ushastry ushastry    27440 Apr 13 15:37 BACKUP-100.4.ctl
-rw-rw-r-- 1 ushastry ushastry       48 Apr 13 15:37 BACKUP-100.4.log
-rw-rw-r-- 1 ushastry ushastry    27440 Apr 13 15:37 BACKUP-100.5.ctl
-rw-rw-r-- 1 ushastry ushastry       48 Apr 13 15:37 BACKUP-100.5.log
[ushastry@cluster-repo mysql-cluster-com-7_2_10]$
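
The same check as the listing above can be automated. This is a minimal sketch, assuming the backup lives under BACKUP/BACKUP-<id> below the data node's backup directory, as in the listing; leftover_backup_files is a hypothetical helper name, and the path in the usage note is the one from this report.

```python
import os


def leftover_backup_files(backup_data_dir, backup_id):
    """List (file_name, size_in_bytes) for files left behind by a backup.

    Returns an empty list if the BACKUP-<id> directory does not exist,
    which is what the documentation quoted above leads one to expect
    after an abort.
    """
    backup_dir = os.path.join(backup_data_dir, "BACKUP", f"BACKUP-{backup_id}")
    if not os.path.isdir(backup_dir):
        return []
    return sorted(
        (name, os.path.getsize(os.path.join(backup_dir, name)))
        for name in os.listdir(backup_dir)
    )
```

For the case shown here, leftover_backup_files("cluster-data", 100) would return the six .Data, .ctl and .log entries rather than an empty list.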

Thank you for the bug report.

Fixed in NDB 7.0+, documented in the NDB 7.0.40, 7.1.29, 7.2.14, and 7.3.3 changelogs, as follows:

        ABORT BACKUP in the ndb_mgm client took an excessive amount of
        time to return (approximately as long as the backup would have
        taken, had it not been aborted), and failed to remove files
        generated by the aborted backup.

Closed.