Bug #49202 | Data loss and bogus errors when restoring from an incomplete BACKUP file | ||
---|---|---|---|
Submitted: | 30 Nov 2009 11:08 | Modified: | 7 Jan 2010 1:03 |
Reporter: | Philip Stoev | Status: | Closed |
Category: | MySQL Server: Backup | Severity: | S1 (Critical) |
Version: | 6.0-backup | OS: | Any |
Assigned to: | Paul DuBois | CPU Architecture: | Any |
[30 Nov 2009 11:08]
Philip Stoev
[30 Nov 2009 11:10]
Philip Stoev
backup file to restore from http://mysql-systemqa.s3.amazonaws.com/bug49202.backup.zip
[30 Nov 2009 11:13]
Philip Stoev
Also note that the error log says:

    Got error 176 when reading from logfile
    091130 14:06:39 [ERROR] Restore: Can't shut down MyISAM restore driver(s)
    091130 14:06:39 [Warning] Restore: Operation aborted - data might be corrupted

In my humble opinion, "data might be corrupted" is not a warning that should be dumped in the error log; it must be an error message sent straight to the user.
[1 Dec 2009 17:05]
Hema Sridharan
Part of this bug report is similar to BUG#36931 (Data integrity verification of Backup file not possible). The MySQL Backup feature can leave the server in an inconsistent state, because RESTORE is a destructive operation. I ran a test that shows an issue similar to the one reported in this bug. Restore failed because of a full disk, and the error message I got was:

    ERROR 1699 (HY000): Error when reading summary section of backup image

There are two things to note here:

1. Restore failed, and in the end all the database contents were lost from the server.
2. The error message is not sufficient on its own to understand why restore is failing.

It is essential that, when restore fails, it gives the user appropriate error messages.
[2 Dec 2009 9:36]
Philip Stoev
Here is what the mysql.backup_history table shows for a killed backup:

    mysql> select * from backup_history\G
    *************************** 1. row ***************************
    backup_id: 276
    process_id: 0
    binlog_start_pos: 0
    binlog_file:
    backup_state: error
    operation: backup
    error_num: 0
    num_objects: 2
    total_bytes: 3615
    validity_point_time: 0000-00-00 00:00:00
    start_time: 2009-12-02 09:21:56
    stop_time: 2009-12-02 09:22:30
    host_or_server_name: localhost
    username: root
    backup_file: backup
    backup_file_path: /tmp/
    user_comment:
    command: backup database test to '/tmp/backup'
    drivers: MyISAM
    1 row in set (0.00 sec)

Even though the backup_state is "error", the error_num is zero, which is misleading.
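The inconsistency described above can be stated as a simple predicate. The following is only a minimal sketch: `BackupHistoryRow` and `is_misleading` are hypothetical names invented for illustration, covering just the two columns discussed in the comment, and are not part of the server code.

```cpp
#include <string>

// Hypothetical subset of a mysql.backup_history row (illustration only).
struct BackupHistoryRow {
  std::string backup_state;  // e.g. "error", as shown in the row above
  int error_num;             // 0 in the row above
};

// Returns true when the row reports an error state but carries no error
// number, which is the combination called misleading in the comment above.
bool is_misleading(const BackupHistoryRow &row) {
  return row.backup_state == "error" && row.error_num == 0;
}
```

For the row shown above (`backup_state: error`, `error_num: 0`), this predicate would return true.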
[10 Dec 2009 14:31]
Rafal Somla
Thinking about the issue of an interrupted BACKUP leaving "orphan" backup files (which should not happen), here is one hypothesis (rather far-fetched). The code which removes unfinished backup images is present in the Backup_restore_ctx::close() method (kernel.cc:1307):

    if (!m_completed && m_state == PREPARED_FOR_BACKUP)
    {
      int ret= m_stream->remove(); // Reports errors.
      if (ret != BSTREAM_OK)
        fatal_error(ER_CANT_DELETE_FILE);
    }
    else
    {
      int ret= m_stream->close(); // Reports errors.
      if (ret != BSTREAM_OK)
        fatal_error(ER_BACKUP_CLOSE);
    }

m_stream->remove() should remove the file. Member m_state is set to PREPARED_FOR_BACKUP in Backup_restore_ctx::prepare_for_backup(), just after the output stream is opened. Member m_completed is FALSE until explicitly set to TRUE at the end of Backup_restore_ctx::do_backup(), when the complete image has been written. Thus the only possibility I can see of leaving an unfinished file on disk is that m_stream->close() fails but the file stays on disk. This could be fixed with the following change to the above fragment:

    if (!m_completed && m_state == PREPARED_FOR_BACKUP)
    {
      int ret= m_stream->remove(); // Reports errors.
      if (ret != BSTREAM_OK)
        fatal_error(ER_CANT_DELETE_FILE);
    }
    else
    {
      int ret= m_stream->close(); // Reports errors.
      if (ret != BSTREAM_OK)
      {
        fatal_error(ER_BACKUP_CLOSE);
        m_stream->remove(); // Ignore errors from remove().
      }
    }

Note: perhaps Stream::remove() must be updated so that it can be called on a stream which is in an error state. Right now, I don't know how to verify whether this change fixes anything...
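One way to reason about the proposed change outside the server is a small stand-alone model of the close() logic. This is a sketch under stated assumptions, not the real kernel.cc code: `MockStream`, `MockContext`, `file_left_after_close` and the string error codes are hypothetical stand-ins, and the mock `remove()` is assumed to always succeed, even after a failed `close()` (which is exactly the open question in the note above).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the server's types (illustration only).
enum StreamResult { BSTREAM_OK, BSTREAM_ERROR };
enum State { CREATED, PREPARED_FOR_BACKUP };

struct MockStream {
  bool close_fails = false;  // set to simulate a failing close()
  bool file_on_disk = true;  // whether the image file still exists

  StreamResult close() { return close_fails ? BSTREAM_ERROR : BSTREAM_OK; }

  // Assumption: remove() always succeeds, even on a stream in error state.
  StreamResult remove() {
    file_on_disk = false;
    return BSTREAM_OK;
  }
};

struct MockContext {
  MockStream stream;
  bool completed = false;
  State state = CREATED;
  std::vector<std::string> errors;  // stands in for fatal_error() reporting

  void fatal_error(const std::string &code) { errors.push_back(code); }

  // Mirrors the proposed version of the fragment quoted above.
  void close() {
    if (!completed && state == PREPARED_FOR_BACKUP) {
      if (stream.remove() != BSTREAM_OK) fatal_error("ER_CANT_DELETE_FILE");
    } else {
      if (stream.close() != BSTREAM_OK) {
        fatal_error("ER_BACKUP_CLOSE");
        stream.remove();  // Ignore errors from remove().
      }
    }
  }
};

// Runs close() for one scenario and reports whether a file is left on disk.
bool file_left_after_close(bool completed, State state, bool close_fails) {
  MockContext ctx;
  ctx.completed = completed;
  ctx.state = state;
  ctx.stream.close_fails = close_fails;
  ctx.close();
  return ctx.stream.file_on_disk;
}
```

In this model, an unfinished backup in the PREPARED_FOR_BACKUP state leaves no file behind, a failed close() also leaves no file behind, and only a completed backup whose close() succeeds keeps its image. The model says nothing about whether the real Stream::remove() can be called after a failed close().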
[16 Dec 2009 11:06]
Rafal Somla
Note: Issue B has already been reported as BUG#34767. In that discussion it was decided that it should be fixed in a generic way. Related WLs are WL#4385 and WL#5167.
[18 Dec 2009 12:01]
Sanjay Manwani
Per Philip, he also cannot repeat this bug now. But there is still a rare possibility (e.g. if someone pulls the plug on the hardware) of an incomplete backup file being left behind. After some discussion, the documentation team proposed the following limitations:

For BACKUP DATABASE: If the operation fails, it returns an error. Any file created by the operation normally is removed. It is possible in rare cases that the incomplete image file will not be removed, in which case it should be removed manually. Using such an image file for RESTORE may render recovered databases unusable.

For RECOVER/RESTORE: Be sure that the image file was created from a successful BACKUP DATABASE operation and has not been tampered with or modified. A RESTORE using a compromised image file may render recovered databases unusable.
[5 Jan 2010 7:24]
Sanjay Manwani
Changing status to Documenting, since a documentation change was requested in the previous comment.
[7 Jan 2010 1:03]
Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.