Bug #47940 Backup: Implementation of save_vp_info() violates its specification.
Submitted: 9 Oct 2009 9:33 Modified: 8 Mar 2010 14:56
Reporter: Rafal Somla Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Backup Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: Rafal Somla CPU Architecture:Any
Tags: backup team tree.

[9 Oct 2009 9:33] Rafal Somla
Description:
Documentation of function save_vp_info() (sql/backup/data_backup.cc:458) states that:

"
  @note This function is called in a time critical synchronization phase
  of the backup process. Therefore it should not perform any time consuming
  or potentially waiting operations such as I/O.
"

This requirement is violated by the implementation, which calls report_error() method which potentially does I/O when writing to backup logs and other destinations.

It should be also checked if calling methods of mysql_bin_log is potentially time-consuming and if yes, if they can be avoided.

How to repeat:
Code inspection.

Suggested fix:
In case of error, do not report it from save_vp_info(). Instead, save information about the error and report it later, when the critical phase is over.
[15 Oct 2009 21:13] Rafal Somla
I think the specification should not be changed. It is very important for the correct functioning of the synchronization protocol that no time consuming operations are performed in the critical phase. This is what the specification requires.

Of course, if satisfying the specification is not possible, then it has to be changed or the whole system (which depends on it) redesigned. But I think it is not that bad. I'm reasonably sure we can find a work-around and ensure that specification is satisfied.
[28 Oct 2009 10:33] Rafal Somla
I see another, even more serious problem with this fragment of code. In data_backup.cc, around line 751 we have:

    if (sch.lock())    // Logs errors.
      goto error;

    if (save_vp_info(info))
      goto error;

    if (sch.unlock() || log.report_killed())    // Logs errors.
      goto error;

Thus if save_vp_info() signals error, then we do goto jumping out of the critical section. This is bad because sch.lock() call should *always* be matched with sch.unlock(). Otherwise we can leave the server in a state where it is completely blocked.

Instead of jumping to error label, we should remember that save_vp_info() signaled error and react to this after sch.unlock() call.
[25 Nov 2009 10:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/91566

2895 Rafal Somla	2009-11-25
      Bug#47940 - Backup: Implementation of save_vp_info() violates its specification.
      
      This patch fixes problems in the backup synchronization logic:
      
      - In case of error, all drivers which have ben locked, should be 
        unlocked before we abort operation.
      
      - Potentially time consuming error reporting should not be done inside
        the synchronization phase, but only after all drivers have been
        unlocked.
     @ mysql-test/suite/backup/r/backup_errors_debug_3.result
        This test case triggers errors during unlock() of backup drivers. Since two
        drivers are used, the unlock error should be reported twice. This patch
        fixes it.
     @ sql/backup/data_backup.cc
        - Add new LOCKED driver state.
        - Add Scheduler::report_vp_errors() method.
        - Add Scheduler members m_failed_lock and m_vp_info_error which
          store information about errors encountered in the synchronization
          phase.
        - Remove error reporting from time critical methods.
        - Refactoring of the synchronization phase.
        - Add new operator and fix comments in Pump_iterator class.
        - Refactor Pump_iterator::lock()/unlock() methods: make sure all 
          drivers which have been locked are unlocked in case of error.
        - Refactor Backup_pump methods: remove error reporting and 
          handle the new LOCKED state.
[1 Dec 2009 16:58] Chuck Bell
Approved pending changes
[2 Dec 2009 10:51] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/92426

2901 Rafal Somla	2009-12-02
      Bug#47940 - Backup: Implementation of save_vp_info() violates its specification.
      
      This patch fixes problems in the backup synchronization logic:
      
      - In case of error, all drivers which have been locked, should be 
        unlocked before we abort operation.
      
      - Potentially time consuming error reporting should not be done inside
        the synchronization phase, but only after all drivers have been
        unlocked.
     @ mysql-test/suite/backup/r/backup_errors_debug_3.result
        This test case triggers errors during unlock() of backup drivers. Since two
        drivers are used, the unlock error should be reported twice. This patch
        fixes it.
     @ sql/backup/data_backup.cc
        - Add new LOCKED driver state.
        - Add Scheduler::report_vp_errors() method.
        - Add Scheduler members m_failed_lock and m_vp_info_error which
          store information about errors encountered in the synchronization
          phase.
        - Remove error reporting from time critical methods.
        - Refactoring of the synchronization phase.
        - Add new operator and fix comments in Pump_iterator class.
        - Refactor Pump_iterator::lock()/unlock() methods: make sure all 
          drivers which have been locked are unlocked in case of error.
        - Refactor Backup_pump methods: remove error reporting and 
          handle the new LOCKED state.
[2 Dec 2009 10:51] Rafal Somla
Pushed to mysql-6.0-backup tree.
revid:rafal.somla@sun.com-20091202105014-u8ei828qwpdkjs03
[20 Feb 2010 9:17] Bugs System
Pushed into 6.0.14-alpha (revid:ingo.struewing@sun.com-20100218152520-s4v1ld76bif06eqn) (version source revid:ingo.struewing@sun.com-20100119103538-wtp5alpz4p2jayl5) (merge vers: 6.0.14-alpha) (pib:16)
[8 Mar 2010 14:56] Paul DuBois
No user-visible changes. No changelog entry needed.