Bug #47939 Backup: MyISAM backup driver can wait in unlock() method which is wrong.
Submitted: 9 Oct 2009 9:25 Modified: 8 Mar 2010 1:18
Reporter: Rafal Somla Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Backup Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: Thava Alagu CPU Architecture:Any
Tags: backup team tree

[9 Oct 2009 9:25] Rafal Somla
Description:
The correct functioning of MySQL backup system depends on the assumption that backup driver's lock() and unlock() calls are instant. No potentially time-consuming operations such as I/O or waiting should happen within these methods.

This requirement is violated by MyISAM native backup driver whose unlock() method calls kill_locking_thread() which performs complex synchronization protocol and can wait on a condition.

How to repeat:
See code in storage/myisam/myisam_backup_engine.cc, methods Backup::unlock() and Backup::kill_locking_thread().

Suggested fix:
Do not call kill_locking_thread() from unlock() method, but after the call. For example in the following get_data() call.
[18 Nov 2009 12:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/90839

2894 Thava Alagu	2009-11-18
      Bug#47939 : MyISAM backup driver can wait in unlock() too long
      
      As per the protocol, the backup driver should return from 
      unlock() as early as possible. However, the killing of  
      locking thread by MyISAM driver from within unlock() can
      take too long time.
      
      This is fixed by returning from unlock() immediately 
      after remembering to kill locking thread later.
      The locking thread is killed from the very next get_data()
      call at the beginning of "final-transfer" phase.
[19 Nov 2009 8:22] Thava Alagu
The suggested fix in the last patch was causing backup_triggers_and_events test to fail because of the following scenario:
   - Thread executing "Backup database ..." proceeds until after the validity
     point and waits at sync point (at "after_backup_binlog" waiting for "Go")
     before killing the locking thread in effect holding the locks on tables.
   - The connection thread con2 is waiting on lock on t1_same. deadlock.
The test case can be fixed by sending another Go signal to main thread, 
however this will not serve the purpose of the test which does updates just after VP but before the final phase. 

One possibility of solving this problem would be to introduce another post_unlock() call (which amounts to extending the current protocol) so that drivers like MyISAM can kill the locking thread here. The test case can then exactly block the kernel just after VP and post_unlock(), then do some updates before the final phase.
[27 Nov 2009 8:56] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/91872

2897 Thava Alagu	2009-11-27
      Bug#47939 : MyISAM backup driver can wait in unlock() too long
      
      As per the protocol, the backup driver should return from 
      unlock() as early as possible. However, the killing of  
      locking thread by MyISAM driver from within unlock() 
      involves handshakes with locking thread which should
      be eliminated to avoid potential problems.
      
      This is fixed by making kill_locking_thread() not to wait
      for the death of the locking thread.
      The wait for the confirmation of locking thread's death
      happens later at the end.
[30 Nov 2009 12:26] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/92080

2899 Thava Alagu	2009-11-30
      Bug#47939 : MyISAM backup driver can wait in unlock() too long
      
      As per the protocol, the backup driver should return from 
      unlock() as early as possible. However, the killing of  
      locking thread by MyISAM driver from within unlock() 
      involves handshakes with locking thread which should
      be eliminated to avoid potential problems.
      
      This is fixed by making kill_locking_thread() not to wait
      for the death of the locking thread.
      The wait for the confirmation of locking thread's death
      happens later at the end.
     @ sql/sql_class.cc
        Fixed race condition in kill query logic. The race condition caused backup_bml_block test failure.
[1 Dec 2009 7:02] Thava Alagu
The last committed patch is causing main.truncate_coverage and main.merge tests to fail. This should be due to some race conditions in kill query logic. will commit a new patch after fixing these.
[1 Dec 2009 11:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/92246

2899 Thava Alagu	2009-12-01
      Bug#47939 : MyISAM backup driver can wait in unlock() too long
      
      As per the protocol, the backup driver should return from 
      unlock() as early as possible. However, the killing of  
      locking thread by MyISAM driver from within unlock() 
      involves handshakes with locking thread which should
      be eliminated to avoid potential problems.
      
      This is fixed by making kill_locking_thread() not to wait
      for the death of the locking thread.
      The wait for the confirmation of locking thread's death
      happens later at the end.
     @ sql/sql_class.cc
        Fixed race condition in kill query logic. 
        The race condition caused backup_bml_block test failure.
[1 Dec 2009 15:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/92304

2899 Thava Alagu	2009-12-01
      Bug#47939 : MyISAM backup driver can wait in unlock() too long
      
      As per the protocol, the backup driver should return from 
      unlock() as early as possible. However, the killing of  
      locking thread by MyISAM driver from within unlock() 
      involves handshakes with locking thread which should
      be eliminated to avoid potential problems.
      
      This is fixed by making kill_locking_thread() not to wait
      for the death of the locking thread.
      The wait for the confirmation of locking thread's death
      happens later at the end.
     @ sql/sql_class.cc
        Fixed race condition in kill query logic. 
        The race condition caused backup_bml_block test failure.
[2 Dec 2009 10:43] Ingo Strüwing
Approved.
[3 Dec 2009 1:03] Thava Alagu
Patch pushed to mysql-6.0-backup tree.
revision_id: thavamuni.alagu@sun.com-20091202225841-108sqz05zrijyp8g
[4 Jan 2010 16:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/95898

3024 Ingo Struewing	2010-01-04
      WL#5101 - MySQL Backup back port - MS09
      Merged revid:thavamuni.alagu@sun.com-20091202225841-108sqz05zrijyp8g
        Bug#47939 : MyISAM backup driver can wait in unlock() too long
        
        As per the protocol, the backup driver should return from 
        unlock() as early as possible. However, the killing of  
        locking thread by MyISAM driver from within unlock() 
        involves handshakes with locking thread which should
        be eliminated to avoid potential problems.
        
        This is fixed by making kill_locking_thread() not to wait
        for the death of the locking thread.
        The wait for the confirmation of locking thread's death
        happens later at the end.
     @ mysql-test/r/truncate_coverage.result
        WL#5101 - MySQL Backup back port - MS09
        Updated test result.
     @ mysql-test/t/truncate_coverage.test
        WL#5101 - MySQL Backup back port - MS09
        Stabilized test case.
     @ storage/myisam/myisam_backup_engine.cc
        WL#5101 - MySQL Backup back port - MS09
        Moved waiting for the locking thread's end to the Backup destructor.
[20 Feb 2010 9:19] Bugs System
Pushed into 6.0.14-alpha (revid:ingo.struewing@sun.com-20100218152520-s4v1ld76bif06eqn) (version source revid:ingo.struewing@sun.com-20100119103538-wtp5alpz4p2jayl5) (merge vers: 6.0.14-alpha) (pib:16)
[8 Mar 2010 1:18] Paul DuBois
Noted in 6.0.14 changelog.

The backup driver waited too long in unlock().