MySQL Bugs: #48930: Backup: Lock thread race condition for default backup driver

Bug #48930	Backup: Lock thread race condition for default backup driver
Submitted:	20 Nov 2009 8:10	Modified:	7 Mar 2010 19:32
Reporter:	Jørgen Løland	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Backup	Severity:	S2 (Serious)
Version:		OS:	Any
Assigned to:	Rafal Somla	CPU Architecture:	Any

Description:
There is a race condition between the backup thread and locking thread for the default backup driver in be_default.cc.

The symptoms are these:
* The backup_interruption test for one of the storage engines using the default 
  driver hangs
* The backup thread hangs in wait_for_locking_thread_to_die() [1]
* The locking thread hangs in backup_thread_for_locking() [2]

[1]
#1  0xb7ec70e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/i686/cmov/libpthread.so.0
#2  0x08673452 in safe_cond_wait (cond=0x93044b0, mp=0x9304450, 
    file=0x88723ed "be_thread.cc", line=353) at thr_mutex.c:423
#3  0x08729719 in Locking_thread_st::wait_until_locking_thread_dies (
    this=0x93043c0) at be_thread.cc:353
#4  0x087279d8 in ~Backup (this=0x9304e20) at be_default.cc:159
#5  0x08728212 in default_backup::Backup::free (this=0x9304e20)
    at ../../sql/backup/be_default.h:81
#6  0x08723566 in ~Backup_pump (this=0x9304db0) at data_backup.cc:1151

[2]
#1  0xb7ec70e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/i686/cmov/libpthread.so.0
#2  0x08673452 in safe_cond_wait (cond=0x9304420, mp=0x93043c0, 
    file=0x88723ed "be_thread.cc", line=197) at thr_mutex.c:423
#3  0x0872a5f0 in backup_thread_for_locking (arg=0x93043c0) at be_thread.cc:197
#4  0xb7ec34ff in start_thread (arg=0xb716c790) at pthread_create.c:297

Description of the race condition:
 - The backup thread calls kill_locking_thread(). It sets 
   lock_state= LOCK_SIGNAL protected by the THR_LOCK_caller mutex.
 - The locking thread gets the lock and around line 189-197 it sets 
   lock_state= LOCK_ACQUIRED and starts waiting. It does this protected by 
   the THR_LOCK_thread mutex.

Because lock_state is updated by two different threads holding *different* mutexes, this order of operations is possible:

Backup thread: lock_state= LOCK_SIGNAL  (line 317)
Lock thread:   lock_state= LOCK_ACQUIRED  (line 191)
Lock thread:   lock_state!=LOCK_SIGNAL? Yes, start waiting (line 195-196)

This problem was first reported by Chuck Bell using a MacBook Pro

How to repeat:
Apply patch for BUG#44568 to mysql-6.0-backup branch and run the backup_extra.backup_interruption test for one of the storage engines that uses the default driver, e.g. memory. The problem is intermittent, and happens about 50% of the time on my box (Ubuntu).

Suggested fix:
Protect all updates of lock_state with the same mutex.

Typo: References to be_default.cc above should be be_thread.cc

After analysis I think that this deadlock sequence is not possible:

 Backup thread: lock_state= LOCK_SIGNAL  (line 317)
 Lock thread:   lock_state= LOCK_ACQUIRED  (line 191)
 Lock thread:   lock_state!=LOCK_SIGNAL? Yes, start waiting (line 195-196)

This is because backup thread would have to call kill_locking_thread() before this thread set lock_state to LOCK_ACQUIRED. But in the prepare phase (after prelock() call) the driver is waiting for lock_state to become LOCK_ACQUIRED (be_default.cc:354). Only after it sees LOCK_ACQUIRED, it will kill locking thread setting lock_state to LOCK_SIGNAL.

Need to investigate more.

DEADLOCK SCENARIO
-----------------
1. During table data backup phase, default driver spawns table locking thread (drv->prelock() method).

2. After that, backup thread gets interrupted. Default driver is shut down. It kills locking thread and waits for this thread to be done (function Locking_thread_st::wait_until_locking_thread_dies(), loop around lines 362-363 in be_thread.cc).

3. Locking thread is interrupted around line 145 (be_thread.cc). It detects interruption, sets locking_state to LOCK_ERROR and jumps to the terminating sequence.

4. Because lock_state is LOCK_ERROR, it is not set to LOCK_DONE before thread terminates:

  pthread_mutex_lock(&locking_thd->THR_LOCK_caller);
  net_end(&thd->net);
  delete thd;
  locking_thd->lock_thd= NULL;
  if (locking_thd->lock_state != LOCK_ERROR)
    locking_thd->lock_state= LOCK_DONE;

5. Default driver hangs, because locking thread has terminated without setting lock_state to LOCK_DONE.

REFINED PROBLEM DESCRIPTION
---------------------------
There are two problems which could be detected. The first one, manifested in the deadlock sceanrio above, is that the locking thread of the default backup driver can terminate with lock_state equal to LOCK_ERROR. But function Locking_thread_st::wait_until_locking_thread_dies() assumes that a terminating locking thread should always set lock_state to LOCK_DONE.

Another problem in the design of the synchronization for the locking thread is that Locking_thread_st::lock_state is protected by two different mutexes.  This is wrong, because one thread can take mutex A and modify lock_state while other thread is holding mutex B. However, I could not demonstrate that it can lead to a deadlock in the current code.

PROPOSED SOLUTION
-----------------
1. Have a new Locking_thread_st::m_error member to signal errors, independently from Locking_thread_st::lock_state. The latter will be always set to LOCK_DONE upon locking thread termination (either normal (if m_error == 0) or because of an error (if m_error != 0)).

2. Also, do not use LOCK_SIGNAL value of lock_state, to wake up the locking thread. Instead, use condition and explicit kill of the locking thread. The locking thread will know that it was killed by examining thd->killed flag.

3. Use single mutex for synchronization between the locking thread and the main thread. There will be two synchronization points protected with the mutex. First one after the locking thread has obtained locks and is waiting to be killed, and another one when the locking thread is terminating and about to delete the THD instance. Second synchronization point is needed so that the main thread knows if it can use the THD instance to kill the locking thread.

Note: the design for the final synchronization sequence is borrowed from similar code for the locking thread in the MyISAM backup driver.

Re: [20 Nov 15:57] Rafal Somla

Rafal,

AFAICT the backup thread will wait for
LOCK_ACQUIRED (be_default.cc:354) when the backup thread reaches
the first call to pump(), but not before. However, I suspect that
the backup thread is killed before it reaches this. The test that
gets a hang for me is this:

-------------------
SET DEBUG_SYNC='before_backup_data_prepare SIGNAL here WAIT_FOR go';
SET DEBUG_SYNC='backup_restore_ctx_dtor SIGNAL here';
#
# Start the command.
#
BACKUP DATABASE bup_intr TO 'bup_intr.bkp';
#
# Wait for the command to reach its synchronization point,
# then kill it.
#
SET DEBUG_SYNC='now WAIT_FOR here';
KILL QUERY <query id>;
SET DEBUG_SYNC='now SIGNAL go';
#
# Reap the command and show results.
#
<and we got a hang>
-------------------

And the code that tries to kill the locking thread is line 1058,
which in retrospect has a rather funny comment :-)

1056 if (it->prepare() || m_log.report_killed())  // Logs errors.
1057 {
1058   remove_pump(it);  // Never errors.
1059   return ERROR;
1060 }

You can see all this in the traces below. Notice the value of
lock_state as well:

(gdb) thread 5
[Switching to thread 5 (Thread 0xb72e4790 (LWP 31290))]#0  0xb7f09430 in __ker
l_vsyscall ()
(gdb) bt
#0  0xb7f09430 in __kernel_vsyscall ()
#1  0xb7edc0e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/i686/cmov/libpthread.so.0
#2  0x086732ae in safe_cond_wait (cond=0x999d450, mp=0x999d3f0, 
    file=0x887220d "be_thread.cc", line=351) at thr_mutex.c:423
#3  0x08729579 in Locking_thread_st::wait_until_locking_thread_dies (
    this=0x999d360) at be_thread.cc:351
#4  0x08727838 in ~Backup (this=0x999ddc0) at be_default.cc:159
#5  0x08728072 in default_backup::Backup::free (this=0x999ddc0)
    at ../../sql/backup/be_default.h:81
#6  0x087233c6 in ~Backup_pump (this=0xa00ec70) at data_backup.cc:1151
#7  0x087254ff in ~Pump (this=0xa00ec70) at data_backup.cc:311
#8  0x087237af in backup::Scheduler::remove_pump (this=0xb72cd3dc, 
    p=@0xb72cd38c) at data_backup.cc:1020
#9  0x08723b5b in backup::Scheduler::prepare (this=0xb72cd3dc)
    at data_backup.cc:1058
#10 0x08724d13 in backup::write_table_data (thd=0x9f65f38, info=@0xa0377a0, 
    s=@0xa00d050) at data_backup.cc:662

(gdb) f 3
#3  0x08729579 in Locking_thread_st::wait_until_locking_thread_dies (
    this=0x999d360) at be_thread.cc:351

(gdb) p lock_state
$2 = LOCK_ACQUIRED

(gdb) thread 7
[Switching to thread 7 (Thread 0xb7181790 (LWP 31292))]#0  0xb7f09430 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7f09430 in __kernel_vsyscall ()
#1  0xb7edc0e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/i686/cmov/libpthread.so.0
#2  0x086732ae in safe_cond_wait (cond=0x999d3c0, mp=0x999d360, 
    file=0x887220d "be_thread.cc", line=197) at thr_mutex.c:423
#3  0x0872a40e in backup_thread_for_locking (arg=0x999d360) at be_thread.cc:197
#4  0xb7ed84ff in start_thread (arg=0xb7181790) at pthread_create.c:297
#5  0xb7cde49e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

(gdb) f 3
#3  0x0872a40e in backup_thread_for_locking (arg=0x999d360) at be_thread.cc:197
197                           &locking_thd->THR_LOCK_thread);

(gdb) p locking_thd->lock_state
$1 = LOCK_ACQUIRED

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/91547

2895 Rafal Somla	2009-11-25
      Bug#48930 - Backup: Lock thread race condition for default backup driver.
      
      This patch fixes few issues in the synchrnization between the table 
      locking thread used by the default backup driver and the main thread. It 
      basically implements a new, hopefully simpler and more correct synchronization 
      protocol which mimics the one used in the MyISAM backup driver. In particular, 
      it uses single mutex for synchronization between the two threads.
     @ sql/backup/be_default.cc
        Change the logic for acquiring table locks, because now errros are not
        signalled through locking_thd->lock_state.
     @ sql/backup/be_snapshot.cc
        Do not use lock_state to infer if all tables have been processed.
        Instead, use explicit information given by get_data() method.
     @ sql/backup/be_thread.cc
        - Changes in the locking thread: 
         -- use m_error for signalling errors, independent from lock_state;
         -- implement new synchronization protocol;
         -- add comments.
        - Changes in kill_locking_thread():
         -- make this method wait for the locking thread after it has been killed,
         -- implement new synchronization protocol;
         -- add extensive comments.
        - Removed wait_until_locking_thread_dies() which is not needed now.
     @ sql/backup/be_thread.h
        - Remove lock_states which are not used now.
        - Use only one mutex to synchronize locking thread with the main thread.
        - Inroduce new conditions for improved synchronization protocol.
        - Introduce new members: m_error and do_not_terminate.
        - Removed wait_until_locking_thread_dies() mehtod since now waiting is
          done inside kill_locking_thread().

Approved with changes.

Patch approved. See email for comments.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/91746

2895 Rafal Somla	2009-11-26
      Bug#48930 - Backup: Lock thread race condition for default backup driver.
      
      This patch fixes few issues in the synchronization between the table 
      locking thread used by the default backup driver and the main thread. It 
      basically implements a new synchronization protocol which mimics the one 
      used in the MyISAM backup driver. In particular, it uses a single mutex 
      for synchronization between the two threads.
     @ sql/backup/be_default.cc
        Change the logic for acquiring table locks, because now erros are not
        signaled through locking_thd->lock_state.
        
        Note: As in the original code, if driver is waiting for the locking
        thread it sends empty buffer with OK reply to the kernel. It works, but
        perhaps better would be to send BUSY reply in that case. However, we 
        do not change previous behavior and leave this refactoring for a separate
        patch.
     @ sql/backup/be_snapshot.cc
        Do not use lock_state to infer if all tables have been processed.
        Instead, use explicit information given by get_data() method.
     @ sql/backup/be_thread.cc
        - Changes in the locking thread: 
         -- use m_error for signalling errors, independent from lock_state;
         -- implement new synchronization protocol;
         -- add comments.
        - Changes in kill_locking_thread():
         -- make this method wait for the locking thread after it has been killed,
         -- implement new synchronization protocol;
         -- add extensive comments.
        - Removed wait_until_locking_thread_dies() which is not needed now.
     @ sql/backup/be_thread.h
        - Remove lock_states which are not used now.
        - Use only one mutex to synchronize locking thread with the main thread.
        - Introduce new conditions for improved synchronization protocol.
        - Introduce new members: m_error and do_not_terminate.
        - Removed wait_until_locking_thread_dies() method since now waiting is
          done inside kill_locking_thread().

Pushed to mysql-6.0-backup tree.
revid:rafal.somla@sun.com-20091126090258-038073fqwz1t01oh

Pushed into 6.0.14-alpha (revid:ingo.struewing@sun.com-20100218152520-s4v1ld76bif06eqn) (version source revid:ingo.struewing@sun.com-20100119103538-wtp5alpz4p2jayl5) (merge vers: 6.0.14-alpha) (pib:16)

Noted in 6.0.14 changelog.

For BACKUP DATABASE, there was a race condition between the backup
thread and the locking thread for the default backup driver.