Bug #33296 Error deleting mutex in locking thread
Submitted: 17 Dec 2007 18:58 Modified: 26 Feb 2008 2:15
Reporter: Chuck Bell Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Backup Severity:S3 (Non-critical)
Version:6.0 OS:Any
Assigned to: Chuck Bell CPU Architecture:Any

[17 Dec 2007 18:58] Chuck Bell
Description:
It is possible under the right circumstances for a race condition to occur in the shutdown of the locking thread.

In the destructor Locking_thread_st::~Locking_thread_st(), the code inspects the lock_state attribute without taking a mutex lock.

>  if (lock_state != LOCK_DONE)

If the locking thread code itself, pthread_handler_t backup_thread_for_locking(void *arg), executes to the point where it attempts to terminate -- for instance in this code:

>  pthread_mutex_lock(&locking_thd->THR_LOCK_caller);
>  net_end(&thd->net);
>  my_thread_end();
>  delete thd;
>  locking_thd->lock_thd= NULL;
>  if (locking_thd->lock_state != LOCK_ERROR)
>    locking_thd->lock_state= LOCK_DONE;

...but is preempted before setting the lock_state variable, the destructor of the locking thread determines lock_state is not LOCK_DONE and then executes the following code:

>  pthread_mutex_destroy(&THR_LOCK_thread);
>  pthread_cond_destroy(&COND_thread_wait);
>  pthread_mutex_destroy(&THR_LOCK_caller);
>  pthread_cond_destroy(&COND_caller_wait);

...which then attempts to destroy a still-locked mutex.

How to repeat:
Run the test suite in a loop such as: 

(while true; do ./mtr --do-test=backup || exit 1; done)

Suggested fix:
Use a mutex to inspect the lock_state variable in the destructor of the locking thread structure, Locking_thread_st. 

===== sql/backup/be_thread.cc 1.3 vs edited =====
--- 1.3/sql/backup/be_thread.cc 2007-12-04 12:38:06 -05:00
+++ edited/sql/backup/be_thread.cc      2007-12-17 10:33:15 -05:00
@@ -220,23 +220,28 @@
 */
 Locking_thread_st::~Locking_thread_st()
 {
+  my_bool signalled= FALSE;
+
   /*
     If the locking thread is not finished, we need to wait until
     it is finished so that we can destroy the mutexes safely knowing
     the locking thread won't access them.
   */
+  kill_locking_thread();
+  pthread_mutex_lock(&THR_LOCK_caller);
   if (lock_state != LOCK_DONE)
   {
-    kill_locking_thread();
-    pthread_mutex_lock(&THR_LOCK_caller);
     m_thd->enter_cond(&COND_caller_wait, &THR_LOCK_caller,
                     "Locking thread: waiting until locking thread is done");
     while (lock_state != LOCK_DONE)
       pthread_cond_wait(&COND_caller_wait, &THR_LOCK_caller);
+    signalled= TRUE;
     m_thd->exit_cond("Locking thread: terminating");

     DBUG_PRINT("info",("Locking thread's locking thread terminated"));
   }
+  if (!signalled)
+    pthread_mutex_unlock(&THR_LOCK_caller);

   /*
     Destroy the thread mutexes and cond variables.
[19 Dec 2007 17:43] Chuck Bell
The output from master.err when the problem occurs will resemble the following.
In this case, it was the backup_ddl_blocker test.

[...]
mysql_backup: backup: == breakpoint on 'backup_done' ==
safe_mutex: Trying to destroy a mutex that was locked at be_thread.cc, line 200
at be_thread.cc, line 263
071219 19:34:16 - mysqld got signal 6;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=8
max_threads=151
threads_connected=7
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 59892 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0xb69c11b8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xb2b41758, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x8313960
0xffffe410
0xb7c6e201
0x8897570
0x8944e9b
0x89460eb
0x8948101
0x89482df
0x8939b59
0x8325dad
0x832ec30
0x832f784
0x8330b7b
0x831cb60
0xb7f0946b
0xb7d156de
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/refman/5.1/en/resolve-stack-dump.html
and follow instructions on how to resolve the stack trace.
Resolved stack trace is much more helpful in diagnosing the
problem, so please do resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x8f75a98 = BACKUP DATABASE bup_ddl_blocker_2, bup_ddl_blocker_4 t
o 'bup_ddl_blocker_orig.bak'
thd->thread_id=156
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
[30 Jan 2008 17:06] Chuck Bell
Pushed to mysql-6.0-backup.
[25 Feb 2008 20:19] Bugs System
Pushed into 6.0.5-alpha
[26 Feb 2008 2:15] Paul DuBois
Noted in 6.0.5 changelog.

Shutdown of the Online Backup locking thread was subject to a race condition.
[14 Mar 2008 1:34] Paul DuBois
Correction: No changelog entry needed; this bug did not appear in any released version.