Bug #49342 backup locking thread: trying to destroy a mutex LOCK_thd_data that was locked
Submitted: 2 Dec 2009 12:16 Modified: 7 Mar 2010 19:00
Reporter: Philip Stoev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Backup Severity:S2 (Serious)
Version:6.0-backup OS:Any
Assigned to: Rafal Somla CPU Architecture:Any

[2 Dec 2009 12:16] Philip Stoev
Description:
When running SHOW PROCESSLIST on a machine that was doing non-concurrent BACKUP/RESTORE and concurrent DDL, mysqld asserted as follows:

safe_mutex: Trying to destroy a mutex LOCK_thd_data that was locked at sql_show.cc, line 1895 at sql_class.cc, line 1105

The backup locking thread is in the following stack trace:

# 14:00:43 #6  0x0000000000a33776 in safe_mutex_destroy (mp=0x2d0d8c8, file=0xb5f75a "sql_class.cc", line=1105) at thr_mutex.c:531
# 14:00:43 #7  0x0000000000658cd5 in ~THD (this=0x2d0cb38) at sql_class.cc:1105
# 14:00:43 #8  0x00000000009d3b7f in myisam_backup::Backup::lock_tables_TL_READ_NO_INSERT (this=0x7f56c00c8010) at myisam_backup_engine.cc:889
# 14:00:43 #9  0x00000000009d3be1 in myisam_backup_separate_thread_for_locking (arg=0x7f56c00c8010) at myisam_backup_engine.cc:901
# 14:00:43 #10 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
# 14:00:43 #11 0x000000315a4e627d in clone () from /lib64/libc.so.6

other backup threads:

# 14:00:45 #0  0x000000315a4dea62 in select () from /lib64/libc.so.6
# 14:00:45 #1  0x0000000000a257a2 in my_sleep (m_seconds=50000) at my_sleep.c:31
# 14:00:45 #2  0x0000000000658804 in THD::awake (this=0x2d4cec8, state_to_set=THD::KILL_CONNECTION) at sql_class.cc:1265
# 14:00:45 #3  0x0000000000ae5fa5 in Locking_thread_st::kill_locking_thread (this=0x7f56c00bbe90) at be_thread.cc:346
# 14:00:45 #4  0x0000000000ae40c1 in ~Backup (this=0x7f56c004a440) at be_default.cc:158
# 14:00:45 #5  0x0000000000ae4aad in default_backup::Backup::free (this=0x7f56c004a440) at ../../sql/backup/be_default.h:81
# 14:00:45 #6  0x0000000000adfa04 in ~Backup_pump (this=0x7f56c00c82a0) at data_backup.cc:1151
# 14:00:45 #7  0x0000000000ae1a21 in ~Pump (this=0x7f56c00c82a0) at data_backup.cc:311
# 14:00:45 #8  0x0000000000adfdff in backup::Scheduler::remove_pump (this=0x7f56c4891030, p=@0x7f56c4890fc0) at data_backup.cc:1020
# 14:00:45 #9  0x0000000000ae0581 in backup::Scheduler::step (this=0x7f56c4891030) at data_backup.cc:895
# 14:00:45 #10 0x0000000000ae1463 in backup::write_table_data (thd=0x28dfe98, info=@0x7f56c0096c70, s=@0x7f56c00e5590) at data_backup.cc:731
# 14:00:45 #11 0x0000000000ad6fbb in Backup_restore_ctx::do_backup (this=0x7f56c4891220) at kernel.cc:1395
# 14:00:45 #12 0x0000000000ad9bcc in execute_backup_command (thd=0x28dfe98, lex=0x28e1768, backupdir=0x7f56c48921a0, overwrite=false, skip_gap_event=false)
# 14:00:45     at kernel.cc:269
# 14:00:45 #13 0x000000000068549f in mysql_execute_command (thd=0x28dfe98) at sql_parse.cc:2482
# 14:00:45 #14 0x000000000068d0cf in mysql_parse (thd=0x28dfe98,
# 14:00:45     inBuf=0x2922c40 "BACKUP DATABASE testdb_N TO '/tmp/gentest31474-1259755240.77996-0.638194958385281.tmp'", length=86, found_semicolon=0x7f56c4892f00)
# 14:00:45     at sql_parse.cc:5985
# 14:00:45 #15 0x000000000068dd38 in dispatch_command (command=COM_QUERY, thd=0x28dfe98,
# 14:00:45     packet=0x2912d79 "BACKUP DATABASE testdb_N TO '/tmp/gentest31474-1259755240.77996-0.638194958385281.tmp' ", packet_length=87) at sql_parse.cc:1078
# 14:00:45 #16 0x000000000068f2aa in do_command (thd=0x28dfe98) at sql_parse.cc:760
# 14:00:45 #17 0x000000000067bff8 in handle_one_connection (arg=0x28dfe98) at sql_connect.cc:1164
# 14:00:45 #18 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
# 14:00:45 #19 0x000000315a4e627d in clone () from /lib64/libc.so.6

How to repeat:
If this is repeatable, a test case will be provided.
[2 Dec 2009 12:17] Philip Stoev
c:\docs\mysql\bug49342.stacks

Attachment: bug49342.stacks.txt (text/plain), 35.89 KiB.

[2 Dec 2009 12:27] Philip Stoev
core and binary 

http://mysql-systemqa.s3.amazonaws.com/var-bug49342.zip

source:

revision-id: rafal.somla@sun.com-20091202105014-u8ei828qwpdkjs03
date: 2009-12-02 11:50:14 +0100
build-date: 2009-12-02 14:17:44 +0200
revno: 2901
branch-nick: mysql-6.0-backup
[2 Dec 2009 12:51] Rafal Somla
Note: whoever solves this problem should also check if similar issue is not present in the locking thread of the default backup driver (sql/backup/be_default.cc).
[8 Dec 2009 8:38] Rafal Somla
I was able to reproduce similar problem related to bad synchronization between table locking thread and SHOW PROCESSLIST command. To do this I need to introduce few new synchronization points:

------------------------------------------------------------------------------
=== modified file 'sql/sql_show.cc'
--- sql/sql_show.cc	2009-10-25 13:41:27 +0000
+++ sql/sql_show.cc	2009-12-08 08:19:21 +0000
@@ -32,6 +32,8 @@
 #endif
 #include <my_dir.h>
 
+#include "debug_sync.h"
+
 #define STR_OR_NIL(S) ((S) ? (S) : "<nil>")
 
 #ifdef WITH_PARTITION_STORAGE_ENGINE
@@ -1860,6 +1862,8 @@ void mysqld_list_processes(THD *thd,cons
                             Protocol::SEND_NUM_ROWS | Protocol::SEND_EOF))
     DBUG_VOID_RETURN;
 
+  DEBUG_SYNC(current_thd,"sql_show_begin");
+
   pthread_mutex_lock(&LOCK_thread_count); // For unlink from list
   if (!thd->killed)
   {
@@ -1900,6 +1904,9 @@ void mysqld_list_processes(THD *thd,cons
         if (mysys_var)
           pthread_mutex_unlock(&mysys_var->mutex);
         pthread_mutex_unlock(&tmp->LOCK_thd_data);
+
+        DEBUG_SYNC(current_thd,"sql_show_locked");
+
         thd_info->start_time= tmp->start_time;
         thd_info->query=0;
         /* Lock THD mutex that protects its data when looking at it. */

=== modified file 'storage/myisam/myisam_backup_engine.cc'
--- storage/myisam/myisam_backup_engine.cc	2009-12-02 22:58:41 +0000
+++ storage/myisam/myisam_backup_engine.cc	2009-12-08 08:02:20 +0000
@@ -446,6 +446,8 @@ retry:
     */
     cannot_delete_lock_thd= TRUE;
     pthread_mutex_unlock(&THR_LOCK_myisam);
+
+    DEBUG_SYNC(current_thd,"myisam_before_kill_lock_thd");
     /*
       So now lock_thd cannot be destroyed.
       We kill the thread (which will in particular work if it is waiting for
@@ -454,6 +456,9 @@ retry:
     pthread_mutex_lock(&lock_thd->LOCK_thd_data);
     lock_thd->awake(THD::KILL_CONNECTION);
     pthread_mutex_unlock(&lock_thd->LOCK_thd_data);
+
+    DEBUG_SYNC(current_thd,"myisam_after_kill_lock_thd");
+
     /* won't look at lock_thd anymore, allow its deletion */
     pthread_mutex_lock(&THR_LOCK_myisam);
     cannot_delete_lock_thd= FALSE;
------------------------------------------------------------------------------

Then I run the following test case:

------------------------------------------------------------------------------
CREATE DATABASE db1;
USE db1;
CREATE TABLE t1 (a int) ENGINE=myisam;
INSERT INTO t1 VALUES (1);

SET DEBUG_SYNC= 'myisam_before_kill_lock_thd SIGNAL kill WAIT_FOR cont_backup';
SET DEBUG_SYNC= 'myisam_after_kill_lock_thd  SIGNAL cont_show WAIT_FOR finish_backup';
SET DEBUG_SYNC= 'backup_restore_done SIGNAL finish_show';

send BACKUP DATABASE db1 TO 'db1.bkp';

  --connect (con1,localhost,root,,)
  --echo # con1
  SET DEBUG_SYNC= 'now WAIT_FOR kill';
  SET DEBUG_SYNC= 'sql_show_begin SIGNAL cont_backup WAIT_FOR cont_show';
  SET DEBUG_SYNC= 'sql_show_locked SIGNAL finish_backup WAIT_FOR finish_show';
  send SHOW PROCESSLIST;

--connection default
reap;
SET DEBUG_SYNC= 'now SIGNAL finish_show';

DROP DATABASE db1;
--exit
------------------------------------------------------------------------------

It creates a situation where killed locking thread terminates while SHOW PROCESSLIST is in the middle of execution holding LOCK_thread_count. Since the locking thread is not acquiring LOCK_thread_count, it can delete the THD instance which SHOW PROCESSLIST is accessing at the moment. This leads to a  crash in SHOW PROCESSLIST.
[8 Dec 2009 8:43] Rafal Somla
REFINED PROBLEM DESCRIPTION
---------------------------
The table locking thread of the MyISAM backup driver is not holding LOCK_thread_count while deleting the THD instance (storage/myisam/myisam_backup_engine.cc:918). This leads to a crash if a concurrent thread is trying to access this instance thinking that it is protected by LOCK_thread_count - for example the SHOW PROCESSLIST command.

The same problem is present in the locking thread for the default backup driver (sql/backup/be_thread.cc:222).

PROPOSED SOLUTION
-----------------
In table locking threads, guard deletion of the THD instance with LOCK_thread_count.
[8 Dec 2009 9:28] Ingo Strüwing
Proposed solution sounds good.
[8 Dec 2009 11:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/93153

2904 Rafal Somla	2009-12-08
      Bug #49342 backup locking thread: trying to destroy a mutex 
                 LOCK_thd_data that was locked
      
      The problem was that table locking thread of the MyISAM backup
      driver was deleting its THD instance without acquiring 
      LOCK_thread_count mutex. This could lead to a situation when 
      SHOW PROCESSLIST statement is accessing the THD instance being 
      deleted which resulted in crash.
      
      Fixed by protecting deletion of THD instance with 
      LOCK_thread_count in table locking threads for both MyISAM and 
      default backup drivers. A test script is added to check that
      the crash does not happen now.
     @ mysql-test/suite/backup/t/backup_thd_delete.test
        Test script which sets up a situation in which SHOW PROCESSLIST
        was crashing before this fix.
     @ sql/backup/be_thread.cc
        Move "delete thd" so that it is protected with LOCK_thread_count.
     @ sql/sql_show.cc
        Added debug sync point for backup_thd_delete test.
     @ storage/myisam/myisam_backup_engine.cc
        - Added debug sync point for backup_thd_delete test.
        - Protect "delete thd" with LOCK_thread_count.
[8 Dec 2009 15:32] Chuck Bell
Approved pending changes.
[9 Dec 2009 17:37] Ingo Strüwing
Approved pending change. Please see email.
[10 Dec 2009 8:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/93446

2904 Rafal Somla	2009-12-10
      BUG#49342 backup locking thread: trying to destroy a mutex 
                LOCK_thd_data that was locked
      
      The problem was that table locking thread of the MyISAM backup
      driver was deleting its THD instance without acquiring 
      LOCK_thread_count mutex. This could lead to a situation when 
      SHOW PROCESSLIST statement is accessing the THD instance being 
      deleted which resulted in crash.
      
      Fixed by protecting deletion of THD instance with 
      LOCK_thread_count in table locking threads for both MyISAM and 
      default backup drivers. A test script is added to check that
      the crash does not happen now.
     @ mysql-test/suite/backup/t/backup_thd_delete.test
        Test script which sets up a situation in which SHOW PROCESSLIST
        was crashing before this fix.
     @ sql/backup/be_thread.cc
        Move "delete thd" so that it is protected with LOCK_thread_count.
     @ sql/sql_show.cc
        Added debug sync point for backup_thd_delete test.
     @ storage/myisam/myisam_backup_engine.cc
        - Added debug sync point for backup_thd_delete test.
        - Protect "delete thd" with LOCK_thread_count.
[10 Dec 2009 8:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/93447

2905 Rafal Somla	2009-12-10
      BUG#49342 backup locking thread: trying to destroy a mutex 
                LOCK_thd_data that was locked
      
      The problem was that table locking thread of the MyISAM backup
      driver was deleting its THD instance without acquiring 
      LOCK_thread_count mutex. This could lead to a situation when 
      SHOW PROCESSLIST statement is accessing the THD instance being 
      deleted which resulted in crash.
      
      Fixed by protecting deletion of THD instance with 
      LOCK_thread_count in table locking threads for both MyISAM and 
      default backup drivers. A test script is added to check that
      the crash does not happen now.
     @ mysql-test/suite/backup/t/backup_thd_delete.test
        Test script which sets up a situation in which SHOW PROCESSLIST
        was crashing before this fix.
     @ sql/backup/be_thread.cc
        Move "delete thd" so that it is protected with LOCK_thread_count.
     @ sql/sql_show.cc
        Added debug sync point for backup_thd_delete test.
     @ storage/myisam/myisam_backup_engine.cc
        - Added debug sync point for backup_thd_delete test.
        - Protect "delete thd" with LOCK_thread_count.
[10 Dec 2009 12:45] Rafal Somla
Pushed to mysql-6.0-backup tree.
revid:rafal.somla@sun.com-20091210083820-1rxhn1l7o0ufohyd
[15 Dec 2009 16:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/94211

2906 Chuck Bell	2009-12-15
      BUG#49342 backup locking thread: trying to destroy a mutex 
               LOCK_thd_data that was locked
      
      The problem was that table locking thread of the MyISAM backup
      driver was deleting its THD instance without acquiring 
      LOCK_thread_count mutex. This could lead to a situation when 
      SHOW PROCESSLIST statement is accessing the THD instance being 
      deleted which resulted in crash.
      
      Fixed by protecting deletion of THD instance with 
      LOCK_thread_count in table locking threads for both MyISAM and 
      default backup drivers. A test script is added to check that
      the crash does not happen now.
     @ mysql-test/suite/backup/r/backup_thd_delete.result
        New result file.
     @ mysql-test/suite/backup/t/backup_thd_delete.test
        Test script which sets up a situation in which SHOW PROCESSLIST
        was crashing before this fix.
     @ sql/backup/be_thread.cc
        Move "delete thd" so that it is protected with LOCK_thread_count.
     @ sql/sql_show.cc
        Added debug sync point for backup_thd_delete test.
     @ storage/myisam/myisam_backup_engine.cc
        - Added debug sync point for backup_thd_delete test.
        - Protect "delete thd" with LOCK_thread_count.
[4 Jan 2010 17:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/95901

3025 Ingo Struewing	2010-01-04
      WL#5101 - MySQL Backup back port - MS09
      Merged revid:charles.bell@sun.com-20091215165144-r6qu4ucaa919yark
        BUG#49342 backup locking thread: trying to destroy a mutex 
                 LOCK_thd_data that was locked
        
        The problem was that table locking thread of the MyISAM backup
        driver was deleting its THD instance without acquiring 
        LOCK_thread_count mutex. This could lead to a situation when 
        SHOW PROCESSLIST statement is accessing the THD instance being 
        deleted which resulted in crash.
        
        Fixed by protecting deletion of THD instance with 
        LOCK_thread_count in table locking threads for both MyISAM and 
        default backup drivers. A test script is added to check that
        the crash does not happen now.
     @ sql/sql_show.cc
        WL#5101 - MySQL Backup back port - MS09
            Added debug sync point for backup_thd_delete test.
     @ storage/myisam/myisam_backup_engine.cc
        WL#5101 - MySQL Backup back port - MS09
            - Added debug sync point for backup_thd_delete test.
            - Protect "delete thd" with LOCK_thread_count.
[20 Feb 2010 9:17] Bugs System
Pushed into 6.0.14-alpha (revid:ingo.struewing@sun.com-20100218152520-s4v1ld76bif06eqn) (version source revid:ingo.struewing@sun.com-20100119103538-wtp5alpz4p2jayl5) (merge vers: 6.0.14-alpha) (pib:16)
[7 Mar 2010 19:00] Paul DuBois
Noted in 6.0.14 changelog.

The MyISAM backup driver did not protect acquire a mutex for a
structure that it was destroying, causing a crash if another thread
attempted to access the structure.