Bug #36929 crash in kill_zombie_dump_threads-> THD::awake() with replication tests
Submitted: 23 May 2008 18:59 Modified: 1 Feb 2009 12:53
Reporter: Andrei Elkin
Status: Closed
Category: MySQL Server: Replication Severity: S3 (Non-critical)
Version: 6.0 OS: Any
Assigned to: Andrei Elkin CPU Architecture:Any
Tags: pushbuild, sporadic, test failure

[23 May 2008 18:59] Andrei Elkin
Description:
Happened on mysql-6.0 pushbuild (pb) when the thread pool feature is compiled in
and activated. The following tests

     rpl.rpl_stm_until 'stmt'
     rpl.rpl_truncate_2myisam 'stmt'
     rpl.rpl_packet 'stmt'

experienced a crash:

      #4  <signal handler called>
      #5  0x20000000000b38f0 in __pthread_mutex_unlock_usercnt ()
         from /lib/libpthread.so.0
      #6  0x40000000003027d0 in THD::awake ()
      #7  0x4000000000639d30 in kill_zombie_dump_threads ()
      #8  0x4000000000372f60 in dispatch_command ()
      #9  0x4000000000373b90 in do_command ()

The crash appears to be caused by the owner thread resetting THD->mysys_var
without any guard while the killer thread can access it concurrently.

How to repeat:
xref.pl rpl.rpl_stm_until etc., or look at the mysql-6.0 pb logs, or
build mysql-6.0 with --with-libevent and run mtr on one of the tests (it can take
a number of repeats).
[23 May 2008 19:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47006

ChangeSet@1.2639, 2008-05-23 22:07:27+03:00, aelkin@mysql1000.dsl.inet.fi +2 -0
  Bug #36929  	crash in kill_zombie_dump_threads-> THD::awake() with replication tests
  
  There was a crash in THD::awake() on an attempt to access THD::mysys_var,
  which the host thread can reset concurrently.
  
  Fixed by forcing the host thread to reset it only after acquiring the LOCK_delete mutex.
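
The guarded-reset idea described in this commit message can be sketched in standard C++. This is only an illustration under stated assumptions, not the committed patch: Session, ThreadVar, attach(), detach() and race_demo() are hypothetical names, lock_delete stands in for LOCK_delete, and thread_var stands in for THD::mysys_var.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the per-thread state that THD::mysys_var points to.
struct ThreadVar {
    bool abort_requested = false;
};

// Hypothetical stand-in for THD.
struct Session {
    std::mutex lock_delete;           // plays the role of LOCK_delete
    ThreadVar* thread_var = nullptr;  // plays the role of THD::mysys_var

    // Owner (host) thread: publish its per-thread state.
    void attach(ThreadVar* tv) {
        std::lock_guard<std::mutex> guard(lock_delete);
        thread_var = tv;
    }

    // Owner (host) thread: reset the pointer only while holding the mutex,
    // so a killer can never observe it changing in the middle of its work.
    void detach() {
        std::lock_guard<std::mutex> guard(lock_delete);
        thread_var = nullptr;
    }

    // Killer thread: check and use the pointer under the same mutex.
    bool awake() {
        std::lock_guard<std::mutex> guard(lock_delete);
        if (thread_var == nullptr) return false;  // owner already detached
        thread_var->abort_requested = true;
        return true;
    }
};

// Runs an owner and a killer concurrently; returns the number of wake-ups
// that found the owner attached. The count varies from run to run.
int race_demo(int rounds) {
    Session s;
    int wakeups = 0;
    std::thread owner([&] {
        for (int i = 0; i < rounds; ++i) {
            ThreadVar tv;   // lives only for this iteration
            s.attach(&tv);
            s.detach();     // completes before tv is destroyed
        }
    });
    std::thread killer([&] {
        for (int i = 0; i < rounds; ++i)
            if (s.awake()) ++wakeups;
    });
    owner.join();
    killer.join();
    return wakeups;
}
```

Because detach() cannot complete while awake() holds the mutex, the killer never dereferences a pointer whose target has already gone away. Removing the lock from detach() brings back the unguarded reset described in the report.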
[23 May 2008 19:48] Andrei Elkin
Bug#35714 reports the same issue and also has a prototype of the fix.
[26 May 2008 10:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47043

ChangeSet@1.2639, 2008-05-26 13:10:36+03:00, aelkin@mysql1000.dsl.inet.fi +2 -0
  Bug #36929  	crash in kill_zombie_dump_threads-> THD::awake() with replication tests
  
  There was a crash in THD::awake() on an attempt to access THD::mysys_var,
  which the host thread can reset concurrently.
  
  Fixed by forcing the host thread to reset it only after acquiring the LOCK_delete mutex.
[26 May 2008 11:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47049

ChangeSet@1.2639, 2008-05-26 14:40:26+03:00, aelkin@mysql1000.dsl.inet.fi +2 -0
  Bug #36929  	crash in kill_zombie_dump_threads-> THD::awake() with replication tests
  
  There was a crash in THD::awake() on an attempt to access THD::mysys_var,
  which the host thread can reset concurrently.
  
  Fixed by forcing the host thread to reset it only after acquiring the LOCK_delete mutex.
[27 May 2008 15:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47091

ChangeSet@1.2639, 2008-05-27 18:00:54+03:00, aelkin@mysql1000.dsl.inet.fi +4 -0
  Bug #36929  	crash in kill_zombie_dump_threads-> THD::awake() with replication tests
  
  There was a crash in THD::awake() on an attempt to access THD::mysys_var,
  which the host thread can reset concurrently.
  
  The immediate issue is fixed by forcing the host thread to reset it only after acquiring
  the LOCK_delete mutex.
  The same guarding is deployed to avoid potential race conditions between the host and
   - the threads executing SHOW PROCESSLIST (mysqld_list_processes());
   - the shutdown thread (close_connections()).
  With this patch THD::store_globals() starts acquiring the LOCK_delete mutex, although this is
  slight overkill: mysys_var could change from NULL to non-NULL without mutex protection
  safely enough for the current logic of the killer (THD::awake).
[29 May 2008 18:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47221

ChangeSet@1.2639, 2008-05-29 21:21:07+03:00, aelkin@mysql1000.dsl.inet.fi +4 -0
  Bug #36929  	crash in kill_zombie_dump_threads-> THD::awake() with replication tests
  
  There was a crash in THD::awake() on an attempt to access THD::mysys_var,
  which the host thread can reset concurrently.
  
  The immediate issue is fixed by forcing the host thread to reset it only after acquiring
  the LOCK_delete mutex.
  The same guarding is deployed to avoid potential race conditions between the host and
   - the threads executing SHOW PROCESSLIST (mysqld_list_processes());
   - the shutdown thread (close_connections()).
  THD::store_globals() does not acquire LOCK_delete, because mysys_var can change from NULL
  to non-NULL without mutex protection safely for the current logic of the threads executing
  THD::awake, close_connections(), mysqld_list_processes().
[29 May 2008 18:34] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47222

ChangeSet@1.2639, 2008-05-29 21:33:35+03:00, aelkin@mysql1000.dsl.inet.fi +4 -0
  Bug #36929 crash in kill_zombie_dump_threads-> THD::awake() with replication tests
  
  There was a crash in THD::awake() when the killer attempted to access THD::mysys_var,
  which the host thread can reset concurrently.
  
  The immediate issue is fixed by forcing the host thread to reset it only after acquiring
  the LOCK_delete mutex.
  The same guarding is deployed to avoid potential race conditions between the host and
   - the threads executing SHOW PROCESSLIST (mysqld_list_processes());
   - the shutdown thread (close_connections()).
  THD::store_globals() does not acquire LOCK_delete, because mysys_var can change from NULL
  to non-NULL without mutex protection safely for the current logic of the threads executing
  THD::awake, close_connections(), mysqld_list_processes().
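
The asymmetry in this last revision (setting the pointer without the mutex, resetting it only under the mutex, and having every reader check it under the mutex) might look roughly like the sketch below. All names are hypothetical stand-ins, and the use of std::atomic is an assumption made here so that the unlocked store is well defined in standard C++. It is not a claim about how the server declares mysys_var.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical per-thread state; the fields are illustrative only.
struct ThreadVar {
    bool abort_requested = false;
    const char* state = "running";
};

// Hypothetical stand-in for THD.
struct Session {
    std::mutex lock_delete;                       // role of LOCK_delete
    std::atomic<ThreadVar*> thread_var{nullptr};  // role of THD::mysys_var

    // Analogue of store_globals(): NULL -> non-NULL without the mutex.
    // A reader that still sees NULL simply skips this session.
    void store(ThreadVar* tv) {
        thread_var.store(tv, std::memory_order_release);
    }

    // Analogue of the guarded reset: non-NULL -> NULL only under the mutex.
    void reset() {
        std::lock_guard<std::mutex> guard(lock_delete);
        thread_var.store(nullptr, std::memory_order_release);
    }

    // Reader pattern shared by the killer, process list and shutdown paths:
    // take the mutex, check for NULL, and use the pointer only while locked.
    template <class F>
    bool with_thread_var(F use) {
        std::lock_guard<std::mutex> guard(lock_delete);
        ThreadVar* tv = thread_var.load(std::memory_order_acquire);
        if (tv == nullptr) return false;
        use(*tv);
        return true;
    }
};

// Killer-style reader: request an abort if the owner is attached.
bool request_kill(Session& s) {
    return s.with_thread_var([](ThreadVar& tv) { tv.abort_requested = true; });
}

// Process-list-style reader: report the owner's state, or "detached".
std::string describe(Session& s) {
    std::string out = "detached";
    s.with_thread_var([&](ThreadVar& tv) { out = tv.state; });
    return out;
}
```

A reader holding the mutex either sees NULL and skips the session, or sees a valid pointer that cannot be reset until the reader releases the mutex. This relies on the owner destroying its ThreadVar only after reset() returns.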
[7 Jun 2008 9:40] Andrei Elkin
Pushed to the bzr 6.0-rpl.
[12 Jun 2008 8:57] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47770

2669 Andrei Elkin	2008-06-12
      bug#36929 fix post-pushing.
      
      Correcting an assert that does not hold in embedded.
[27 Jun 2008 19:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/48672

2662 Konstantin Osipov	2008-06-27
      Apply a short version of the fix for BUG#36929 before pushing to the
      main tree to fix numerous test failures in pool-of-threads mode.
[25 Aug 2008 21:03] Chuck Bell
Released in 6.0.7
[26 Aug 2008 10:14] Andrei Elkin
Actually, the patch is still in 6.0-rpl and has not been pushed to the main trees.
[27 Aug 2008 1:13] Paul DuBois
Resetting to Patch Queued status.
[30 Jan 2009 13:30] Bugs System
Pushed into 6.0.10-alpha (revid:luis.soares@sun.com-20090129165607-wiskabxm948yx463) (version source revid:luis.soares@sun.com-20090129163120-e2ntks4wgpqde6zt) (merge vers: 6.0.10-alpha) (pib:6)
[1 Feb 2009 12:53] Jon Stephens
Documented in the 6.0.10 changelog as follows:

        A slave compiled using --with-libevent and run with
        --thread-handling=pool-of-threads could sometimes crash.
[3 Dec 2009 13:46] Jon Stephens
Also documented in the 5.6.0 changelog. See BUG#48463.
[7 Mar 2010 1:48] Paul DuBois
Moved 5.6.0 changelog entry to 5.5.3.