Bug #51105 MDL deadlock in rqg_mdl_stability test on Windows
Submitted: 11 Feb 2010 14:59
Modified: 7 Mar 2010 1:06
Reporter: John Embretsen
Status: Closed
Category: MySQL Server: Locking
Severity: S2 (Serious)
Version: mysql-next-4284
OS: Windows
Assigned to: Dmitry Lenev
CPU Architecture: Any
Tags: pushbuild, rqg_pb2, test failure

[11 Feb 2010 14:59] John Embretsen
Description:
The test 'rqg_mdl_stability' (10 concurrent clients) fails on the Windows platform in Pushbuild (mysql-next-4284 branch) with what seems to be a deadlock among MDL threads. The Random Query Generator reports:

"10 stalled queries detected, declaring deadlock"

A quick look at the test logs indicates the following unique MDL back traces:

cond_wait -> my_rw_wrlock -> MDL_lock::remove_ticket
cond_wait -> my_rw_wrlock -> MDL_lock::find_deadlock
cond_wait -> my_rw_wrlock -> MDL_context::stop_waiting -> acquire_lock
cond_wait -> my_rw_wrlock -> MDL_map::move_from_hash_to_lock_mutex

Test log with stack traces and other info will be attached shortly.

This deadlock currently occurs on Windows only (tested on Win 2003 32-bit), and is different from Bug#50998 (Linux deadlock, same test & source, now fixed).

How to repeat:
environment variable N4284.

Obtain a recent version of the Random Query Generator, e.g. by:
bzr branch lp:randgen

cd randgen

Run:

perl ./runall.pl \
--grammar=conf/metadata_stability.yy \
--gendata=conf/metadata_stability.zz \
--validator=SelectStability,QueryProperties \
--engine=Innodb \
--mysqld=--loose-innodb-lock-wait-timeout=5 \
--mysqld=--table-lock-wait-timeout=5 \
--mysqld=--loose-skip-safemalloc \
--mysqld=--innodb \
--mysqld=--default-storage-engine=Innodb \
--mysqld=--transaction-isolation=SERIALIZABLE \
--mysqld=--innodb-flush-log-at-trx-commit=2 \
--mysqld=--table-lock-wait-timeout=1 \
--mysqld=--innodb-lock-wait-timeout=1 \
--mysqld=--log-output=file \
--queries=1M \
--duration=600 \
--reporters=Deadlock,ErrorLog,Backtrace,Shutdown \
--basedir=$N4284
[11 Feb 2010 15:07] John Embretsen
RQG test output from failing test on Windows x86, including stack traces from debugger.

Attachment: bug51105_log_with_stacktraces.txt (text/plain), 60.81 KiB.

[11 Feb 2010 19:01] Vladislav Vaintroub
Also notice the buggy symbol search path in the output: 
G:\pb2\test\sb_1-1352546-1265853511.19\mysql-5.5.99-m3-win-x86-test\mysql-test..\sql\Debug

Something is not quite working in Philip's stackdumper (mysql-test.. does not exist), so you get no line numbers, which is a pity.
[19 Feb 2010 8:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/100826

3104 Dmitry Lenev	2010-02-19
      Fix for bug #51105 "MDL deadlock in rqg_mdl_stability test
      on Windows".
      
      On platforms where read-write lock implementation does not
      prefer readers by default (Windows, Solaris) server might
      have deadlocked while detecting MDL deadlock.
      
      MDL deadlock detector relies on the fact that read-write
      locks which are used in its implementation prefer readers
      (see new comment for MDL_lock::m_rwlock for details).
      So far MDL code assumed that default implementation of
      read/write locks for the system has this property.
      However, this turned out to be wrong, for example, for
      Windows or Solaris. Thus MDL deadlock detector might have
      deadlocked on these systems.
      
      This fix simply adds a portable implementation of a read/write
      lock which prefers readers and changes MDL code to use this
      new type of synchronization primitive.
      
      No test case is added as existing rqg_mdl_stability test can
      serve as one.
      
      Question for reviewer is marked by QQ.
     @ configure.in
        Check for presence of pthread_rwlockattr_setkind_np to be
        able to determine if system natively supports read-write
        locks for which we can specify if readers or writers should
        be preferred.
     @ include/my_pthread.h
        Added support for portable read-write locks which prefer
        readers.
        To do so extended existing my_rw_lock_t implementation to
        support selection of whom to prefer depending on a flag.
     @ mysys/thr_rwlock.c
        Extended existing my_rw_lock_t implementation to support
        selection of whom to prefer depending on a flag.
        Added rw_pr_init() function implementing initialization of
        read-write locks preferring readers on systems which support
        them natively (e.g. Linux/NPTL).
     @ sql/mdl.cc
        Use portable read-write locks which prefer readers instead of
        relying on that system implementation of read-write locks has
        this property (this was true for Linux/NPTL but was false,
        for example, for Windows and Solaris).
        Added comment explaining why preferring readers is important
        for MDL deadlock detector (thanks to Serg for example!).
     @ sql/mdl.h
        Use portable read-write locks which prefer readers instead of
        relying on that system implementation of read-write locks has
        this property (this was true for Linux/NPTL but was false,
        for example, for Windows and Solaris).
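
The configure check described above is for pthread_rwlockattr_setkind_np, the non-portable glibc call that selects reader or writer preference for a native read-write lock. A minimal sketch of how such an initialization could look is given below. The function name example_rw_pr_init_native is hypothetical and is not the rw_pr_init() from the patch; the __GLIBC__ guard stands in for the configure-time check, and the ENOSYS fallback is an assumption made for illustration only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* request the glibc extensions, including *_np calls */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper (not MySQL's rw_pr_init()): initialize *lock as a
   native read-write lock that prefers readers. Returns 0 on success or an
   errno-style code on failure. */
int example_rw_pr_init_native(pthread_rwlock_t *lock)
{
    assert(lock != NULL);
#if defined(__GLIBC__)
    pthread_rwlockattr_t attr;
    int rc = pthread_rwlockattr_init(&attr);
    if (rc != 0)
        return rc;
    /* Ask explicitly for reader preference instead of relying on the
       platform default. */
    rc = pthread_rwlockattr_setkind_np(&attr, PTHREAD_RWLOCK_PREFER_READER_NP);
    if (rc == 0)
        rc = pthread_rwlock_init(lock, &attr);
    pthread_rwlockattr_destroy(&attr);
    return rc;
#else
    /* Assumption for this sketch: without the glibc extension we report
       "not supported" so the caller can pick a portable implementation. */
    return ENOSYS;
#endif
}
```

On success the lock is used with the ordinary pthread_rwlock_rdlock(), pthread_rwlock_wrlock() and pthread_rwlock_unlock() calls; only the initialization differs.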
[28 Feb 2010 4:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/101767

3115 Dmitry Lenev	2010-02-28
      Fix for bug #51105 "MDL deadlock in rqg_mdl_stability test
      on Windows".
      
      On platforms where read-write lock implementation does not
      prefer readers by default (Windows, Solaris) server might
      have deadlocked while detecting MDL deadlock.
      
      MDL deadlock detector relies on the fact that read-write
      locks which are used in its implementation prefer readers
      (see new comment for MDL_lock::m_rwlock for details).
      So far MDL code assumed that default implementation of
      read/write locks for the system has this property.
      However, this turned out to be wrong, for example, for
      Windows or Solaris. Thus MDL deadlock detector might have
      deadlocked on these systems.
      
      This fix simply adds a portable implementation of a read/write
      lock which prefers readers and changes MDL code to use this
      new type of synchronization primitive.
      
      No test case is added as existing rqg_mdl_stability test can
      serve as one.
     @ config.h.cmake
        Check for presence of pthread_rwlockattr_setkind_np to be
        able to determine if system natively supports read-write
        locks for which we can specify if readers or writers should
        be preferred.
     @ configure.cmake
        Check for presence of pthread_rwlockattr_setkind_np to be
        able to determine if system natively supports read-write
        locks for which we can specify if readers or writers should
        be preferred.
     @ configure.in
        Check for presence of pthread_rwlockattr_setkind_np to be
        able to determine if system natively supports read-write
        locks for which we can specify if readers or writers should
        be preferred.
     @ include/my_pthread.h
        Added support for portable read-write locks which prefer
        readers.
        To do so extended existing my_rw_lock_t implementation to
        support selection of whom to prefer depending on a flag.
     @ mysys/thr_rwlock.c
        Extended existing my_rw_lock_t implementation to support
        selection of whom to prefer depending on a flag.
        Added rw_pr_init() function implementing initialization of
        read-write locks preferring readers.
     @ sql/mdl.cc
        Use portable read-write locks which prefer readers instead of
        relying on that system implementation of read-write locks has
        this property (this was true for Linux/NPTL but was false,
        for example, for Windows and Solaris).
        Added comment explaining why preferring readers is important
        for MDL deadlock detector (thanks to Serg for example!).
     @ sql/mdl.h
        Use portable read-write locks which prefer readers instead of
        relying on that system implementation of read-write locks has
        this property (this was true for Linux/NPTL but was false,
        for example, for Windows and Solaris).
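
To illustrate what "a read/write lock which prefers readers" means in practice, here is a minimal sketch built from a mutex and a condition variable. It is not the code from mysys/thr_rwlock.c: the pr_rwlock_* names, the struct layout and the demo function are all hypothetical. The property it shows is that a read request waits only for a writer that currently holds the lock, never for a writer that is merely waiting.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical reader-preferring lock; all names are illustrative only. */
typedef struct {
    pthread_mutex_t mutex;     /* protects the two counters below */
    pthread_cond_t  released;  /* broadcast when no reader or writer holds the lock */
    int active_readers;        /* threads currently holding a read lock */
    int writer_active;         /* 1 while a writer holds the lock */
} pr_rwlock_t;

int pr_rwlock_init(pr_rwlock_t *l)
{
    int rc = pthread_mutex_init(&l->mutex, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&l->released, NULL);
    if (rc != 0) {
        pthread_mutex_destroy(&l->mutex);
        return rc;
    }
    l->active_readers = 0;
    l->writer_active = 0;
    return 0;
}

void pr_rwlock_destroy(pr_rwlock_t *l)
{
    pthread_cond_destroy(&l->released);
    pthread_mutex_destroy(&l->mutex);
}

void pr_rwlock_rdlock(pr_rwlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    /* Reader preference: wait only for a writer that HOLDS the lock.
       Writers that are merely waiting never delay a reader. */
    while (l->writer_active)
        pthread_cond_wait(&l->released, &l->mutex);
    l->active_readers++;
    pthread_mutex_unlock(&l->mutex);
}

void pr_rwlock_wrlock(pr_rwlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    while (l->writer_active || l->active_readers > 0)
        pthread_cond_wait(&l->released, &l->mutex);
    l->writer_active = 1;
    pthread_mutex_unlock(&l->mutex);
}

void pr_rwlock_unlock(pr_rwlock_t *l)
{
    pthread_mutex_lock(&l->mutex);
    if (l->writer_active) {
        l->writer_active = 0;
    } else {
        assert(l->active_readers > 0);
        l->active_readers--;
    }
    if (l->active_readers == 0)
        pthread_cond_broadcast(&l->released);
    pthread_mutex_unlock(&l->mutex);
}

static pr_rwlock_t demo_lock;
static int demo_writer_done = 0;

static void *demo_writer(void *arg)
{
    (void) arg;
    pr_rwlock_wrlock(&demo_lock);   /* blocks while any reader is active */
    demo_writer_done = 1;
    pr_rwlock_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if a second read lock was granted while a writer was pending
   and the writer ran only after all readers left, 0 otherwise. */
int pr_rwlock_demo(void)
{
    pthread_t writer;
    int seen_while_reading;

    if (pr_rwlock_init(&demo_lock) != 0)
        return 0;
    pr_rwlock_rdlock(&demo_lock);
    if (pthread_create(&writer, NULL, demo_writer, NULL) != 0)
        return 0;
    /* Granted whether or not the writer is already waiting. */
    pr_rwlock_rdlock(&demo_lock);
    seen_while_reading = demo_writer_done;
    pr_rwlock_unlock(&demo_lock);
    pr_rwlock_unlock(&demo_lock);
    pthread_join(writer, NULL);
    pr_rwlock_destroy(&demo_lock);
    return seen_while_reading == 0 && demo_writer_done == 1;
}
```

The trade-off of this design is that a steady stream of readers can starve writers, so it only suits code that, like the deadlock detector described above, depends on read requests never queuing behind a waiting writer.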
[28 Feb 2010 4:53] Dmitry Lenev
Fix for this bug was pushed into mysql-next-4284 tree.
[6 Mar 2010 10:30] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100306102742-yw9zzgw9ac5r65m5) (version source revid:bar@mysql.com-20100305074327-h09o5lw290s04lcf) (merge vers: 6.0.14-alpha) (pib:16)
[6 Mar 2010 10:31] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100306102638-qna09hbjb5gm940h) (version source revid:alik@sun.com-20100304153932-9hajxhhyanqbckmu) (pib:16)
[6 Mar 2010 10:55] Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:alik@sun.com-20100304153932-9hajxhhyanqbckmu) (merge vers: 5.5.99-m3) (pib:16)
[7 Mar 2010 1:06] Paul DuBois
Bug reported against internal tree. No changelog entry needed.