Bug #40043 Maria: waiting_threads-t unit test hangs in "kill strategy: LATEST" on Windows
Submitted: 15 Oct 2008 12:11 Modified: 8 Jan 2009 10:42
Reporter: Guilhem Bichot Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Maria storage engine Severity:S3 (Non-critical)
Version:5.1-maria OS:Windows
Assigned to: Sergei Golubchik CPU Architecture:Any

[15 Oct 2008 12:11] Guilhem Bichot
Description:
Windows 32 bit, debug build, running waiting_threads-t: after 1.5 hours it is still at
# kill strategy: LATEST
# Testing waiting_threads with 30 threads, 3000 iterations...

and is at 0% CPU.

Under Linux, it moves to "kill strategy: RANDOM" after 5 seconds.

How to repeat:
run unittest/mysys/debug/waiting_threads-t.exe
[15 Oct 2008 12:14] Guilhem Bichot
I was able to repeat this problem on my machine; I tested this patch from Sergei, which lets read-lockers go before write-lockers:
=== modified file 'mysys/thr_rwlock.c'
--- mysys/thr_rwlock.c  2006-12-23 19:17:15 +0000
+++ mysys/thr_rwlock.c  2008-10-15 09:53:24 +0000
@@ -89,7 +89,7 @@
   pthread_mutex_lock(&rwp->lock);
 
   /* active or queued writers */
-  while (( rwp->state < 0 ) || rwp->waiters)
+  while ( rwp->state < 0 )
     pthread_cond_wait( &rwp->readers, &rwp->lock);
 
   rwp->state++;
@@ -101,7 +101,7 @@
 {
   int res;
   pthread_mutex_lock(&rwp->lock);
-  if ((rwp->state < 0 ) || rwp->waiters)
+  if (rwp->state < 0 )
     res= EBUSY;                                        /* Can't get lock */
   else
   {

But it didn't change anything.
[15 Oct 2008 12:23] Guilhem Bichot
The main thread is waiting for termination of the test, and all other 29 threads are either:
- trying to get the "lock" global mutex (defined in waiting_threads-t.c) in one of the pthread_mutex_lock in waiting_threads-t.c
- or trying to get this same mutex, at my_wincond.c:142 , in pthread_cond_timedwait() inside wt_thd_cond_timedwait()
- or (one single thread in this case) at my_wincond.c:116, waiting for something, in pthread_cond_timedwait() inside wt_thd_cond_timedwait()
[15 Oct 2008 12:25] Guilhem Bichot
No problem in 6.0-main and 6.0-maria, relevant code of 5.1-maria hasn't been merged there yet.
[15 Oct 2008 17:15] Guilhem Bichot
After commenting out the line at my_wincond.c:116 (and keeping Serg's patch),
LATEST finishes after 120 seconds (5 seconds under Linux).
I haven't tested comment-out without Serg's patch.
[15 Oct 2008 18:10] Guilhem Bichot
Results from Windows (with code changes as described in previous post):

# kill strategy: LATEST
# Testing waiting_threads with 30 threads, 3000 iterations...
ok 9 - tested waiting_threads in 121.726 secs (0)
# deadlock cycles of length  1: 2866    9    34.60 %
# deadlock cycles of length  2: 2196   15    61.20 %
# deadlock cycles of length  3: 1519    8    79.58 %
# deadlock cycles of length  4:  920    5    90.71 %
# deadlock cycles of length  5:  478    3    96.50 %
# deadlock cycles of length  6:  239    0    99.37 %
# deadlock cycles of length  7:    0   31    99.75 %
# deadlock cycles of length  8:    0    9    99.86 %
# deadlock cycles of length  9:    0    2    99.88 %
# deadlock cycles of length 10:    0    1    99.89 %
# deadlock cycles of length 12:    0    2    99.92 %
# deadlock cycles of length 13:    0    1    99.93 %
# deadlock cycles of length 15:    0    2    99.95 %
# deadlock cycles of length 16:    0    4   100.00 %
# depth exceeded: 16245 87
# deadlock waits up to      22 us:     0
# deadlock waits up to      49 us:     0
# deadlock waits up to     106 us:     0
# deadlock waits up to     232 us:     0
# deadlock waits up to     507 us:     0
# deadlock waits up to    1105 us:     0
# deadlock waits up to    2408 us:     0
# deadlock waits up to    5247 us:     0
# deadlock waits up to   11433 us:     0
# deadlock waits up to   24908 us:     0
# deadlock waits up to   54265 us:     0
# deadlock waits up to  118224 us:     0
# timed out: 60604
# successes: 21086

The "waits up" stats look wrong ("0").
I'll post Linux output (same machine) later.
[15 Oct 2008 19:17] Guilhem Bichot
Here's Linux, same machine as previous post:

# kill strategy: LATEST
# Testing waiting_threads with 30 threads, 3000 iterations...
ok 9 - tested waiting_threads in 4.54795 secs (0)
# deadlock cycles of length  1: 6868    6    19.25 %
# deadlock cycles of length  2: 7602    3    40.55 %
# deadlock cycles of length  3: 7236    7    60.83 %
# deadlock cycles of length  4: 5993    6    77.63 %
# deadlock cycles of length  5: 4412    1    89.99 %
# deadlock cycles of length  6: 3388    2    99.48 %
# deadlock cycles of length  7:    0  103    99.77 %
# deadlock cycles of length  8:    0   43    99.89 %
# deadlock cycles of length  9:    0   17    99.94 %
# deadlock cycles of length 10:    0    8    99.96 %
# deadlock cycles of length 11:    0    3    99.97 %
# deadlock cycles of length 12:    0    3    99.97 %
# deadlock cycles of length 13:    0    2    99.98 %
# deadlock cycles of length 14:    0    2    99.99 %
# deadlock cycles of length 15:    0    2    99.99 %
# deadlock cycles of length 16:    0    3   100.00 %
# depth exceeded: 38766 182
# deadlock waits up to       1 us:   460
# deadlock waits up to       2 us:     4
# deadlock waits up to       4 us:    11
# deadlock waits up to      10 us:     1
# deadlock waits up to      22 us:    55
# deadlock waits up to      49 us:   284
# deadlock waits up to     106 us:   585
# deadlock waits up to     232 us:  1721
# deadlock waits up to     507 us:  5298
# deadlock waits up to    1105 us: 19938
# deadlock waits up to    2408 us: 18588
# deadlock waits up to    5247 us:   246
# deadlock waits up to   11433 us:  1166
# deadlock waits up to   24908 us:     5
# timed out: 6139
# successes: 48151

Note that it has much less timeouts, much more successes, twice more "depth exceeded", and "waits up" stats are not looking stupid (unlike zeros under Windows).
By the way, I wonder how the "waits up" can be zeros in Windows, given that the code which prints them is:
  for (cnt=0; cnt < WT_WAIT_STATS; cnt++)
    if (wt_wait_stats[cnt]>0)
      diag("deadlock waits up to %7llu us: %5u",
           wt_wait_table[cnt], wt_wait_stats[cnt]);
so is supposed to diag() only if the statistic is >0.
[16 Oct 2008 20:02] Guilhem Bichot
Note that the line which I commented out, my_wincond.c, is from a relatively new implementation: it was introduced, together with a significant rewrite of my_wincond.c, in revision sp1r-Kristofer.Pettersson@naruto.-20070215130821-52591 .
It may be a correct implementation, or we may have found a bug in it, which is not impossible as this unit test uses rwlocks a lot (more than the average Server code).
[21 Oct 2008 19:26] Sergei Golubchik
committed: http://lists.mysql.com/maria/246
approved on irc
[7 Nov 2008 15:25] Guilhem Bichot
fixed in 6.0.8
[8 Jan 2009 10:42] MC Brown
Test case failure. No documentation needed.