| Bug #40043 | Maria: waiting_threads-t unit test hangs in "kill strategy: LATEST" on Windows | ||
|---|---|---|---|
| Submitted: | 15 Oct 2008 12:11 | Modified: | 8 Jan 2009 10:42 |
| Reporter: | Guilhem Bichot | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Maria storage engine | Severity: | S3 (Non-critical) |
| Version: | 5.1-maria | OS: | Windows |
| Assigned to: | Sergei Golubchik | CPU Architecture: | Any |
[15 Oct 2008 12:11]
Guilhem Bichot
[15 Oct 2008 12:14]
Guilhem Bichot
I was able to repeat this problem on my machine; I tested this patch from Sergei, which lets read-lockers go before write-lockers:
=== modified file 'mysys/thr_rwlock.c'
--- mysys/thr_rwlock.c 2006-12-23 19:17:15 +0000
+++ mysys/thr_rwlock.c 2008-10-15 09:53:24 +0000
@@ -89,7 +89,7 @@
pthread_mutex_lock(&rwp->lock);
/* active or queued writers */
- while (( rwp->state < 0 ) || rwp->waiters)
+ while ( rwp->state < 0 )
pthread_cond_wait( &rwp->readers, &rwp->lock);
rwp->state++;
@@ -101,7 +101,7 @@
{
int res;
pthread_mutex_lock(&rwp->lock);
- if ((rwp->state < 0 ) || rwp->waiters)
+ if (rwp->state < 0 )
res= EBUSY; /* Can't get lock */
else
{
But it didn't change anything.
[15 Oct 2008 12:23]
Guilhem Bichot
The main thread is waiting for termination of the test, and all other 29 threads are either: - trying to get the "lock" global mutex (defined in waiting_threads-t.c) in one of the pthread_mutex_lock in waiting_threads-t.c - or trying to get this same mutex, at my_wincond.c:142 , in pthread_cond_timedwait() inside wt_thd_cond_timedwait() - or (one single thread in this case) at my_wincond.c:116, waiting for something, in pthread_cond_timedwait() inside wt_thd_cond_timedwait()
[15 Oct 2008 12:25]
Guilhem Bichot
No problem in 6.0-main and 6.0-maria, relevant code of 5.1-maria hasn't been merged there yet.
[15 Oct 2008 17:15]
Guilhem Bichot
After commenting out the line at my_wincond.c:116 (and keeping Serg's patch), LATEST finishes after 120 seconds (5 seconds under Linux). I haven't tested comment-out without Serg's patch.
[15 Oct 2008 18:10]
Guilhem Bichot
Results from Windows (with code changes as described in previous post):
# kill strategy: LATEST
# Testing waiting_threads with 30 threads, 3000 iterations...
ok 9 - tested waiting_threads in 121.726 secs (0)
# deadlock cycles of length 1: 2866 9 34.60 %
# deadlock cycles of length 2: 2196 15 61.20 %
# deadlock cycles of length 3: 1519 8 79.58 %
# deadlock cycles of length 4: 920 5 90.71 %
# deadlock cycles of length 5: 478 3 96.50 %
# deadlock cycles of length 6: 239 0 99.37 %
# deadlock cycles of length 7: 0 31 99.75 %
# deadlock cycles of length 8: 0 9 99.86 %
# deadlock cycles of length 9: 0 2 99.88 %
# deadlock cycles of length 10: 0 1 99.89 %
# deadlock cycles of length 12: 0 2 99.92 %
# deadlock cycles of length 13: 0 1 99.93 %
# deadlock cycles of length 15: 0 2 99.95 %
# deadlock cycles of length 16: 0 4 100.00 %
# depth exceeded: 16245 87
# deadlock waits up to 22 us: 0
# deadlock waits up to 49 us: 0
# deadlock waits up to 106 us: 0
# deadlock waits up to 232 us: 0
# deadlock waits up to 507 us: 0
# deadlock waits up to 1105 us: 0
# deadlock waits up to 2408 us: 0
# deadlock waits up to 5247 us: 0
# deadlock waits up to 11433 us: 0
# deadlock waits up to 24908 us: 0
# deadlock waits up to 54265 us: 0
# deadlock waits up to 118224 us: 0
# timed out: 60604
# successes: 21086
The "waits up" stats look wrong ("0").
I'll post Linux output (same machine) later.
[15 Oct 2008 19:17]
Guilhem Bichot
Here's Linux, same machine as previous post:
# kill strategy: LATEST
# Testing waiting_threads with 30 threads, 3000 iterations...
ok 9 - tested waiting_threads in 4.54795 secs (0)
# deadlock cycles of length 1: 6868 6 19.25 %
# deadlock cycles of length 2: 7602 3 40.55 %
# deadlock cycles of length 3: 7236 7 60.83 %
# deadlock cycles of length 4: 5993 6 77.63 %
# deadlock cycles of length 5: 4412 1 89.99 %
# deadlock cycles of length 6: 3388 2 99.48 %
# deadlock cycles of length 7: 0 103 99.77 %
# deadlock cycles of length 8: 0 43 99.89 %
# deadlock cycles of length 9: 0 17 99.94 %
# deadlock cycles of length 10: 0 8 99.96 %
# deadlock cycles of length 11: 0 3 99.97 %
# deadlock cycles of length 12: 0 3 99.97 %
# deadlock cycles of length 13: 0 2 99.98 %
# deadlock cycles of length 14: 0 2 99.99 %
# deadlock cycles of length 15: 0 2 99.99 %
# deadlock cycles of length 16: 0 3 100.00 %
# depth exceeded: 38766 182
# deadlock waits up to 1 us: 460
# deadlock waits up to 2 us: 4
# deadlock waits up to 4 us: 11
# deadlock waits up to 10 us: 1
# deadlock waits up to 22 us: 55
# deadlock waits up to 49 us: 284
# deadlock waits up to 106 us: 585
# deadlock waits up to 232 us: 1721
# deadlock waits up to 507 us: 5298
# deadlock waits up to 1105 us: 19938
# deadlock waits up to 2408 us: 18588
# deadlock waits up to 5247 us: 246
# deadlock waits up to 11433 us: 1166
# deadlock waits up to 24908 us: 5
# timed out: 6139
# successes: 48151
Note that it has much less timeouts, much more successes, twice more "depth exceeded", and "waits up" stats are not looking stupid (unlike zeros under Windows).
By the way, I wonder how the "waits up" can be zeros in Windows, given that the code which prints them is:
for (cnt=0; cnt < WT_WAIT_STATS; cnt++)
if (wt_wait_stats[cnt]>0)
diag("deadlock waits up to %7llu us: %5u",
wt_wait_table[cnt], wt_wait_stats[cnt]);
so is supposed to diag() only if the statistic is >0.
[16 Oct 2008 20:02]
Guilhem Bichot
Note that the line which I commented out, my_wincond.c, is from a relatively new implementation: it was introduced, together with a significant rewrite of my_wincond.c, in revision sp1r-Kristofer.Pettersson@naruto.-20070215130821-52591 . It may be a correct implementation, or we may have found a bug in it, which is not impossible as this unit test uses rwlocks a lot (more than the average Server code).
[21 Oct 2008 19:26]
Sergei Golubchik
committed: http://lists.mysql.com/maria/246 approved on irc
[7 Nov 2008 15:25]
Guilhem Bichot
fixed in 6.0.8
[8 Jan 2009 10:42]
MC Brown
Test case failure. No documentation needed.
