Bug #47768 | pthread_cond_timedwait() is broken on windows | ||
---|---|---|---|
Submitted: | 1 Oct 2009 17:15 | Modified: | 18 Dec 2009 23:45 |
Reporter: | Kristofer Pettersson | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: General | Severity: | S2 (Serious) |
Version: | 5.0+ | OS: | Windows |
Assigned to: | Kristofer Pettersson | CPU Architecture: | Any |
[1 Oct 2009 17:15]
Kristofer Pettersson
[1 Oct 2009 17:19]
Kristofer Pettersson
bug47768.cpp
Attachment: pthread_test2.cpp (text/plain), 3.55 KiB.
[1 Oct 2009 17:39]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/85423

3144 Kristofer Pettersson 2009-10-01
Bug#47768 pthread_cond_timedwait() is broken on windows

The pthread_cond_wait implementations for Windows might deadlock in some rare circumstances.

1) One thread (I) enters a timed wait and at some point ends up after the mutex unlock and before WaitForMultipleObjects(...).
2) Another thread (II) enters pthread_cond_broadcast. It grabs the mutex and discovers one waiter. It sets the broadcast event, closes the broadcast gate, then unlocks the mutex.
3) A third thread (III) issues a pthread_cond_signal. It grabs the mutex, discovers one waiter, sets the signal event, then unlocks the mutex.
4) The first thread (I) enters WaitForMultipleObjects, finds that the signal object is in a signalled state, and exits the wait.
5) Thread (I) grabs the mutex and checks the result status. The number of waiters is decreased and becomes equal to 0. The event returned was a signal event, so the broadcast gate isn't opened. The mutex is released.
6) Thread (II) issues a new broadcast. The mutex is acquired, but the number of waiters is 0, hence the broadcast gate remains closed.
7) Thread (I) enters the wait again but is blocked by the broadcast gate.

This fix resolves the above issue by always resetting the broadcast gate when there are no more waiters in the queue.

@ mysys/my_wincond.c
* Always reset the broadcast gate if there are no more waiters left.
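The seven steps above can be replayed in a small single-threaded model. This is only a sketch inferred from the commit message: the struct, its fields, and every function name are hypothetical and are not taken from mysys/my_wincond.c, the Windows events and the gate are reduced to plain flags, and the "before the fix" rule (the gate reopens only after a broadcast wake-up) is an assumption based on step 5.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only: field and function names are hypothetical and do
   not match mysys/my_wincond.c. Events and the gate are plain flags. */
typedef struct {
    int  waiting;          /* threads currently registered as waiters */
    bool signal_event;     /* set by signal, consumed by one waiter */
    bool broadcast_event;  /* set by broadcast */
    bool gate_open;        /* broadcast gate; new waiters block while closed */
} cond_model;

typedef enum { WOKE_BY_SIGNAL, WOKE_BY_BROADCAST } wake_reason;

static void model_init(cond_model *c)
{
    c->waiting = 0;
    c->signal_event = false;
    c->broadcast_event = false;
    c->gate_open = true;
}

/* Returns false where a real thread would block on the closed gate. */
static bool model_enter_wait(cond_model *c)
{
    if (!c->gate_open)
        return false;
    c->waiting++;
    return true;
}

static void model_signal(cond_model *c)
{
    if (c->waiting > 0)
        c->signal_event = true;
}

static void model_broadcast(cond_model *c)
{
    if (c->waiting > 0) {
        c->broadcast_event = true;
        c->gate_open = false;
    }
}

/* The waiter wakes up. As in step 4, a pending signal event is seen first. */
static wake_reason model_leave_wait(cond_model *c, bool apply_fix)
{
    wake_reason why;

    assert(c->waiting > 0 && (c->signal_event || c->broadcast_event));
    if (c->signal_event) {
        c->signal_event = false;
        why = WOKE_BY_SIGNAL;
    } else {
        why = WOKE_BY_BROADCAST;
    }

    c->waiting--;
    /* Assumed pre-fix rule: the gate reopens only after a broadcast wake-up.
       With the fix it reopens whenever the last waiter leaves. */
    if (c->waiting == 0 && (apply_fix || why == WOKE_BY_BROADCAST)) {
        c->broadcast_event = false;
        c->gate_open = true;
    }
    return why;
}

/* Replays steps 1-7; returns true if thread I can enter the wait again. */
static bool replay_scenario(bool apply_fix)
{
    cond_model c;
    wake_reason why;

    model_init(&c);
    model_enter_wait(&c);                   /* 1: thread I is a waiter  */
    model_broadcast(&c);                    /* 2: thread II broadcasts  */
    model_signal(&c);                       /* 3: thread III signals    */
    why = model_leave_wait(&c, apply_fix);  /* 4-5: thread I wakes up   */
    assert(why == WOKE_BY_SIGNAL);
    model_broadcast(&c);                    /* 6: no waiters, no effect */
    return model_enter_wait(&c);            /* 7: thread I waits again  */
}
```

In this model, replay_scenario(false) returns false (thread I is stuck at the closed gate, as in step 7), while replay_scenario(true) returns true because the last waiter reopens the gate on its way out.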
[6 Oct 2009 2:17]
Roel Van de Paar
Resolved stacktrace from bug_43758 (Ricardo Gomez) which should show the same issue. (Fedora Linux 2.6.27.5-117.fc10.x86_64)
Attachment: bug_43758_resolved_stacktrace_Ricardo_Gomez.txt (text/plain), 24.56 KiB.
[6 Oct 2009 2:21]
Roel Van de Paar
Customer verified that they no longer see the hang when FLUSH TABLES is not executed.
[6 Oct 2009 2:23]
Roel Van de Paar
Kristofer, Davi, please check the newly uploaded backtrace, which should show the same issue, but this time not on Windows but on Fedora Linux...
[6 Oct 2009 7:34]
Kristofer Pettersson
Roel: This bug is very specific to the Windows implementation of pthreads, it has nothing to do with Linux. The uploaded stack trace also unfortunately gives us very little to go on and I think the situation should be investigated further. Are the physical disks working as expected? Is there really a hang in fsync()? Please open yet another bug for the new unknown issue.
[6 Oct 2009 7:39]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/85827

2813 Kristofer Pettersson 2009-10-06
Bug#47768 pthread_cond_timedwait() is broken on windows

The pthread_cond_wait implementations for Windows might deadlock in some rare circumstances.

1) One thread (I) enters a timed wait and at some point ends up after the mutex unlock and before WaitForMultipleObjects(...).
2) Another thread (II) enters pthread_cond_broadcast. It grabs the mutex and discovers one waiter. It sets the broadcast event, closes the broadcast gate, then unlocks the mutex.
3) A third thread (III) issues a pthread_cond_signal. It grabs the mutex, discovers one waiter, sets the signal event, then unlocks the mutex.
4) The first thread (I) enters WaitForMultipleObjects, finds that the signal object is in a signalled state, and exits the wait.
5) Thread (I) grabs the mutex and checks the result status. The number of waiters is decreased and becomes equal to 0. The event returned was a signal event, so the broadcast gate isn't opened. The mutex is released.
6) Thread (II) issues a new broadcast. The mutex is acquired, but the number of waiters is 0, hence the broadcast gate remains closed.
7) Thread (I) enters the wait again but is blocked by the broadcast gate.

This fix resolves the above issue by always resetting the broadcast gate when there are no more waiters in the queue.

@ mysys/my_wincond.c
* Always reset the broadcast gate if there are no more waiters left.
[6 Oct 2009 10:01]
Bugs System
Pushed into 5.1.40 (revid:joro@sun.com-20091006095946-9vv2qal7rlot32r4) (version source revid:joro@sun.com-20091006095946-9vv2qal7rlot32r4) (merge vers: 5.1.40) (pib:11)
[6 Oct 2009 14:12]
Ricardo Gomez
Hi Roel, Kristofer. First of all, thanks for your collaboration. I would like to know what I can do to help fix the problem. I don't understand what to do, or what the stack trace that Roel sent me means. I don't know whether I have to open a new bug, or whether my problem may be fixed in this bug or in bug 43758. Thank you very much for your help.
[6 Oct 2009 23:57]
Roel Van de Paar
Hi Kristofer,

> This bug is very specific to the Windows implementation of pthreads, it has nothing to do with Linux.

Understood. Interestingly, I see references to aio in the Fedora stack trace. I was previously under the impression that aio was only Windows related, but I see that there's a Linux implementation as well (http://lse.sourceforge.net/io/aio.html).

Hi Ricardo,

> I want know what I may to do for colaborate in the fix the problem.

As per the notes from Kristofer, this looks like a completely separate situation. I have logged a new bug with some questions for you here: http://bugs.mysql.com/bug.php?id=47768 Could you please follow up on this new bug/those questions?
[6 Oct 2009 23:59]
Roel Van de Paar
Ricardo, correction, see bug #47876 instead.
[12 Oct 2009 15:55]
Paul DuBois
Noted in 5.1.40 changelog. The pthread_cond_wait() implementations for Windows could deadlock in some rare circumstances. Setting report to NDI pending push into 5.5.x.
[22 Oct 2009 6:34]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091022063126-l0qzirh9xyhp0bpc) (version source revid:alik@sun.com-20091019135554-s1pvptt6i750lfhv) (merge vers: 6.0.14-alpha) (pib:13)
[22 Oct 2009 7:07]
Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091022060553-znkmxm0g0gm6ckvw) (version source revid:alik@sun.com-20091014071749-j0wmq9echal73tpe) (merge vers: 5.5.0-beta) (pib:13)
[22 Oct 2009 19:53]
Paul DuBois
Noted in 5.5.0, 6.0.14 changelogs.
[22 Oct 2009 23:06]
Roel Van de Paar
Summary Overview:
This bug was fixed in: 5.1.40, 5.5.0, 6.0.14
Workarounds: none (except for upgrade)
[15 Nov 2009 17:55]
Taylan Karaoglu
This bug is also occurring on our server. The MySQL server version is 5.1.40 GPL community. MyISAM tables, 1K queries per second. http://bugs.mysql.com/bug.php?id=43758 shows the same issue as this bug report and is also referenced here.
[18 Dec 2009 10:31]
Bugs System
Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20091218102229-64tk47xonu3dv6r6) (version source revid:jonas@mysql.com-20091218095730-26gwjidfsdw45dto) (merge vers: 5.1.41-ndb-7.1.0) (pib:15)
[18 Dec 2009 10:47]
Bugs System
Pushed into 5.1.41-ndb-6.2.19 (revid:jonas@mysql.com-20091218100224-vtzr0fahhsuhjsmt) (version source revid:jonas@mysql.com-20091217101452-qwzyaig50w74xmye) (merge vers: 5.1.41-ndb-6.2.19) (pib:15)
[18 Dec 2009 11:02]
Bugs System
Pushed into 5.1.41-ndb-6.3.31 (revid:jonas@mysql.com-20091218100616-75d9tek96o6ob6k0) (version source revid:jonas@mysql.com-20091217154335-290no45qdins5bwo) (merge vers: 5.1.41-ndb-6.3.31) (pib:15)
[18 Dec 2009 11:16]
Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20091218101303-ga32mrnr15jsa606) (version source revid:jonas@mysql.com-20091218064304-ezreonykd9f4kelk) (merge vers: 5.1.41-ndb-7.0.11) (pib:15)