Bug #51093 | Crash (possibly stack overflow) in MDL_lock::find_deadlock | | |
---|---|---|---|
Submitted: | 11 Feb 2010 13:45 | Modified: | 7 Mar 2010 1:00 |
Reporter: | John Embretsen | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Locking | Severity: | S1 (Critical) |
Version: | mysql-next-4284 | OS: | Solaris (SPARC) |
Assigned to: | Dmitry Lenev | CPU Architecture: | Any |
Tags: | pushbuild, rqg_pb2, test failure | | |
[11 Feb 2010 13:45]
John Embretsen
[11 Feb 2010 13:59]
John Embretsen
Stacktraces from more threads (from dbx in Pushbuild, sol10 sparc64).
Attachment: bug51093_stacktraces.txt (text/plain), 48.02 KiB.
[15 Feb 2010 8:52]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version.
You can access the patch from: http://lists.mysql.com/commits/100327
3096 Dmitry Lenev 2010-02-15
Fix for bug #51093 "Crash (possibly stack overflow) in MDL_lock::find_deadlock".
On some platforms, the deadlock detector in the metadata locking subsystem could, under certain conditions, exhaust stack space and crash the server. In particular, this caused failures of the rqg_mdl_stability test on Solaris in PushBuild.
While searching for a deadlock, the MDL deadlock detector could sometimes encounter a loop in the waiters graph in which the MDL_context that started the search does not participate. In that case the algorithm keeps looping, on the assumption that either the deadlock will be resolved by the MDL_context that created it (i.e. by one of the loop participants) or the maximum search depth will be reached. Since the maximum search depth was set to 1000, in the latter case we could exhaust stack space on platforms where each iteration of the deadlock search algorithm needs more than DEFAULT_STACK_SIZE/1000 bytes of stack (around 192 bytes for 32-bit and around 256 bytes for 64-bit platforms).
This patch solves the problem by reducing the maximum search depth for the MDL deadlock detector to 100. This should be safe at the moment, as it is unlikely that each iteration of the current deadlock detector algorithm will consume more than 512 bytes of stack (so the total stack required cannot exceed 512*100 bytes), and we require at least 80K of stack in order to open any table.
Additional research should be conducted in the future to determine a better value for the maximum search depth.
This patch does not include a test case, as the existing rqg_mdl_stability test can serve as one.
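The stack arithmetic in the commit message above can be checked with a small sketch. The constant and function names here are hypothetical, and the 192 KiB and 256 KiB defaults are assumptions inferred from the "around 192 bytes" and "around 256 bytes" figures quoted in the message; they are not taken from the server source.

```cpp
#include <cassert>
#include <cstddef>

// Assumed default thread stack sizes (inferred from the commit message).
constexpr std::size_t kDefaultStack32 = 192 * 1024;
constexpr std::size_t kDefaultStack64 = 256 * 1024;
// "At least 80K of stack in order to open any table" (read here as 80 KiB).
constexpr std::size_t kMinStackToOpenTable = 80 * 1024;

// Stack available to each recursion level if the search reaches max_depth.
constexpr std::size_t per_frame_budget(std::size_t stack_bytes,
                                       std::size_t max_depth) {
    return stack_bytes / max_depth;
}

// Upper bound on stack used by a search that reaches max_depth.
constexpr std::size_t worst_case_usage(std::size_t frame_bytes,
                                       std::size_t max_depth) {
    return frame_bytes * max_depth;
}

// With depth 1000 a frame may use only about 196 or 262 bytes.
static_assert(per_frame_budget(kDefaultStack32, 1000) == 196, "32-bit budget");
static_assert(per_frame_budget(kDefaultStack64, 1000) == 262, "64-bit budget");
// With depth 100 and 512-byte frames the search stays under the 80K minimum.
static_assert(worst_case_usage(512, 100) < kMinStackToOpenTable, "fits");
```

Under these assumptions, a depth of 1000 leaves each recursion level only a couple of hundred bytes, while a depth of 100 with 512-byte frames needs at most 51200 bytes.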
[15 Feb 2010 12:20]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version.
You can access the patch from: http://lists.mysql.com/commits/100368
3098 Dmitry Lenev 2010-02-15
Fix for bug #51093 "Crash (possibly stack overflow) in MDL_lock::find_deadlock".
On some platforms, the deadlock detector in the metadata locking subsystem could, under certain conditions, exhaust stack space and crash the server. In particular, this caused failures of the rqg_mdl_stability test on Solaris in PushBuild.
While searching for a deadlock, the MDL deadlock detector could sometimes encounter a loop in the waiters graph in which the MDL_context that started the search does not participate. In that case the algorithm keeps looping, on the assumption that either the deadlock will be resolved by the MDL_context that created it (i.e. by one of the loop participants) or the maximum search depth will be reached. Since the maximum search depth was set to 1000, in the latter case we could exhaust stack space on platforms where each iteration of the deadlock search algorithm needs more than DEFAULT_STACK_SIZE/1000 bytes of stack (around 192 bytes for 32-bit and around 256 bytes for 64-bit platforms).
This patch solves the problem by reducing the maximum search depth for the MDL deadlock detector to 32. This should be safe at the moment, as it is unlikely that each iteration of the current deadlock detector algorithm will consume more than 1K of stack (so the total stack required cannot exceed 32K), and we require at least 80K of stack in order to open any table. Also, this value should (hopefully) be big enough not to cause too many false deadlock errors (there is anecdotal evidence that real-life deadlocks are typically shorter than that).
Additional research should be conducted in the future to determine a better value for the maximum search depth.
This patch does not include a test case, as the existing rqg_mdl_stability test can serve as one.
[15 Feb 2010 12:38]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version.
You can access the patch from: http://lists.mysql.com/commits/100373
3099 Dmitry Lenev 2010-02-15
Fix for bug #51093 "Crash (possibly stack overflow) in MDL_lock::find_deadlock".
On some platforms, the deadlock detector in the metadata locking subsystem could, under certain conditions, exhaust stack space and crash the server. In particular, this caused failures of the rqg_mdl_stability test on Solaris in PushBuild.
While searching for a deadlock, the MDL deadlock detector could sometimes encounter a loop in the waiters graph in which the MDL_context that started the search does not participate. In that case the algorithm keeps looping, on the assumption that either the deadlock will be resolved by the MDL_context that created it (i.e. by one of the loop participants) or the maximum search depth will be reached. Since the maximum search depth was set to 1000, in the latter case we could exhaust stack space on platforms where each iteration of the deadlock search algorithm needs more than DEFAULT_STACK_SIZE/1000 bytes of stack (around 192 bytes for 32-bit and around 256 bytes for 64-bit platforms).
This patch solves the problem by reducing the maximum search depth for the MDL deadlock detector to 32. This should be safe at the moment, as it is unlikely that each iteration of the current deadlock detector algorithm will consume more than 1K of stack (so the total stack required cannot exceed 32K), and we require at least 80K of stack in order to open any table. Also, this value should (hopefully) be big enough not to cause too many false deadlock errors (there is anecdotal evidence that real-life deadlocks are typically shorter than that).
Additional research should be conducted in the future to determine a better value for the maximum search depth.
This patch does not include a test case, as the existing rqg_mdl_stability test can serve as one.
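The looping behaviour described in the commit message can be illustrated with a minimal sketch of a depth-limited recursive search over a wait-for graph. This is not the server's MDL_lock::find_deadlock code: the graph representation, the names `WaitGraph`, `SearchResult`, `visit` and `kMaxSearchDepth`, and the three-way result are all hypothetical. Only the limit of 32 comes from the message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: waits_for[i] lists the contexts that context i waits for.
using WaitGraph = std::vector<std::vector<std::size_t>>;

enum class SearchResult { NoDeadlock, Deadlock, DepthExceeded };

// Limit taken from the commit message; the constant name is an assumption.
constexpr std::size_t kMaxSearchDepth = 32;

SearchResult visit(const WaitGraph& waits_for, std::size_t start,
                   std::size_t node, std::size_t depth) {
    // Each nested call costs one stack frame, so this cap bounds stack use.
    if (depth > kMaxSearchDepth) return SearchResult::DepthExceeded;
    for (std::size_t next : waits_for[node]) {
        // A path leading back to the searching context is a real deadlock.
        if (next == start) return SearchResult::Deadlock;
        SearchResult r = visit(waits_for, start, next, depth + 1);
        if (r != SearchResult::NoDeadlock) return r;
    }
    return SearchResult::NoDeadlock;
}

// Starts a search on behalf of context `start`.
SearchResult find_deadlock(const WaitGraph& waits_for, std::size_t start) {
    return visit(waits_for, start, start, 0);
}
```

In this sketch, the graph `{{1}, {2}, {1}}` searched from context 0 contains a loop between contexts 1 and 2 that context 0 is not part of, so the recursion only stops at the depth cap. The commit message suggests that in the server, hitting the cap surfaces as a (possibly false) deadlock error, which is why the cap should not be too small.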
[15 Feb 2010 14:10]
Dmitry Lenev
The fix for this bug was pushed into the mysql-next-4284 tree. Since the bug was not repeatable outside of this non-public tree, there is nothing to document, so I am simply closing it. Please feel free to reopen it if the problem recurs!
[16 Feb 2010 9:26]
John Embretsen
Fix looks good (read: issue not seen in Pushbuild) so far. Thanks for fixing so quickly!
[16 Feb 2010 16:50]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100216101445-2ofzkh48aq2e0e8o) (version source revid:alik@sun.com-20100215140849-b9fal65nwvrzczh4) (merge vers: 6.0.14-alpha) (pib:16)
[16 Feb 2010 16:59]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100216101208-33qkfwdr0tep3pf2) (version source revid:alik@sun.com-20100215140838-olj0kdt5rps9wgec) (pib:16)
[6 Mar 2010 11:08]
Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@mysql.com-20100216221947-luyhph0txl2c5tc8) (merge vers: 5.5.99-m3) (pib:16)
[7 Mar 2010 1:00]
Paul DuBois
No changelog entry needed.