Bug #38661 | all threads hang in "opening tables" or "waiting for table" and cpu is at 100% | ||
---|---|---|---|
Submitted: | 8 Aug 2008 8:59 | Modified: | 7 Mar 2010 18:21 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Locking | Severity: | S2 (Serious) |
Version: | 6.0.7-debug,5.4 | OS: | Linux |
Assigned to: | Magne Mæhre | CPU Architecture: | Any |
[8 Aug 2008 8:59]
Shane Bester
[8 Aug 2008 9:12]
MySQL Verification Team
a bunch of info from the running binary/gdb
Attachment: bug38661_thread_info.txt (text/plain), 60.28 KiB.
[8 Aug 2008 10:10]
MySQL Verification Team
testcase. i could only repeat on 6.0.7 on linux server build with --with-libevent.
Attachment: bug38661.c (text/plain), 7.86 KiB.
[8 Aug 2008 10:14]
MySQL Verification Team
setting as verified. let me know if it's not repeatable.
[8 Aug 2008 11:42]
MySQL Verification Team
i just rebuilt the exact same server without specific --with-libevent and the problem still occurred. so, it's not related to that. not sure why my 6.0.5 and windows 6.0.7 don't see this problem. maybe linux/build specific?
[19 Nov 2008 13:14]
Magne Mæhre
Bug is repeatable on Solaris10/x86, _without_ libevent
[24 Nov 2008 11:35]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version.

You can access the patch from: http://lists.mysql.com/commits/59668

2752 Magne Mahre 2008-11-24
Bug #38661 all threads hang in "opening tables" or "waiting for table" and cpu is at 100%

A race between open_tables and a "flush table" operation resulted in neither being able to complete. open_tables was not able to open the table and initiated a recover. The conditions for completing the recovery were too strict and couldn't be achieved while the flush was running.

The solution was to loosen the requirement that said that a share couldn't exist without a table, since this is actually a valid condition in certain cases.
[28 Nov 2008 21:01]
MySQL Verification Team
I found a very similar bug for 5.x, bug #41114. Not sure if that is a duplicate.
[30 Nov 2008 8:38]
Dmitry Lenev
Hi Shane! I doubt that. The problem described in this bug report is specific for 6.* versions (at least as we understand it now). So IMO it is better to keep those two bugs separate.
[5 Dec 2008 14:14]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version.

You can access the patch from: http://lists.mysql.com/commits/60743

2766 Magne Mahre 2008-12-05
Bug #38661 'all threads hang in "opening tables" or "waiting for table" and cpu is at 100%'

Concurrent execution of FLUSH TABLES statement and at least two statements using the same table might have led to live-lock which caused all three connections to stall and hog 100% of CPU.

tdc_wait_for_old_versions() wrongly assumed that there cannot be a share with an old version and no used TABLE instances and thus was failing to perform wait in situation when such old share was cached in MDL subsystem thanks to a still active metadata lock on the table. So it might have happened that two or more connections simultaneously executing statements which involve table being flushed managed to prevent each other from waiting in this function by keeping shared metadata lock on the table constantly active (i.e. one of the statements managed to take/hold this lock while other statements were calling tdc_wait_for_old_versions()). Thus they were forcing each other to loop infinitely in open_tables() - close_thread_tables_for_reopen() - tdc_wait_for_old_versions() cycle causing CPU hogging.

This patch fixes this problem by removing this false assumption from tdc_wait_for_old_versions().

Note that the problem is specific only for server versions >= 6.0.

No test case is submitted for this test, as the test infrastructure hasn't got the necessary primitives to test the behaviour. The manifestation is that throughput will decrease to a low level (possibly 0) after some time, and stay at that level. Several transactions will not complete.

Manual testing can be done by running the code submitted by Shane Bester attached to the bug report. If the bug persists, the transaction thruput will almost immediately drop to near zero (shown as the transaction count output from the test program staying on a close to constant value, instead of increasing rapidly).
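To make the described live-lock easier to picture, here is a toy model of the retry cycle from the commit message above. It is only a sketch and not MySQL source code: the `Share` class, the two `must_wait_*` predicates and `count_retries()` are hypothetical names invented for illustration, and the real tdc_wait_for_old_versions() is considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class Share:
    """Hypothetical stand-in for a cached table share (not the MySQL struct)."""
    version: int      # table cache version the share was opened under
    used_tables: int  # number of TABLE instances currently in use


def must_wait_buggy(share, current_version):
    # Assumed pre-fix behaviour: an old share is only waited for if it
    # still has used TABLE instances.
    return share.version < current_version and share.used_tables > 0


def must_wait_fixed(share, current_version):
    # Assumed post-fix behaviour: any old-version share is waited for,
    # even when no TABLE instance uses it.
    return share.version < current_version


def count_retries(must_wait, max_attempts=1000):
    """Model one connection going round the open / close-for-reopen / wait cycle.

    Returns the attempt on which the connection would block and wait,
    or None if it kept retrying without ever waiting (the busy loop).
    """
    current_version = 2  # pretend FLUSH TABLES has bumped the version
    # Old share with no used TABLEs, kept alive only by another
    # connection's metadata lock (the case the commit message describes).
    share = Share(version=1, used_tables=0)
    for attempt in range(1, max_attempts + 1):
        if must_wait(share, current_version):
            return attempt
    return None
```

In this model `count_retries(must_wait_buggy)` never reaches a wait and exhausts its attempts, which corresponds to the CPU-hogging loop, while `count_retries(must_wait_fixed)` blocks on the first attempt.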
[8 Dec 2008 21:38]
Bugs System
Pushed into 6.0.9-alpha (revid:magne.mahre@sun.com-20081205141333-p37s1bj9xubkqbgd) (version source revid:magne.mahre@sun.com-20081205141333-p37s1bj9xubkqbgd) (pib:5)
[9 Jul 2009 8:51]
Konstantin Osipov
Pushed into 5.4.4
[17 Jul 2009 3:26]
Paul DuBois
Noted in 5.4.4 changelog. Concurrent connections executing FLUSH TABLES and at least two statements using the same table could cause all three connections to stall with 100% CPU utilization.
[12 Aug 2009 22:52]
Paul DuBois
Noted in 5.4.2 changelog because next 5.4 version will be 5.4.2 and not 5.4.4.
[15 Aug 2009 2:08]
Paul DuBois
Ignore previous comment about 5.4.2.
[11 Dec 2009 11:20]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version.

You can access the patch from: http://lists.mysql.com/commits/93659

3031 Konstantin Osipov 2009-12-11
Backport of:
-----------------------------------------------------------
2630.28.28 Magne Mahre 2008-12-05
Bug #38661 'all threads hang in "opening tables" or "waiting for table" and cpu is at 100%'

Concurrent execution of FLUSH TABLES statement and at least two statements using the same table might have led to live-lock which caused all three connections to stall and hog 100% of CPU.

tdc_wait_for_old_versions() wrongly assumed that there cannot be a share with an old version and no used TABLE instances and thus was failing to perform wait in situation when such old share was cached in MDL subsystem thanks to a still active metadata lock on the table. So it might have happened that two or more connections simultaneously executing statements which involve table being flushed managed to prevent each other from waiting in this function by keeping shared metadata lock on the table constantly active (i.e. one of the statements managed to take/hold this lock while other statements were calling tdc_wait_for_old_versions()). Thus they were forcing each other to loop infinitely in open_tables() - close_thread_tables_for_reopen() - tdc_wait_for_old_versions() cycle causing CPU hogging.

This patch fixes this problem by removing this false assumption from tdc_wait_for_old_versions().

Note that the problem is specific only for server versions >= 6.0.

No test case is submitted for this test, as the test infrastructure hasn't got the necessary primitives to test the behaviour. The manifestation is that throughput will decrease to a low level (possibly 0) after some time, and stay at that level. Several transactions will not complete.

Manual testing can be done by running the code submitted by Shane Bester attached to the bug report. If the bug persists, the transaction thruput will almost immediately drop to near zero (shown as the transaction count output from the test program staying on a close to constant value, instead of increasing rapidly).
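For manual testing, the "count stays close to constant" symptom can be checked mechanically. The helper below is a minimal sketch under stated assumptions: it expects a list of cumulative transaction counts sampled at regular intervals, and the function name, the `window` and `min_increase` parameters and their defaults are all hypothetical. The actual output format of the attached test program is not shown in this report, so parsing it is left out.

```python
def throughput_stalled(counts, window=3, min_increase=1):
    """Return True if the cumulative transaction count grew by less than
    `min_increase` over the last `window` sampling intervals.

    `counts` is a list of cumulative totals, oldest first (assumed input).
    """
    if len(counts) <= window:
        return False  # not enough samples to judge yet
    return counts[-1] - counts[-1 - window] < min_increase
```

A series such as `[100, 2500, 5100, 5102, 5102, 5102, 5102]` would be flagged as stalled, while one that keeps climbing would not. The thresholds would need tuning to the sampling interval actually used.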
[16 Feb 2010 16:49]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100216101445-2ofzkh48aq2e0e8o) (version source revid:kostja@sun.com-20091211154405-c9yhiewr9o5d20rq) (merge vers: 6.0.14-alpha) (pib:16)
[16 Feb 2010 16:59]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100216101208-33qkfwdr0tep3pf2) (version source revid:kostja@sun.com-20091211111859-lse5qbt8k1ar9q2p) (pib:16)
[16 Feb 2010 18:29]
Dmitry Lenev
Closing this bug as it is not repeatable in publicly available trees with versions < 6.0.
[6 Mar 2010 10:59]
Bugs System
Pushed into 5.5.3-m3 (revid:alik@sun.com-20100306103849-hha31z2enhh7jwt3) (version source revid:vvaintroub@mysql.com-20100216221947-luyhph0txl2c5tc8) (merge vers: 5.5.99-m3) (pib:16)
[7 Mar 2010 18:21]
Paul DuBois
No changelog entry needed.
[9 Mar 2010 8:54]
Klaus Redegeld
I am having the same problem on a 5.1.31-1ubuntu2, and it happens just from time to time.