Bug #36526 | Falcon deadlock when running sysbench | ||
---|---|---|---|
Submitted: | 6 May 2008 10:57 | Modified: | 16 Oct 2008 21:12 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
Version: | 6.0.5 | OS: | Any |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
Tags: | F_RECORD TREE |
[6 May 2008 10:57]
Philip Stoev
[6 May 2008 11:14]
Philip Stoev
It appears that there is a deadlock between RecordLeaf::retireRecords2 and RecordLeaf::fetch. I am attaching debug output on thread stalls and backtraces.
[6 May 2008 11:15]
Philip Stoev
Thread stalls for bug 36526
Attachment: bug36526_stalls.txt (text/plain), 11.66 KiB.
[6 May 2008 11:27]
Philip Stoev
thread apply all bt output for bug 36526
Attachment: bug36526_threads.zip (application/x-zip-compressed, text), 143.76 KiB.
[6 May 2008 20:43]
Kevin Lewis
This is not a traditional deadlock in which syncObjects are gained in opposite order. The Scavenger thread is holding up 642 other threads. It has this call stack:

    Thread 653 (process 9901):
    #2 SyncObject::wait (..., type=Exclusive, ...) at SyncObject.cpp:413
    #3 SyncObject::lock (..., type=Exclusive, timeout=0) at SyncObject.cpp:265
    #4 Sync::lock (..., type=Exclusive) at Sync.cpp:58
    #5 RecordLeaf::retireRecords() at RecordLeaf.cpp:171
    #6 RecordGroup::retireRecords() at RecordGroup.cpp:124
    #7 RecordGroup::retireRecords() at RecordGroup.cpp:124
    #8 Table::retireRecords() at Table.cpp:1805
    #9 Database::retireRecords() at Database.cpp:1807
    #10 Database::scavenge() at Database.cpp:1722
    #11 Scavenger::scavenge() at Scavenger.cpp:58

And it holds (or is waiting on) these SyncObjects:

    #5 RecordLeaf::retireRecords() waiting, Exclusive, syncPrior
    #5 RecordLeaf::retireRecords() locked, Exclusive, RecordLeaf::syncObject
       There are 52 threads waiting for a shared lock on this in RecordLeaf::fetch()
    #8 Table::retireRecords() locked, Exclusive, Table::syncObject
       There are 42 threads waiting for an exclusive lock on this in Table::validateAndInsert
    #9 Database::retireRecords() locked, Exclusive, Database::syncScavenge
       There are 548 threads waiting for an exclusive lock on this with this call stack:
       #2 SyncObject::wait
       #3 SyncObject::lock
       #4 Sync::lock
       #5 Database::retireRecords
       #6 Database::forceRecordScavenge
       #7 Table::allocRecordVersion
       #8 Table::fetchForUpdate

I cannot find the thread

    662 total threads
    643 waiting threads
    19 non-waiting threads
       2 IO threads
       1 Ticker
       1 Pagewriter
       5 Gopher
       1 Scheduler
       4 io_handler_thread
       4 start_thread
       1 main

The scavenger thread, which is waiting on a syncPrior, is holding up everything. I could not find a lock on a syncPrior in any of the waiting call stacks. There must be some kind of code path in which a syncPrior is left locked.
[13 Jun 2008 15:10]
Philip Stoev
See bug #37395 for a situation with similar stalled-thread output.
[16 Oct 2008 21:12]
Kevin Lewis
This bug was fixed in 6.0.6 by a re-organization of the syncPrior locks.

Going back to the code for 6.0.5, I can see the deadlock. The scavenger thread has an exclusive lock on Table::syncObject and is waiting on an exclusive lock on Table::syncPrior. There are 42 threads stuck in Table::validateAndUpdate. 41 of these call it from Table::deleteRecord(), where the 6.0.5 code got shared locks on Table::syncPrior before calling validateAndUpdate, where they are waiting for an exclusive lock on Table::syncObject.

The code as of version 6.0.6 does not lock Table::syncPrior in Table::deleteRecord(), which avoids this deadlock. In fact, it does not even lock syncPrior in validateAndUpdate. It has this comment:

    // Do not need syncPrior here since this is a new record.
    // No other thread can see this record's priorVersion pointer.