| Bug #36526 | Falcon deadlock when running sysbench | ||
|---|---|---|---|
| Submitted: | 6 May 2008 10:57 | Modified: | 16 Oct 2008 21:12 |
| Reporter: | Philip Stoev | Email Updates: | |
| Status: | Can't repeat | Impact on me: | |
| Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
| Version: | 6.0.5 | OS: | Any |
| Assigned to: | Kevin Lewis | CPU Architecture: | Any |
| Tags: | F_RECORD TREE | ||
[6 May 2008 10:57]
Philip Stoev
[6 May 2008 11:14]
Philip Stoev
It appears that there is a deadlock between RecordLeaf::retireRecords2 and RecordLeaf::fetch. I am attaching debug output on thread stalls and backtraces.
[6 May 2008 11:15]
Philip Stoev
Thread stalls for bug 36526
Attachment: bug36526_stalls.txt (text/plain), 11.66 KiB.
[6 May 2008 11:27]
Philip Stoev
thread apply all bt output for bug 36526
Attachment: bug36526_threads.zip (application/x-zip-compressed, text), 143.76 KiB.
[6 May 2008 20:43]
Kevin Lewis
This is not a traditional deadlock in which syncObjects are acquired in opposite order.
The Scavenger thread is holding up 642 other threads. It has this call stack.
Thread 653 (process 9901):
#2 SyncObject::wait (..., type=Exclusive, ...) at SyncObject.cpp:413
#3 SyncObject::lock (..., type=Exclusive, timeout=0) at SyncObject.cpp:265
#4 Sync::lock (..., type=Exclusive) at Sync.cpp:58
#5 RecordLeaf::retireRecords() at RecordLeaf.cpp:171
#6 RecordGroup::retireRecords() at RecordGroup.cpp:124
#7 RecordGroup::retireRecords() at RecordGroup.cpp:124
#8 Table::retireRecords() at Table.cpp:1805
#9 Database::retireRecords() at Database.cpp:1807
#10 Database::scavenge() at Database.cpp:1722
#11 Scavenger::scavenge() at Scavenger.cpp:58
It is waiting on, or holding, these SyncObjects:
#5 RecordLeaf::retireRecords() waiting, Exclusive, syncPrior
#5 RecordLeaf::retireRecords() locked, Exclusive, RecordLeaf::syncObject
  There are 52 threads waiting for a shared lock on this in RecordLeaf::fetch()
#8 Table::retireRecords() locked, Exclusive, Table::syncObject
  There are 42 threads waiting for an exclusive lock on this in Table::validateAndInsert
#9 Database::retireRecords() locked, Exclusive, Database::syncScavenge
  There are 548 threads waiting for an exclusive lock on this with this call stack:
#2 SyncObject::wait
#3 SyncObject::lock
#4 Sync::lock
#5 Database::retireRecords
#6 Database::forceRecordScavenge
#7 Table::allocRecordVersion
#8 Table::fetchForUpdate
I cannot find the thread
662 total threads
643 waiting threads
19 non-waiting threads
2 IO threads
1 Ticker
1 Pagewriter
5 Gopher
1 Scheduler
4 io_handler_thread
4 start_thread
1 main
The scavenger thread, which is waiting on a syncPrior, is holding up everything. I could not find a lock on a syncPrior in any of the waiting call stacks. There must be some kind of code path in which a syncPrior is left locked.
[13 Jun 2008 15:10]
Philip Stoev
See bug #37395 for a situation with similar stalled-thread output.
[16 Oct 2008 21:12]
Kevin Lewis
This bug was fixed in 6.0.6 by a reorganization of the syncPrior locks. Going back to the code for 6.0.5, I can see the deadlock: the scavenger thread has an exclusive lock on Table::syncObject and is waiting for an exclusive lock on Table::syncPrior. There are 42 threads stuck in Table::validateAndInsert. 41 of these call it from Table::deleteRecord(), where the 6.0.5 code took shared locks on Table::syncPrior before calling validateAndInsert, in which they are now waiting for an exclusive lock on Table::syncObject.
The code as of version 6.0.6 does not lock Table::syncPrior in Table::deleteRecord(), which avoids this deadlock. In fact, it does not even lock syncPrior in validateAndInsert. It has this comment:
// Do not need syncPrior here since this is a new record.
// No other thread can see this record's priorVersion pointer.
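The cycle described above can be checked with a small wait-for-graph model. This is only an illustrative sketch, not Falcon code: `LockState`, `inDeadlock`, `locksAsIn605`, `locksAsIn606`, and the thread names "scavenger" and "deleter" are hypothetical, and the two states are a simplified reading of this comment (one scavenger thread, and one thread coming from Table::deleteRecord()).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model, not Falcon code: who holds each lock, and which
// single lock each blocked thread is waiting for.
struct LockState {
    std::map<std::string, std::set<std::string>> holders;  // lock -> holding threads
    std::map<std::string, std::string> waitingFor;         // thread -> lock it waits on
};

// Returns true if `start` transitively waits on a lock that `start` itself
// holds, i.e. it sits on a cycle in the wait-for graph.
bool inDeadlock(const LockState& state, const std::string& start) {
    std::set<std::string> visited;
    std::vector<std::string> pending{start};
    while (!pending.empty()) {
        const std::string thread = pending.back();
        pending.pop_back();
        auto wait = state.waitingFor.find(thread);
        if (wait == state.waitingFor.end()) continue;  // thread is not blocked
        auto held = state.holders.find(wait->second);
        if (held == state.holders.end()) continue;     // lock is free
        for (const std::string& holder : held->second) {
            if (holder == start) return true;          // cycle back to start
            if (visited.insert(holder).second) pending.push_back(holder);
        }
    }
    return false;
}

// Assumed 6.0.5 state: the deleting thread already holds a shared lock on
// Table::syncPrior when it asks for Table::syncObject.
LockState locksAsIn605() {
    LockState s;
    s.holders["Table::syncObject"] = {"scavenger"};
    s.holders["Table::syncPrior"] = {"deleter"};
    s.waitingFor["scavenger"] = "Table::syncPrior";
    s.waitingFor["deleter"] = "Table::syncObject";
    return s;
}

// Assumed 6.0.6 state: Table::deleteRecord() no longer takes syncPrior, so
// the scavenger's request for it finds the lock free.
LockState locksAsIn606() {
    LockState s;
    s.holders["Table::syncObject"] = {"scavenger"};
    s.waitingFor["scavenger"] = "Table::syncPrior";
    s.waitingFor["deleter"] = "Table::syncObject";
    return s;
}
```

In this model, `inDeadlock(locksAsIn605(), "scavenger")` returns true because each thread waits on a lock the other holds, while `inDeadlock(locksAsIn606(), "deleter")` returns false: the deleting thread only has to wait until the scavenger finishes and releases Table::syncObject.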
