| Bug #39321 | Falcon deadlock between Table::retireRecords and Database::retireRecords | ||
|---|---|---|---|
| Submitted: | 8 Sep 2008 15:58 | Modified: | 9 Jan 2009 14:13 |
| Reporter: | Philip Stoev | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Falcon storage engine | Severity: | S1 (Critical) |
| Version: | 6.0-falcon-team | OS: | Any |
| Assigned to: | Kevin Lewis | CPU Architecture: | Any |
[8 Sep 2008 16:00]
Philip Stoev
Stacks for bug 39321
Attachment: bug39321.stacks.txt (text/plain), 50.50 KiB.
[8 Sep 2008 19:02]
Kevin Lewis
This deadlock can happen when a Truncate command runs out of memory and has to call Database::forceRecordScavenge(). Any other thread that calls it at the same time can deadlock with it, because the truncate thread locks Table::syncObject before Database::syncScavenge, whereas most other threads get Database::syncScavenge before Table::syncObject.

Thread 13
    Database::truncateTable(4) (Table::syncObject) -> ... Table::allocRecord -> Database::forceRecordScavenge -> Database::retireRecords (Database::syncScavenge)

Thread 9
    ... Record::allocRecordData -> Database::forceRecordScavenge -> Database::retireRecords (Database::syncScavenge) -> Table::retireRecords (Table::syncObject)

I think the solution is for Database::truncateTable to also lock Database::syncScavenge before it gets started. It is already locking these:
    Database::truncateTable(1) Database::syncSysDDL
    Database::truncateTable(2) Database::syncTables
    Database::truncateTable(3) SerialLog::syncSections
    Database::truncateTable(4) Table::syncObject
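The proposed ordering can be illustrated with a minimal single-threaded sketch. It is not Falcon code: `LoggedLock`, `MiniDatabase`, `scavengerOrder` and `truncateOrder` are hypothetical names, and `std::recursive_mutex` is an assumption used here so that the truncate path can re-acquire a lock it already holds when it falls into the scavenge path. The sketch only records the order in which each path takes the two locks.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lock wrapper that appends its name to a log on every acquisition.
// The log itself is not thread-safe; this sketch is meant to run on one thread.
class LoggedLock {
public:
    LoggedLock(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {}
    void lock() { mutex_.lock(); log_.push_back(name_); }
    void unlock() { mutex_.unlock(); }
private:
    std::string name_;
    std::vector<std::string>& log_;
    std::recursive_mutex mutex_;  // assumption: re-entrant, so a holder may lock again
};

// Hypothetical stand-in for the two locks involved in the report.
struct MiniDatabase {
    std::vector<std::string> log;
    LoggedLock syncScavenge{"Database::syncScavenge", log};
    LoggedLock tableSyncObject{"Table::syncObject", log};

    // Scavenger order: syncScavenge first, then the table lock.
    void retireRecords() {
        std::lock_guard<LoggedLock> scavenge(syncScavenge);
        std::lock_guard<LoggedLock> table(tableSyncObject);
    }

    // Truncate with the proposed fix: syncScavenge is taken before the table lock,
    // so a later forced scavenge does not acquire the two in the opposite order.
    void truncateTable() {
        std::lock_guard<LoggedLock> scavenge(syncScavenge);
        std::lock_guard<LoggedLock> table(tableSyncObject);
        retireRecords();  // simulates the out-of-memory path into the scavenger
    }
};

// Acquisition order observed on the scavenger path.
std::vector<std::string> scavengerOrder() {
    MiniDatabase db;
    db.retireRecords();
    return db.log;
}

// Acquisition order observed on the fixed truncate path.
std::vector<std::string> truncateOrder() {
    MiniDatabase db;
    db.truncateTable();
    return db.log;
}
```

Both paths begin with Database::syncScavenge followed by Table::syncObject, which is the property the fix relies on. Before the fix, the truncate path would have logged Table::syncObject first.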
[11 Sep 2008 4:00]
Kevin Lewis
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/53743

2824 Kevin Lewis 2008-09-10
Bug#39321 Add an exclusive lock on Database::syncScavenge in Database::truncateTable before the lock of Table::syncObject just in case the truncateTable process has to call Database::forceRecordScavenge. syncScavenge must be locked before Table::syncObject because the scavenger does it that way. According to the Deadlock Detector, syncScavenge must also be locked before Database::syncTables.
[12 Sep 2008 16:57]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/53990

2819 Vladislav Vaintroub 2008-09-12
Bug#39321 - messages in recovery about exceptions from ReadFile. Ignore ERROR_HANDLE_EOF coming from ReadFile(). It is end of file, and the read should just return 0 as it does in the POSIX case.
[30 Sep 2008 18:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/54804

2843 Kevin Lewis 2008-09-30
Bug#39321 Add an exclusive lock on Database::syncScavenge in Database::truncateTable before the lock of Table::syncObject just in case the truncateTable process has to call Database::forceRecordScavenge. syncScavenge must be locked before Table::syncObject because the scavenger does it that way. According to the Deadlock Predictor (SyncHandler.cpp), syncScavenge must also be locked before Database::syncTables.
[30 Sep 2008 18:19]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/54805

2843 Kevin Lewis 2008-09-30
Bug#39321 Add an exclusive lock on Database::syncScavenge in Database::truncateTable before the lock of Table::syncObject just in case the truncateTable process has to call Database::forceRecordScavenge. syncScavenge must be locked before Table::syncObject because the scavenger does it that way. According to the Deadlock Predictor (SyncHandler.cpp), syncScavenge must also be locked before Database::syncTables.
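The kind of check a lock-order predictor performs can be sketched roughly as follows. This is not the logic in SyncHandler.cpp, which is not shown in this report: `LockOrderPredictor`, `runPath` and the two helper functions are hypothetical, and the sketch assumes a single thread replaying one code path at a time. It records every "held before acquired" pair it sees and flags a path that acquires two locks in the opposite order to one seen earlier.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lock-order recorder: remembers every (held, acquired) pair.
class LockOrderPredictor {
public:
    // Replays one code path: acquires the locks in the given order, then releases all.
    // Returns false if any acquisition inverts an order recorded earlier.
    bool runPath(const std::vector<std::string>& path) {
        bool consistent = true;
        std::vector<std::string> held;
        for (const std::string& name : path) {
            for (const std::string& earlier : held) {
                if (earlier == name) continue;  // re-acquiring a held lock adds no ordering
                if (order_.count(std::make_pair(name, earlier)) > 0) {
                    consistent = false;  // the opposite order was seen before
                }
                order_.insert(std::make_pair(earlier, name));
            }
            held.push_back(name);
        }
        return consistent;
    }
private:
    std::set<std::pair<std::string, std::string>> order_;
};

// The pre-fix truncate order (table lock, then syncScavenge) conflicts with the scavenger.
bool oldTruncateOrderIsFlagged() {
    LockOrderPredictor predictor;
    bool scavengerOk = predictor.runPath({"Database::syncScavenge", "Table::syncObject"});
    bool truncateOk = predictor.runPath({"Table::syncObject", "Database::syncScavenge"});
    return scavengerOk && !truncateOk;
}

// The order named in the commit message: syncScavenge, then syncTables, then the table lock.
bool fixedTruncateOrderIsAccepted() {
    LockOrderPredictor predictor;
    bool scavengerOk = predictor.runPath({"Database::syncScavenge", "Table::syncObject"});
    bool truncateOk = predictor.runPath(
        {"Database::syncScavenge", "Database::syncTables", "Table::syncObject"});
    return scavengerOk && truncateOk;
}
```

A check of this kind can report a possible deadlock from lock orderings alone, even on runs where the threads never actually collide, which matters for a bug that was seen only once in many test runs.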
[9 Jan 2009 14:13]
MC Brown
A note has been added to the 6.0.8 changelog:

When running TRUNCATE on a table where other threads are also trying to access the same Falcon table, a deadlock could occur between the two executing threads.

Description:
When executing the falcon_recovery.yy test, Falcon deadlocked as follows:

Stalled threads
  Thread 0xb708cb58 (-1489790064) sleep=0, grant=0, locks=1, who 0, parent=(nil)
    Pending Table::findField state 0 (2) syncObject 0xb3afc728
  Thread 0xb70bcc00 (-1476863088) sleep=1, grant=0, locks=1, who 0, parent=(nil)
    Pending Database::retireRecords(1) state 0 (1) syncObject 0xb7282f2c
  Thread 0xb70e7b70 (-1477665904) sleep=1, grant=0, locks=1, who 0, parent=(nil)
    Pending Table::retireRecords state 0 (2) syncObject 0xb3afc728
  Thread 0xb70e85c8 (-1478268016) sleep=1, grant=0, locks=1, who 0, parent=(nil)
    Pending Database::retireRecords(1) state 0 (1) syncObject 0xb7282f2c
  Thread 0xb70b9c00 (-1477063792) sleep=1, grant=0, locks=1, who 0, parent=(nil)
    Pending Database::retireRecords(1) state 0 (1) syncObject 0xb7282f2c
  Thread 0xb70ef9f8 (-1477465200) sleep=1, grant=0, locks=1, who 0, parent=(nil)
    Pending Database::retireRecords(1) state 0 (1) syncObject 0xb7282f2c
  Thread 0xb70e8058 (-1478468720) sleep=1, grant=0, locks=2, who 0, parent=(nil)
    Pending Database::retireRecords(1) state 0 (1) syncObject 0xb7282f2c
  Thread 0xb70ef118 (-1478067312) sleep=1, grant=0, locks=1, who 0, parent=(nil)
    Pending Database::retireRecords(1) state 0 (1) syncObject 0xb7282f2c
  Thread 0xb70dc760 (-1477264496) sleep=1, grant=0, locks=3, who 0, parent=(nil)
    Pending Database::retireRecords(1) state 0 (1) syncObject 0xb7282f2c
  Thread 0xb70dc840 (-1477866608) sleep=1, grant=0, locks=2, who 0, parent=(nil)
    Pending Database::retireRecords(1) state 0 (1) syncObject 0xb7282f2c

Stalled synchronization objects:
  SyncObject b3afc728: state -1, readers 0, monitor 0, waiters 2
    Exclusive thread b70bcc00 (-1476863088), type 1; Database::retireRecords(1)
    Waiting thread b70e7b70 (-1477665904), type 2; Table::retireRecords
    Waiting thread b708cb58 (-1489790064), type 2; Table::findField
  SyncObject b7282f2c: state -1, readers 0, monitor 0, waiters 8
    Exclusive thread b70e7b70 (-1477665904), type 2; Table::retireRecords
    Waiting thread b70e85c8 (-1478268016), type 1; Database::retireRecords(1)
    Waiting thread b70b9c00 (-1477063792), type 1; Database::retireRecords(1)
    Waiting thread b70ef9f8 (-1477465200), type 1; Database::retireRecords(1)
    Waiting thread b70e8058 (-1478468720), type 1; Database::retireRecords(1)
    Waiting thread b70ef118 (-1478067312), type 1; Database::retireRecords(1)
    Waiting thread b70bcc00 (-1476863088), type 1; Database::retireRecords(1)
    Waiting thread b70dc760 (-1477264496), type 1; Database::retireRecords(1)
    Waiting thread b70dc840 (-1477866608), type 1; Database::retireRecords(1)

Thread b70e7b70 waits on b70bcc00 but b70bcc00 waits on b70e7b70.

How to repeat:
This has only happened once after numerous test runs. Please debug this from the stalled threads output and the thread backtraces that I will upload shortly.
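The cycle stated in the description can be checked mechanically from the "Stalled synchronization objects" listing. The following is a minimal sketch, not the code that produced the report: `StalledObject`, `buildWaitsFor`, `inWaitCycle` and `reportedObjects` are hypothetical names, and the thread ids are copied from the listing. Each waiting thread is mapped to the exclusive owner of the object it waits on, and the chain is followed to see whether it returns to the starting thread.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One entry of the stalled-objects listing: the exclusive owner and the waiting threads.
struct StalledObject {
    std::string owner;
    std::vector<std::string> waiters;
};

// Maps each waiting thread to the thread that owns the object it is waiting on.
std::map<std::string, std::string> buildWaitsFor(const std::vector<StalledObject>& objects) {
    std::map<std::string, std::string> waitsFor;
    for (const StalledObject& object : objects) {
        for (const std::string& waiter : object.waiters) {
            waitsFor[waiter] = object.owner;
        }
    }
    return waitsFor;
}

// Returns true if following the waits-for chain from `start` leads back to `start`.
bool inWaitCycle(const std::map<std::string, std::string>& waitsFor, const std::string& start) {
    std::set<std::string> seen;
    std::string current = start;
    while (true) {
        auto it = waitsFor.find(current);
        if (it == waitsFor.end()) return false;  // chain ends at a thread that is not waiting
        current = it->second;
        if (current == start) return true;
        if (!seen.insert(current).second) return false;  // a cycle exists, but not through start
    }
}

// Thread ids transcribed from the two SyncObject entries in the report.
std::vector<StalledObject> reportedObjects() {
    return {
        {"b70bcc00", {"b70e7b70", "b708cb58"}},  // SyncObject b3afc728
        {"b70e7b70", {"b70e85c8", "b70b9c00", "b70ef9f8", "b70e8058",
                      "b70ef118", "b70bcc00", "b70dc760", "b70dc840"}},  // SyncObject b7282f2c
    };
}
```

On this data, threads b70e7b70 and b70bcc00 are the only members of the cycle. The remaining threads, such as b708cb58, are blocked behind it without being part of it.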