| Bug #37080 | Falcon deadlock on concurrent insert and truncate | ||
|---|---|---|---|
| Submitted: | 29 May 2008 23:59 | Modified: | 8 Jan 11:32 |
| Reporter: | Vladislav Vaintroub | ||
| Status: | Closed | ||
| Category: | Server: Falcon | Severity: | S3 (Non-critical) |
| Version: | mysql-6.0-falcon-team | OS: | Any |
| Assigned to: | Vladislav Vaintroub | Target Version: | 6.0.6 |
| Triage: | D2 (Serious) | ||
[30 May 2008 0:14]
Vladislav Vaintroub
mysql-test-run.pl friendly test case
Attachment: falcon_bug_37080.test (application/octet-stream, text), 1.35 KiB.
[30 May 2008 0:17]
Vladislav Vaintroub
To run the test case, places attached file into mysql-test/t directory and run perl mysql-test-run.pl falcon_bug_37080 No corresponding result file is provided . I never seen the test completing, it hangs within couple of second after the start
[4 Jun 2008 20:06]
Kevin Lewis
Vlad, Neither Chris, Ann or I have any conceptual problem with serializing the truncate operation. To the best of our memory, the reason it is allowed in Falcon now is that Chris spent a lot of time fixing a bug to make it happen. For a short while, StorageInrterface::store_lock() actually allowed the server to do a table lock on SQLCOM_TRUNCATE. But we commented that line again once we thought we had it working, evidenced by falcon_bug_22173a. However, looking back at the number of intermittent failures for falcon_bug_22173a, it is evident that concurrent truncate still has problems, which you have found yourself. Please note that if we uncomment the line // && (sql_command != SQLCOM_TRUNCATE) in StorageInrterface::store_lock then it will be very likely that falcon_bug_22173a will take too long and timeout in pushbuild. It did before. That said, the deadlock that you found was between a client thread doing Database::truncateTable and a gopher thread completing an update that had previously been committed. Serializing truncateTable may have an effect on the likelihood of if occurring, but does not exclude this possible conflict. We need to know these questions; 1) Why is SerialLog::syncSections used? Is it needed at all? 2) Why is the gopher thread still completing changes in a table that is currently being dropped with Dbb::deleteSection? Where is the protection there? 3) If we serialize truncates, (which seems like something that only changes the timing of the problems you identified), what changes do you propose, specifically? Can you provide falcon-private a proposed patch for review?
[6 Jun 2008 13:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47525 2692 Vladislav Vaintroub 2008-06-06 Bug#37080, Bug#35991 - Serialize truncate with any other table operation via MySQL server. - Remove Falcon own truncate serialization mechanism, that results into deadlocks
[11 Jun 2008 20:31]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47752 2699 Vladislav Vaintroub 2008-06-11 Bug#37080 - Falcon deadlock on concurrent insert and truncate Problem: Two threads, one processing TRUNCATE and gopher processing INSERT acquire the same locks, SerlialLog::syncSections and Table::syncObject in different order Solution: Rearrange locks, so that syncSections is always locked before Table::syncObject. For this, move lock to syncSection from SRLDropTable::append() up the stack, into to Database::dropTable() Database::truncateTable()
[11 Jun 2008 21:50]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47756 2699 Vladislav Vaintroub 2008-06-11 Bug#37080 - Falcon deadlock on concurrent insert and truncate Problem: Two threads, one processing TRUNCATE and gopher processing INSERT acquire the same locks, SerlialLog::syncSections and Table::syncObject in different order Solution: Rearrange locks, so that syncSections is always locked before Table::syncObject. For this, move lock to syncSection from SRLDropTable::append() up the stack, into to Database::dropTable() Database::truncateTable()
[22 Aug 2008 22:06]
Kevin Lewis
This fix is in version 6.0.6
[8 Jan 11:32]
MC Brown
A note has been added to the 6.0.6 changelog: When performing operations on a table in one client while a different client is performing a TRUNCATE operation on the same FALCON table a deadlock could be introduced.

Description: Got a deadlock while running test on with one client doing insert and other client doing truncate on the same table. Th threads in question ("truncate" thread and gopher thread) acquire the same locks ,log::syncSections and Table::syncObject, in different order as shown below and this is the reason for the deadlock. Annotated Callstacks Truncate thread syncObject::wait SyncObject::lock Sync::lock SRLDropTable::append <-- lock (Exclusive) on log::syncSections, SRLDropTable.cpp ,line 50 Dbb::deleteSection Table::expunge Table::truncate Database::truncateTable <-- lock (Exclusive) on Table::syncObject, Database.cpp, line 1480 StorageDatabase::truncateTable StorageTableShare::truncateTable StorageTable::truncateTable StorageInterface::delete_all_rows Gopher thread SyncObject::wait SyncObject::lock Table::treeFetch <-- lock (Shared) on Table::syncObject, Table.cpp, line 915 Table::validateUpdate Section::updateRecord Dbb::updateRecord SRLUpdateRecords::commit SerialLogTransaction::commit <-- lock (Shared) on log::syncSections, SRLUpdateRecords.cpp, line 346 SerialLogTransaction::doAction Gopher::gopherThread How to repeat: Will attach the testcase shortly Suggested fix: It seems to me, that the best option to tell the MySQL Server to to disallow all concurrent operation on the table if truncate is running, i.e uncomment the line // && (sql_command != SQLCOM_TRUNCATE) in StorageInrterface::store_lock This way we would not need the Exclusive lock on the table, while truncate is running (Database.cpp, line 1480). I also found that TRUNCATE synchronization within Falcon via Table::synObject is not 100% effective.In fact, test in the related Bug#35991 crashes because of the race condition (updating thread is trying to get section page that has just been freed by truncate).