| Bug #37080 | Falcon deadlock on concurrent insert and truncate | ||
|---|---|---|---|
| Submitted: | 29 May 2008 21:59 | Modified: | 8 Jan 2009 10:32 |
| Reporter: | Vladislav Vaintroub | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
| Version: | mysql-6.0-falcon-team | OS: | Any |
| Assigned to: | Vladislav Vaintroub | CPU Architecture: | Any |
[29 May 2008 22:14]
Vladislav Vaintroub
mysql-test-run.pl friendly test case
Attachment: falcon_bug_37080.test (application/octet-stream, text), 1.35 KiB.
[29 May 2008 22:17]
Vladislav Vaintroub
To run the test case, places attached file into mysql-test/t directory and run perl mysql-test-run.pl falcon_bug_37080 No corresponding result file is provided . I never seen the test completing, it hangs within couple of second after the start
[4 Jun 2008 18:06]
Kevin Lewis
Vlad, Neither Chris, Ann or I have any conceptual problem with serializing the truncate operation. To the best of our memory, the reason it is allowed in Falcon now is that Chris spent a lot of time fixing a bug to make it happen. For a short while, StorageInrterface::store_lock() actually allowed the server to do a table lock on SQLCOM_TRUNCATE. But we commented that line again once we thought we had it working, evidenced by falcon_bug_22173a. However, looking back at the number of intermittent failures for falcon_bug_22173a, it is evident that concurrent truncate still has problems, which you have found yourself. Please note that if we uncomment the line // && (sql_command != SQLCOM_TRUNCATE) in StorageInrterface::store_lock then it will be very likely that falcon_bug_22173a will take too long and timeout in pushbuild. It did before. That said, the deadlock that you found was between a client thread doing Database::truncateTable and a gopher thread completing an update that had previously been committed. Serializing truncateTable may have an effect on the likelihood of if occurring, but does not exclude this possible conflict. We need to know these questions; 1) Why is SerialLog::syncSections used? Is it needed at all? 2) Why is the gopher thread still completing changes in a table that is currently being dropped with Dbb::deleteSection? Where is the protection there? 3) If we serialize truncates, (which seems like something that only changes the timing of the problems you identified), what changes do you propose, specifically? Can you provide falcon-private a proposed patch for review?
[6 Jun 2008 11:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47525 2692 Vladislav Vaintroub 2008-06-06 Bug#37080, Bug#35991 - Serialize truncate with any other table operation via MySQL server. - Remove Falcon own truncate serialization mechanism, that results into deadlocks
[11 Jun 2008 18:31]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47752 2699 Vladislav Vaintroub 2008-06-11 Bug#37080 - Falcon deadlock on concurrent insert and truncate Problem: Two threads, one processing TRUNCATE and gopher processing INSERT acquire the same locks, SerlialLog::syncSections and Table::syncObject in different order Solution: Rearrange locks, so that syncSections is always locked before Table::syncObject. For this, move lock to syncSection from SRLDropTable::append() up the stack, into to Database::dropTable() Database::truncateTable()
[11 Jun 2008 19:50]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47756 2699 Vladislav Vaintroub 2008-06-11 Bug#37080 - Falcon deadlock on concurrent insert and truncate Problem: Two threads, one processing TRUNCATE and gopher processing INSERT acquire the same locks, SerlialLog::syncSections and Table::syncObject in different order Solution: Rearrange locks, so that syncSections is always locked before Table::syncObject. For this, move lock to syncSection from SRLDropTable::append() up the stack, into to Database::dropTable() Database::truncateTable()
[22 Aug 2008 20:06]
Kevin Lewis
This fix is in version 6.0.6
[8 Jan 2009 10:32]
MC Brown
A note has been added to the 6.0.6 changelog: When performing operations on a table in one client while a different client is performing a TRUNCATE operation on the same FALCON table a deadlock could be introduced.
[16 Mar 2010 17:21]
Hemanth Murthy
Hi Vlad, I am running your test case (falcon_bug_37080) on my machine. I had $num to 100000 and after sometime, the test case failed with the following error: Installing Master Database ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- falcon_bug_37080 failed timeout ======================================================= I am using mysql-6.0.5-alpha src since I needed to reproduce this bug. Any idea what it could mean?
[16 Mar 2010 21:14]
Kevin Lewis
Hemanth, MTR has a timeout value that will interrupt the test of the test takes too long. I just ran the 37080 test on my laptop against falcon, and then changed the test to use innodb. Falcon took 1.282 secs to run 1000 loops and InnoDB took 28.937 seconds. So if you raise the loopcount to 100,000 you really might have a test that runs too long under MTR, even for Falcon.
[16 Mar 2010 21:24]
Hemanth Murthy
I was actually trying to reproduce the reported bug (37080). And according to the bug report, the test should hang within a few seconds. Any pointers on how to recreate this? Thanks, Hemanth
[16 Mar 2010 22:44]
Kevin Lewis
It was a problem with an older version of Falcon in which SyncObjects were locked in opposite order, sometimes causing a deadlock. It does not happen anymore.

Description: Got a deadlock while running test on with one client doing insert and other client doing truncate on the same table. Th threads in question ("truncate" thread and gopher thread) acquire the same locks ,log::syncSections and Table::syncObject, in different order as shown below and this is the reason for the deadlock. Annotated Callstacks Truncate thread syncObject::wait SyncObject::lock Sync::lock SRLDropTable::append <-- lock (Exclusive) on log::syncSections, SRLDropTable.cpp ,line 50 Dbb::deleteSection Table::expunge Table::truncate Database::truncateTable <-- lock (Exclusive) on Table::syncObject, Database.cpp, line 1480 StorageDatabase::truncateTable StorageTableShare::truncateTable StorageTable::truncateTable StorageInterface::delete_all_rows Gopher thread SyncObject::wait SyncObject::lock Table::treeFetch <-- lock (Shared) on Table::syncObject, Table.cpp, line 915 Table::validateUpdate Section::updateRecord Dbb::updateRecord SRLUpdateRecords::commit SerialLogTransaction::commit <-- lock (Shared) on log::syncSections, SRLUpdateRecords.cpp, line 346 SerialLogTransaction::doAction Gopher::gopherThread How to repeat: Will attach the testcase shortly Suggested fix: It seems to me, that the best option to tell the MySQL Server to to disallow all concurrent operation on the table if truncate is running, i.e uncomment the line // && (sql_command != SQLCOM_TRUNCATE) in StorageInrterface::store_lock This way we would not need the Exclusive lock on the table, while truncate is running (Database.cpp, line 1480). I also found that TRUNCATE synchronization within Falcon via Table::synObject is not 100% effective.In fact, test in the related Bug#35991 crashes because of the race condition (updating thread is trying to get section page that has just been freed by truncate).