Bug #37080 | Falcon deadlock on concurrent insert and truncate | ||
---|---|---|---|
Submitted: | 29 May 2008 21:59 | Modified: | 8 Jan 2009 10:32 |
Reporter: | Vladislav Vaintroub | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | mysql-6.0-falcon-team | OS: | Any |
Assigned to: | Vladislav Vaintroub | CPU Architecture: | Any |
[29 May 2008 21:59]
Vladislav Vaintroub
[29 May 2008 22:14]
Vladislav Vaintroub
mysql-test-run.pl friendly test case
Attachment: falcon_bug_37080.test (application/octet-stream, text), 1.35 KiB.
[29 May 2008 22:17]
Vladislav Vaintroub
To run the test case, places attached file into mysql-test/t directory and run perl mysql-test-run.pl falcon_bug_37080 No corresponding result file is provided . I never seen the test completing, it hangs within couple of second after the start
[4 Jun 2008 18:06]
Kevin Lewis
Vlad, Neither Chris, Ann or I have any conceptual problem with serializing the truncate operation. To the best of our memory, the reason it is allowed in Falcon now is that Chris spent a lot of time fixing a bug to make it happen. For a short while, StorageInrterface::store_lock() actually allowed the server to do a table lock on SQLCOM_TRUNCATE. But we commented that line again once we thought we had it working, evidenced by falcon_bug_22173a. However, looking back at the number of intermittent failures for falcon_bug_22173a, it is evident that concurrent truncate still has problems, which you have found yourself. Please note that if we uncomment the line // && (sql_command != SQLCOM_TRUNCATE) in StorageInrterface::store_lock then it will be very likely that falcon_bug_22173a will take too long and timeout in pushbuild. It did before. That said, the deadlock that you found was between a client thread doing Database::truncateTable and a gopher thread completing an update that had previously been committed. Serializing truncateTable may have an effect on the likelihood of if occurring, but does not exclude this possible conflict. We need to know these questions; 1) Why is SerialLog::syncSections used? Is it needed at all? 2) Why is the gopher thread still completing changes in a table that is currently being dropped with Dbb::deleteSection? Where is the protection there? 3) If we serialize truncates, (which seems like something that only changes the timing of the problems you identified), what changes do you propose, specifically? Can you provide falcon-private a proposed patch for review?
[6 Jun 2008 11:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47525 2692 Vladislav Vaintroub 2008-06-06 Bug#37080, Bug#35991 - Serialize truncate with any other table operation via MySQL server. - Remove Falcon own truncate serialization mechanism, that results into deadlocks
[11 Jun 2008 18:31]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47752 2699 Vladislav Vaintroub 2008-06-11 Bug#37080 - Falcon deadlock on concurrent insert and truncate Problem: Two threads, one processing TRUNCATE and gopher processing INSERT acquire the same locks, SerlialLog::syncSections and Table::syncObject in different order Solution: Rearrange locks, so that syncSections is always locked before Table::syncObject. For this, move lock to syncSection from SRLDropTable::append() up the stack, into to Database::dropTable() Database::truncateTable()
[11 Jun 2008 19:50]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/47756 2699 Vladislav Vaintroub 2008-06-11 Bug#37080 - Falcon deadlock on concurrent insert and truncate Problem: Two threads, one processing TRUNCATE and gopher processing INSERT acquire the same locks, SerlialLog::syncSections and Table::syncObject in different order Solution: Rearrange locks, so that syncSections is always locked before Table::syncObject. For this, move lock to syncSection from SRLDropTable::append() up the stack, into to Database::dropTable() Database::truncateTable()
[22 Aug 2008 20:06]
Kevin Lewis
This fix is in version 6.0.6
[8 Jan 2009 10:32]
MC Brown
A note has been added to the 6.0.6 changelog: When performing operations on a table in one client while a different client is performing a TRUNCATE operation on the same FALCON table a deadlock could be introduced.
[16 Mar 2010 17:21]
Hemanth Murthy
Hi Vlad, I am running your test case (falcon_bug_37080) on my machine. I had $num to 100000 and after sometime, the test case failed with the following error: Installing Master Database ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- falcon_bug_37080 failed timeout ======================================================= I am using mysql-6.0.5-alpha src since I needed to reproduce this bug. Any idea what it could mean?
[16 Mar 2010 21:14]
Kevin Lewis
Hemanth, MTR has a timeout value that will interrupt the test of the test takes too long. I just ran the 37080 test on my laptop against falcon, and then changed the test to use innodb. Falcon took 1.282 secs to run 1000 loops and InnoDB took 28.937 seconds. So if you raise the loopcount to 100,000 you really might have a test that runs too long under MTR, even for Falcon.
[16 Mar 2010 21:24]
Hemanth Murthy
I was actually trying to reproduce the reported bug (37080). And according to the bug report, the test should hang within a few seconds. Any pointers on how to recreate this? Thanks, Hemanth
[16 Mar 2010 22:44]
Kevin Lewis
It was a problem with an older version of Falcon in which SyncObjects were locked in opposite order, sometimes causing a deadlock. It does not happen anymore.