Bug #37080 Falcon deadlock on concurrent insert and truncate
Submitted: 29 May 2008 21:59 Modified: 8 Jan 2009 10:32
Reporter: Vladislav Vaintroub Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:mysql-6.0-falcon-team OS:Any
Assigned to: Vladislav Vaintroub CPU Architecture:Any

[29 May 2008 21:59] Vladislav Vaintroub
Description:
Got a deadlock while running test on with one client doing insert and other client doing truncate on the same table.

Th threads in question ("truncate" thread and gopher thread) acquire the same locks ,log::syncSections and Table::syncObject, in different order as shown below and this is the reason for the deadlock.

Annotated Callstacks

Truncate thread

syncObject::wait
SyncObject::lock
Sync::lock
SRLDropTable::append <-- lock (Exclusive) on log::syncSections, SRLDropTable.cpp ,line 50
Dbb::deleteSection
Table::expunge
Table::truncate
Database::truncateTable	<-- lock (Exclusive) on Table::syncObject, Database.cpp, line 1480
StorageDatabase::truncateTable
StorageTableShare::truncateTable
StorageTable::truncateTable
StorageInterface::delete_all_rows

Gopher thread

SyncObject::wait
SyncObject::lock
Table::treeFetch  <-- lock (Shared) on Table::syncObject, Table.cpp, line 915
Table::validateUpdate
Section::updateRecord
Dbb::updateRecord
SRLUpdateRecords::commit
SerialLogTransaction::commit <-- lock (Shared) on log::syncSections, SRLUpdateRecords.cpp, line 346
SerialLogTransaction::doAction
Gopher::gopherThread

How to repeat:
Will attach the testcase shortly

Suggested fix:
It seems to me, that the best option to tell the MySQL Server to  to disallow all concurrent operation on the table if truncate is running, i.e uncomment the line 
  //  &&  (sql_command != SQLCOM_TRUNCATE)
in StorageInrterface::store_lock

This way we would not need the Exclusive lock on the table, while truncate is running (Database.cpp, line 1480). 

I also found that TRUNCATE synchronization within Falcon via Table::synObject is not 100% effective.In fact, test in the related Bug#35991 crashes because of the race condition (updating thread is trying to get section page that has  just been freed by truncate).
[29 May 2008 22:14] Vladislav Vaintroub
mysql-test-run.pl friendly test case

Attachment: falcon_bug_37080.test (application/octet-stream, text), 1.35 KiB.

[29 May 2008 22:17] Vladislav Vaintroub
To run the test case, places attached file into mysql-test/t directory and run 
perl mysql-test-run.pl falcon_bug_37080

No corresponding result file is provided . I never seen the test completing, it hangs within couple of second after the start
[4 Jun 2008 18:06] Kevin Lewis
Vlad,  Neither Chris, Ann or I have any conceptual problem with serializing the truncate operation.  To the best of our memory, the reason it is allowed in Falcon now is that Chris spent a lot of time fixing a bug to make it happen.  For a short while, StorageInrterface::store_lock() actually allowed the server to do a table lock on SQLCOM_TRUNCATE.  But we commented that line again once we thought we had it working, evidenced by falcon_bug_22173a.  However, looking back at the number of intermittent failures for falcon_bug_22173a, it is evident that concurrent truncate still has problems, which you have found yourself.

Please note that if we uncomment the line 
  //  &&  (sql_command != SQLCOM_TRUNCATE)
in StorageInrterface::store_lock
then it will be very likely that falcon_bug_22173a will take too long and timeout in pushbuild.  It did before.

That said, the deadlock that you found was between a client thread doing Database::truncateTable and a gopher thread completing an update that had previously been committed.  Serializing truncateTable may have an effect on the likelihood of if occurring, but does not exclude this possible conflict.

We need to know these questions;
1) Why is SerialLog::syncSections used?  Is it needed at all?
2) Why is the gopher thread still completing changes in a table that is currently being dropped with Dbb::deleteSection?  Where is the protection there?
3) If we serialize truncates, (which seems like something that only changes the timing of the problems you identified), what changes do you propose, specifically?  Can you provide falcon-private a proposed patch for review?
[6 Jun 2008 11:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47525

2692 Vladislav Vaintroub	2008-06-06
      Bug#37080, Bug#35991
      - Serialize truncate with any other table operation via MySQL server.
      - Remove Falcon own truncate serialization mechanism, that results
      into deadlocks
[11 Jun 2008 18:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47752

2699 Vladislav Vaintroub	2008-06-11
      Bug#37080 - Falcon deadlock on concurrent insert and truncate
      
      Problem: 
      Two threads, one processing TRUNCATE and gopher processing INSERT acquire 
      the same locks, SerlialLog::syncSections and Table::syncObject in different 
      order 
      
      Solution:
      Rearrange locks, so that syncSections is always locked before 
      Table::syncObject. For this, move lock to syncSection from 
      SRLDropTable::append() up the stack, into to Database::dropTable() 
      Database::truncateTable()
[11 Jun 2008 19:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/47756

2699 Vladislav Vaintroub	2008-06-11
      Bug#37080 - Falcon deadlock on concurrent insert and truncate
      
      Problem: 
      Two threads, one processing TRUNCATE and gopher processing INSERT acquire 
      the same locks, SerlialLog::syncSections and Table::syncObject in different 
      order 
      
      Solution:
      Rearrange locks, so that syncSections is always locked before 
      Table::syncObject. For this, move lock to syncSection from 
      SRLDropTable::append() up the stack, into to Database::dropTable() 
      Database::truncateTable()
[22 Aug 2008 20:06] Kevin Lewis
This fix is in version 6.0.6
[8 Jan 2009 10:32] MC Brown
A note has been added to the 6.0.6 changelog: 

When performing operations on a table in one client while a different client is performing a TRUNCATE operation on the same FALCON table a deadlock could be introduced.
[16 Mar 2010 17:21] Hemanth Murthy
Hi Vlad,

I am running your test case (falcon_bug_37080) on my machine. I had $num to 100000 and after sometime, the test case failed with the following error:

Installing Master Database
=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

falcon_bug_37080               failed         timeout
=======================================================

I am using mysql-6.0.5-alpha src since I needed to reproduce this bug. Any idea what it could mean?
[16 Mar 2010 21:14] Kevin Lewis
Hemanth, MTR has a timeout value that will interrupt the test of the test takes too long.  I just ran the 37080 test on my laptop against falcon, and then changed the test to use innodb. Falcon took 1.282 secs to run 1000 loops and InnoDB took 28.937 seconds.  So if you raise the loopcount to 100,000 you really might have a test that runs too long under MTR, even for Falcon.
[16 Mar 2010 21:24] Hemanth Murthy
I was actually trying to reproduce the reported bug (37080). And according to the bug report, the test should hang within a few seconds. Any pointers on how to recreate this?

Thanks,
Hemanth
[16 Mar 2010 22:44] Kevin Lewis
It was a problem with an older version of Falcon in which SyncObjects were locked in opposite order, sometimes causing a deadlock.  It does not happen anymore.