Bug #41194 Falcon durability should occur earlier in the commit process.
Submitted: 3 Dec 2008 5:43 Modified: 15 May 2009 13:03
Reporter: Kevin Lewis
Status: Closed
Category: MySQL Server: Falcon storage engine
Severity: S3 (Non-critical)
Version: 6.0.8
OS: Any
Assigned to: Kevin Lewis
CPU Architecture: Any
Tags: F_TRANSACTION
Triage: Triaged: D1 (Critical)

[3 Dec 2008 5:43] Kevin Lewis
Description:
The point in time at which a transaction is considered committed in the view of all other transactions should not occur before the point of durability.

Durability occurs when the commit record is written to the serial log.  This is the call to database->commit(this) in Transaction::commit().  

The problem is that this call is near the end of the current code for Transaction::commit().  It is currently done AFTER the state is changed to committed.

In the current code, transaction A can synchronously move itself from the active list to the committed list and set its state to Committed. Then, before transaction A writes its commit record to the serial log, transaction B can make a change based on transaction A's records, which are now visible, commit those changes, and write its own commit record to the serial log. If the system crashes before transaction A writes its commit record, the serial log holds B's commit but not A's. Recovery would then redo B's changes even though they were built on A's changes, which recovery treats as uncommitted, so the serial log would not be recoverable.

We need to move the call to database->commit(this) higher up in Transaction::commit(). For group commits to work, this call must be in a non-synchronized portion of the commit. The only synchronized point of a commit is the previously mentioned transition from the active list to the committed list.

If database->commit(this) happens just before this transition, the following could happen.

Transaction A can write its commit record to the serial log. Transaction B can do the same immediately afterward, but then move itself to the committed list, set state=Committed, and get an end event before transaction A does. If the crash happens now, while B shows as committed but A does not, I don't think there is any harm. They will both show as committed in the recovery. Everything is available for these two transactions to fully commit during recovery. And transaction A is not doing anything else that might require a visibility check against transaction B: it has already started the commit and is ready to change its state.

But consider a third transaction C that sees the change by B but not by A, even though both are in the serial log as committed. Maybe C updates B's records but not A's. Then C commits and gets its commit record into the serial log before the system crashes. Recovery will see that all three transactions committed and will redo the changes of all three. The changes that actually happened are all redone, so the data is still consistent.
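The scenarios above reduce to one invariant: recovery stays consistent as long as no transaction whose commit record reached the serial log built on a transaction whose commit record did not. The sketch below models only that check, as an illustration under simplifying assumptions; CrashState, recoveryIsConsistent, and the two scenario functions are hypothetical names, not Falcon code.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model, not Falcon code. After a crash, recovery redoes
// exactly the transactions whose commit records reached the serial log.
struct CrashState {
    std::set<std::string> logged;  // transactions with a logged commit record
    // (reader, writer): reader read or updated records written by writer
    std::vector<std::pair<std::string, std::string>> dependsOn;
};

// Consistent only if no redone transaction built on one that is not redone.
bool recoveryIsConsistent(const CrashState& s) {
    for (const auto& d : s.dependsOn) {
        bool readerRedone = s.logged.count(d.first) > 0;
        bool writerRedone = s.logged.count(d.second) > 0;
        if (readerRedone && !writerRedone) {
            return false;
        }
    }
    return true;
}

// Current code: A is visible as Committed before its commit record is
// written; B builds on A and logs its commit; the crash comes before A logs.
CrashState visibleBeforeDurable() {
    CrashState s;
    s.dependsOn.emplace_back("B", "A");
    s.logged.insert("B");
    return s;
}

// Durability first: A and B both log, B reaches the committed list before
// A, and C sees B's records (but not A's) and logs before the crash.
CrashState durableBeforeVisible() {
    CrashState s;
    s.logged.insert("A");
    s.logged.insert("B");
    s.dependsOn.emplace_back("C", "B");
    s.logged.insert("C");
    return s;
}
```

In the first case recovery would replay B's changes without A's; in the second, B overtaking A on the committed list is harmless because every transaction that anyone built on is already in the log.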

How to repeat:
There is no deterministic way to cause this problem.  The scenario described above is very timing dependent and highly unlikely, but possible.  But if it can happen, given enough time and the right environment, it will happen.  The effect would be an unrecoverable database.

Suggested fix:
We need to move the call to database->commit(this) higher up, to just before the transition between the two lists.
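A minimal sketch of the suggested order, assuming simplified stand-ins for the serial log, the transaction, and the transaction manager (SerialLog, Txn, TransactionManager, appendCommit, and the free function commit are hypothetical, not the Falcon classes). The commit record is written first and outside the list lock, and only the move between lists is serialized.

```cpp
#include <cassert>
#include <list>
#include <mutex>
#include <vector>

// Hypothetical stand-ins, not the Falcon classes.
struct SerialLog {
    std::mutex sync;
    std::vector<int> commitRecords;
    // In this model, appending the record is the point of durability.
    void appendCommit(int txnId) {
        std::lock_guard<std::mutex> guard(sync);
        commitRecords.push_back(txnId);
    }
};

enum class State { Active, Committed };

struct Txn {
    explicit Txn(int txnId) : id(txnId) {}
    int id;
    State state = State::Active;
};

struct TransactionManager {
    std::mutex listSync;  // plays the role of activeTransactions.syncObject
    std::list<Txn*> active;
    std::list<Txn*> committed;
};

// Suggested order: write the commit record first, outside the list lock,
// then perform the one serialized step, the move between lists.
void commit(Txn& t, TransactionManager& tm, SerialLog& log) {
    log.appendCommit(t.id);  // 1) durability, not under listSync
    std::lock_guard<std::mutex> guard(tm.listSync);
    tm.active.remove(&t);        // 2) leave the active list...
    tm.committed.push_back(&t);  //    ...and join the committed list
    t.state = State::Committed;  // 3) only now visible as committed
}
```

This is the order the first patch used; the comment of 3 Dec 2008 15:32 explains why it tripped an assertion in the gopher thread.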
[3 Dec 2008 5:55] Kevin Lewis
Verified by code inspection and discussions with Jim Starkey.
[3 Dec 2008 5:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60455

2923 Kevin Lewis	2008-12-03
      Bug#41194 - The point in time in which a transaction
      is visible as committed by all other transactions 
      should not occur before the point of durability.
      Durability occurs when the commit record is written
      to the serial log.  This is the call to 
      database->commit(this) in Transaction::commit().
      It needs to happen before the status is changed.
[3 Dec 2008 15:32] Kevin Lewis
The previous patch does not work. The gopher thread immediately starts processing the serial log commit record and gets all the way to Transaction::writeComplete before Transaction::commit even reaches the state = committed; line. So the assertion (state == committed) in writeComplete fails.

The call to database->commit(this); does not have to occur before switching the transaction from the activeTransactions list to committedTransactions. But it does need to be called before other waiting transactions are signaled that this transaction is committed.

And the state change does not need to be protected by transactionManager->activeTransactions.syncObject. But it does need to occur before the point of durability, for the gopher thread's sake.
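The two ordering constraints stated above can be written as a small check over a proposed step order. This is only a model: the Step labels, posOf, and meetsCommitConstraints are hypothetical, and the list transition is deliberately left unconstrained, as the comment allows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical labels for the steps of Transaction::commit() discussed here.
enum class Step { MoveLists, SetCommitted, Durability, Signal };

// Position of a step in a proposed ordering (assumes each step appears once).
static std::ptrdiff_t posOf(const std::vector<Step>& order, Step s) {
    return std::find(order.begin(), order.end(), s) - order.begin();
}

// The two constraints stated above:
//  - waiting transactions are signaled only after the commit is durable;
//  - state == committed is set before durability, because the gopher may
//    reach writeComplete() as soon as the commit record is in the log.
bool meetsCommitConstraints(const std::vector<Step>& order) {
    return posOf(order, Step::Durability) < posOf(order, Step::Signal) &&
           posOf(order, Step::SetCommitted) < posOf(order, Step::Durability);
}
```

Checked this way, the first patch's order (durability, list move, state change, signal) fails the gopher constraint, while the order used by the second patch (list move, state change, durability, signal) passes both.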

A new patch is on the way...
[3 Dec 2008 15:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60504

2923 Kevin Lewis	2008-12-03
      Bug#41194 - The point in time in which a transaction
      is visible as committed by all other transactions
      should not occur before the point of durability.
      Durability occurs when the commit record is written
      to the serial log.  This is the call to
      database->commit(this) in Transaction::commit().
      It needs to happen immediately after the status is
      changed, but before other waiting transactions are 
      signaled.
[3 Dec 2008 17:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60522

2923 Kevin Lewis	2008-12-03
      Bug#41194 - The point in time in which a transaction
      is visible as committed by all other transactions
      should not occur before the point of durability.
      Durability occurs when the commit record is written
      to the serial log.  This is the call to
      database->commit(this) in Transaction::commit().
      It needs to happen immediately before the status is
      changed and other waiting transactions are
      signaled.  Then, to prevent the gopher thread
      from processing this serialLogTransaction
      before the commit is done, the gopher needs to wait
      on that transaction.
[3 Dec 2008 17:50] Kevin Lewis
The following is a summary of an email conversation between Ann and Kevin.

The second patch uses this order:
1) synchronously move trans from Active to committed lists
2) Transaction::state = committed
3) Durability - write committed record to serial log.
4) Transaction::syncIsActive.unlock()

But this solution leaves a gap between 2 and 4 in which another transaction can act on a record that is "committed" but not yet durable. A transaction could start, read the committing transaction's work, and commit in that gap. It seems unlikely, but really mystic things happen between instructions under load.

Another solution is to have the gopher check that the transaction has a state of committed before it starts to move changes out of the serial log.
The gopher thread can get a quick shared lock on Transaction::syncIsActive before processing it. Since the gopher runs in the background, the performance cost is acceptable. And a transaction in the serial log with writePending == true has a predictable path between Transaction::state == active and committed, so that wait is deterministic.
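A minimal sketch of the shared-lock wait, assuming std::shared_mutex as a stand-in for Falcon's syncIsActive (Txn, gopherObservedState, and runCommitWithEagerGopher are hypothetical names). The committing thread holds the lock exclusively until its commit is finished, so a gopher that starts too early blocks and then finds the state already committed.

```cpp
#include <atomic>
#include <cassert>
#include <shared_mutex>
#include <thread>

enum class State { Active, Committed };

// Hypothetical stand-in for a Falcon Transaction. std::shared_mutex plays
// the role of syncIsActive: the owning thread holds it exclusively until
// its commit has fully finished.
struct Txn {
    std::shared_mutex syncIsActive;
    std::atomic<State> state{State::Active};
};

// Gopher side: the shared lock is granted only after the committing
// thread releases syncIsActive, so the state read here is final.
State gopherObservedState(Txn& t) {
    std::shared_lock<std::shared_mutex> wait(t.syncIsActive);
    return t.state.load();
}

// Committing side: the commit record is assumed to be flushed already,
// and the gopher is started before the state change, which is the race
// that broke the first patch.
State runCommitWithEagerGopher() {
    Txn t;
    t.syncIsActive.lock();  // held since the transaction began
    State seen = State::Active;
    std::thread gopher([&] { seen = gopherObservedState(t); });
    // ...move from the active list to the committed list here...
    t.state.store(State::Committed);
    t.syncIsActive.unlock();  // commit fully finished
    gopher.join();
    return seen;
}
```

In this model the shared lock only delays the gopher: the exclusive holder always finishes its commit in a bounded number of steps before releasing it.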

It is still a good idea to do #3-durability before #4-signal. Regardless of what state the transaction may claim or what list it's on, the actual commit happens when the serial log commit record hits oxide (or SSD).  

The third patch goes back to the order of the first patch, but also adds a waitForTransaction to the gopher thread so that it will not process that transaction until the commit is fully finished. The order for the third patch is once again:

 1) Durability - write committed record to serial log.
 2) synchronously move trans from Active to committed lists
 3) Transaction::state = committed
 4) Transaction::syncIsActive.unlock()

Nothing is lost if a new transaction sees a transaction that is
in the process of committing as uncommitted - if it had started
a microsecond sooner, the transaction would have been active.
Not seeing concurrent results (except in special cases of unique
and foreign key constraints) is not a problem.  Seeing results
that are not actually durable is a major violation of transaction
semantics.
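The gap in the second patch, and its absence in the third, can be checked by replaying each step order and asking whether the transaction is ever visible as committed before its commit record is durable. The sketch is a model under a stated assumption: either the list move or the state change is taken to make the transaction visible (Step and hasVisibleButNotDurableWindow are hypothetical). Seeing a committing transaction as still uncommitted is not flagged, since the text above explains why that is harmless.

```cpp
#include <cassert>
#include <vector>

// Hypothetical labels for the steps in the two orderings above.
enum class Step { Durability, MoveLists, SetCommitted, Unlock };

// Replays an ordering and reports whether other transactions could see this
// one as committed while its commit record is not yet in the serial log.
bool hasVisibleButNotDurableWindow(const std::vector<Step>& order) {
    bool durable = false;
    bool visible = false;
    for (Step s : order) {
        if (s == Step::Durability) {
            durable = true;
        }
        // Assumption: either the list move or the state change is enough
        // for another transaction to treat this one as committed.
        if (s == Step::MoveLists || s == Step::SetCommitted) {
            visible = true;
        }
        if (visible && !durable) {
            return true;
        }
    }
    return false;
}
```

The second patch's order (list move, state change, durability, unlock) reports a window; the third patch's order (durability first) does not.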
[4 Dec 2008 19:10] Kevin Lewis
There were several regressions with patch 3.  Investigating...
[16 Dec 2008 22:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/61832

2938 Kevin Lewis	2008-12-16
      Bug#41194 - Move point of durability up higher in the commit
      so that by the time other waiting threads are signalled that
      this transaction is committed, it will already be durable.
[19 Dec 2008 6:39] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62043

2940 Kevin Lewis	2008-12-19
      Bug#41194 - The Transaction knows about the serial log.  
      Let it make calls to SerialLogRecord functions directly.
      Separate the serial log flush of commit and rollback records
      from when we allow the gophers to start processing them.
      Once those records are flushed, the recovery will be able to 
      process them, but we do not want the gophers to do that until 
      the commit or rollback is fully finished.  
      There is no need for the prepare flush to also start a gopher.
      Also, since the Transaction already knows about the SerialLog,
      there is no reason to pass those calls through Database or Dbb.
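A minimal model of separating the flush from the point at which gophers may start, assuming a hypothetical ModelSerialLog with flush, releaseToGophers, and gopherPass (this is not Falcon's SerialLog API). A flushed record is durable and visible to recovery, but a gopher only picks up a transaction after it has been released, and a prepare flush never releases anything.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical model, not Falcon's SerialLog API.
enum class Kind { Prepare, Commit, Rollback };

struct Record {
    int txnId;
    Kind kind;
};

class ModelSerialLog {
public:
    // Durability point: after this, recovery would see the record.
    void flush(const Record& r) { durable_.push_back(r); }

    // Called only once the commit or rollback has fully finished.
    void releaseToGophers(int txnId) { released_.push_back(txnId); }

    // A gopher pass handles only released transactions that have a commit
    // or rollback record; prepare records alone never produce gopher work.
    std::vector<int> gopherPass() {
        std::vector<int> processed;
        while (!released_.empty()) {
            int id = released_.front();
            released_.pop_front();
            for (const Record& r : durable_) {
                if (r.txnId == id && r.kind != Kind::Prepare) {
                    processed.push_back(id);
                    break;
                }
            }
        }
        return processed;
    }

    std::size_t durableCount() const { return durable_.size(); }

private:
    std::vector<Record> durable_;
    std::deque<int> released_;
};
```

Here a commit record can be durable well before any gopher touches it, which is the separation the commit message describes.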
[19 Dec 2008 18:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62155

2949 Kevin Lewis	2008-12-19
      Bug#41194 - The Transaction knows about the serial log.  
      Let it make calls to SerialLogRecord functions directly.
      Separate the serial log flush of commit and rollback records
      from when we allow the gophers to start processing them.
      Once those records are flushed, the recovery will be able to 
      process them, but we do not want the gophers to do that until 
      the commit or rollback is fully finished.  
      There is no need for the prepare flush to also start a gopher.
[13 Feb 2009 7:25] Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:klewis@mysql.com-20081219184532-2torpa3yel1d59in) (merge vers: 6.0.9-alpha) (pib:6)
[15 May 2009 13:03] MC Brown
An entry has been added to the 6.0.10 changelog: 

Transactions in Falcon tables could be recorded incorrectly, leading other waiting transactions to complete even though the original transaction information had not been successfully made durable.