MySQL Bugs: #29332: falcon deadlocks when running falcon_bug

Bug #29332	falcon deadlocks when running falcon_bug_28026.test
Submitted:	25 Jun 2007 10:44	Modified:	19 Sep 2007 13:03
Reporter:	Sergey Vojtovich	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S3 (Non-critical)
Version:		OS:	Any
Assigned to:	Kevin Lewis	CPU Architecture:	Any

Description:
(subj)

How to repeat:
Run falcon_bug_28026.test.

Thank you for the bug report.

c:\dev\6.0>bin\mysqld-nt --standalone --console
070625  8:48:46  InnoDB: Started; log sequence number 0 46409
070625  8:48:46 [Note] Event Scheduler: Loaded 0 events
070625  8:48:47 [Note] bin\mysqld-nt: ready for connections.
Version: '6.0.1-alpha-nt'  socket: ''  port: 3306  Source distribution
Recovering database c:\dev\6.0\data\falcon_master.fts ...
first recovery block is 1
last recovery block is 156
recovery read block is 155
Recovery complete
Transactions: 10 committed, 0 rolled back, 0 active, 1 post-commit
Activity on c:\dev\6.0\data\falcon_master.fts: 1191 fetches, 60 reads, 32 writes
Activity on c:\dev\6.0\data\falcon_temporary.fts: 0 fetches, 1 reads, 1 writes
Activity on c:\dev\6.0\data\falcon_user.fts: 20 fetches, 2 reads, 11 writes
SerialLog: 0 reads, 10 writes, 3 transactions, 4 blocks, 1 windows
Transactions: 0 committed, 0 rolled back, 2 active, 0 post-commit
Activity on c:\dev\6.0\data\falcon_master.fts: 0 fetches, 0 reads, 2 writes
Activity on c:\dev\6.0\data\falcon_user.fts: 10 fetches, 0 reads, 0 writes
SerialLog: 0 reads, 1 writes, 1 transactions, 2 blocks, 1 windows
Stalled threads
  Thread 8ec00b4 (9192) sleep=0, grant=0, locks=1, who 0, javaThread 0
      prior=cccccccc, next=cccccccc, parent=0
Stalled synchronization objects:
------------------------------------
Transactions: 0 committed, 0 rolled back, 2 active, 0 post-commit
SerialLog: 0 reads, 1 writes, 1 transactions, 3 blocks, 1 windows
Exception: deadlock on table TEST.T1#P#P8
Exception: deadlock on table TEST.T1#P#P8
Transactions: 0 committed, 0 rolled back, 2 active, 0 post-commit
SerialLog: 0 reads, 1 writes, 0 transactions, 0 blocks, 1 windows
Transactions: 0 committed, 0 rolled back, 2 active, 0 post-commit

At the end of the test, T1 still hangs. I believe it is correct since T2 has not released the lock on s1 = 9 yet. Suggest to complete the test case.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

This testcase hangs Falcon now because it expects Falcon to do what Innodb does.  Innodb automatically does a rollback of any transaction that gets a deadlock.  See storage\innobase\handler\ha_innodb.cc; line 581.
So the testcase waits when it does not need to and does not commit when it should.  

However, Falcon is not actually returning a deadlock error all the way through the handler interface.  The error gets interpreted as 'no record found' on the select for update that should have caused a deadlock.  I can play with returning  this deadlock error all the way to the server, but it might affect some other testcases.

The other question for Ann and Jim is whether we want Falcon to automatically rollback any transaction that gets a deadlock like Innodb does in the code mentioned above.

The error that is returned is;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

It has been decided by the Falcon team that deadlocks will not cause the transaction to automatically abort.  This is rude and presumptive and against the SQL standard.  It also prevents wise database designers from designing transactions that can move forward when a deadlock is detected.

I am currently testing a set of changes to Falcon that allow a deadlock detected during a call to fetchForUpdate to be returned to the calling engine.  This also includes changes to falcon_bug_28026.test so that a deadlock code is expected immediately.

In addition, I have added a new testcase, falcon_bug_29332, which tests this deadlock code using not only a partitioned table, but also an indexed table.  This test expects the transaction to remain active after receiving the deadlock error.

When these test cases use a partition table, they require a change to ha_partition.cc.  In ha_partition::rnd_next(), there are two 'int result = ...' declarations. the second one receives the result of the call into the storage engine and if there is an error, it is lost when that 'result' goes out of scope.  The required fix is to change;
    int result= file->rnd_next(buf);
to;
    result= file->rnd_next(buf);

Ingo and/or Mikael, please verify this change and I will push it to falcon-5.1.

Pushed to Falcon repository.  Still need code review for the change to ha_partition.

The following are email conversations about deadlocks without rollbacks;

Ann Harrison wrote;
I have always considered that transaction control is the responsibility of the client.  A deadlock, like a validation error, causes the current operation to fail and its changes to be undone but does not cause the whole transaction to rollback.

James Day wrote;
Agree with Ann here.  For background on why InnoDB acts as it does, see the private comments in:  http://bugs.mysql.com/bug.php?id=12308

Mark Matthews wrote;
Please make sure that you use a SQLState and error code that represents _deadlock_ with no rollback, versus one w/ rollback, because some people "care". 
For example, the JDBC driver keys off of SQLStates that start with "40" to tell applications that their _transaction_ was rolled back by the server. (InnoDB currently returns an error code of ERR_LOCK_DEADLOCK, which resolves to "40001" as a SQLState).
If we're now going to have engines that _don't_ do rollback on deadlock automatically, we _must_ use a different error code and sql state so that applications and drivers can tell what actually happened without using magic.

Mark Matthews added;
I just want to make sure that we don't re-use the SQLState of 40001 (which is currently mapped to ER_LOCK_DEADLOCK in sql_state.h) since that has a meaning, according to the SQL Spec.
As far as "codes", I leave that to the server folks. I suppose you can't return ER_LOCK_DEADLOCK, because that is hard-mapped to the SQLState of 40001.
We're kind of in a bind, because the "class" of "40" is "transaction rollback", so technically we can't re-use that for Falcon's behavior, _but_ deadlock is in there, because the subclass "001" is for "serialization failure".
I know that Oracle uses "61000" for deadlock (because it doesn't roll back the transaction either, if I remember correctly), and "6...." is in the vendor/implementation-specific range, so maybe we punt and do the same (or make up our own vendor-specific code?)
Maybe we punt and ask PeterG to interpret the SQL Spec for us in this case (serialization failure with no transaction rollback)?

Peter Gulutzan responded;
The SQL standard doesn't have an error for deadlock without rollback. You're right, class '40' is inappropriate. Just glancing at Oracle documentation, I guess people would expect the same for Oracle's '61000'.
DB2 Version 9.1 for z/OS has sqlstate = '57033' "Deadlock or timeout occurred without automatic rollback".  http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.m...
This error can occur when an implicit rollback fails, that is, it is not an exact match of MySQL's situation, but class '57' "Resource Not Available or Operator Intervention" looks okay as a precedent.

I agree regarding 'int result = ...'. I do even propose to get rid of the initialization a top of the function. 'result' is not used until it is unconditionallly set in the loop. And there is no chance not to enter the loop.

Where is the patch for this bug? Don't know what to review. Is the state "Patch pending" correct?

falcon_bug_28026.test passes now:

=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.falcon_bug_28026          [ pass ]           1245
-------------------------------------------------------
Stopping All Servers
All 1 tests were successful.