Bug #29332 falcon deadlocks when running falcon_bug_28026.test
Submitted: 25 Jun 2007 12:44 Modified: 19 Sep 2007 15:03
Reporter: Sergey Vojtovich
Status: Closed
Category:Server: Falcon Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: Kevin Lewis Target Version:

[25 Jun 2007 12:44] Sergey Vojtovich
Description:
(subj)

How to repeat:
Run falcon_bug_28026.test.
[25 Jun 2007 13:56] Miguel Solorzano
Thank you for the bug report.

c:\dev\6.0>bin\mysqld-nt --standalone --console
070625  8:48:46  InnoDB: Started; log sequence number 0 46409
070625  8:48:46 [Note] Event Scheduler: Loaded 0 events
070625  8:48:47 [Note] bin\mysqld-nt: ready for connections.
Version: '6.0.1-alpha-nt'  socket: ''  port: 3306  Source distribution
Recovering database c:\dev\6.0\data\falcon_master.fts ...
first recovery block is 1
last recovery block is 156
recovery read block is 155
Recovery complete
Transactions: 10 committed, 0 rolled back, 0 active, 1 post-commit
Activity on c:\dev\6.0\data\falcon_master.fts: 1191 fetches, 60 reads, 32 writes
Activity on c:\dev\6.0\data\falcon_temporary.fts: 0 fetches, 1 reads, 1 writes
Activity on c:\dev\6.0\data\falcon_user.fts: 20 fetches, 2 reads, 11 writes
SerialLog: 0 reads, 10 writes, 3 transactions, 4 blocks, 1 windows
Transactions: 0 committed, 0 rolled back, 2 active, 0 post-commit
Activity on c:\dev\6.0\data\falcon_master.fts: 0 fetches, 0 reads, 2 writes
Activity on c:\dev\6.0\data\falcon_user.fts: 10 fetches, 0 reads, 0 writes
SerialLog: 0 reads, 1 writes, 1 transactions, 2 blocks, 1 windows
Stalled threads
  Thread 8ec00b4 (9192) sleep=0, grant=0, locks=1, who 0, javaThread 0
      prior=cccccccc, next=cccccccc, parent=0
Stalled synchronization objects:
------------------------------------
Transactions: 0 committed, 0 rolled back, 2 active, 0 post-commit
SerialLog: 0 reads, 1 writes, 1 transactions, 3 blocks, 1 windows
Exception: deadlock on table TEST.T1#P#P8
Exception: deadlock on table TEST.T1#P#P8
Transactions: 0 committed, 0 rolled back, 2 active, 0 post-commit
SerialLog: 0 reads, 1 writes, 0 transactions, 0 blocks, 1 windows
Transactions: 0 committed, 0 rolled back, 2 active, 0 post-commit
[28 Jun 2007 10:57] Calvin Sun
At the end of the test, T1 still hangs. I believe it is correct since T2 has not released
the lock on s1 = 9 yet. Suggest to complete the test case.
[29 Jul 2007 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[5 Sep 2007 3:36] Kevin Lewis
This testcase hangs Falcon now because it expects Falcon to do what Innodb does.  Innodb
automatically does a rollback of any transaction that gets a deadlock.  See
storage\innobase\handler\ha_innodb.cc; line 581.
So the testcase waits when it does not need to and does not commit when it should.  

However, Falcon is not actually returning a deadlock error all the way through the
handler interface.  The error gets interpreted as 'no record found' on the select for
update that should have caused a deadlock.  I can play with returning  this deadlock
error all the way to the server, but it might affect some other testcases.

The other question for Ann and Jim is whether we want Falcon to automatically rollback
any transaction that gets a deadlock like Innodb does in the code mentioned above.

The error that is returned is;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
[5 Sep 2007 21:26] Kevin Lewis
It has been decided by the Falcon team that deadlocks will not cause the transaction to
automatically abort.  This is rude and presumptive and against the SQL standard.  It also
prevents wise database designers from designing transactions that can move forward when a
deadlock is detected.

I am currently testing a set of changes to Falcon that allow a deadlock detected during a
call to fetchForUpdate to be returned to the calling engine.  This also includes changes
to falcon_bug_28026.test so that a deadlock code is expected immediately.

In addition, I have added a new testcase, falcon_bug_29332, which tests this deadlock
code using not only a partitioned table, but also an indexed table.  This test expects
the transaction to remain active after receiving the deadlock error.

When these test cases use a partition table, they require a change to ha_partition.cc. 
In ha_partition::rnd_next(), there are two 'int result = ...' declarations. the second
one receives the result of the call into the storage engine and if there is an error, it
is lost when that 'result' goes out of scope.  The required fix is to change;
    int result= file->rnd_next(buf);
to;
    result= file->rnd_next(buf);

Ingo and/or Mikael, please verify this change and I will push it to falcon-5.1.
[6 Sep 2007 0:01] Kevin Lewis
Pushed to Falcon repository.  Still need code review for the change to ha_partition.
[6 Sep 2007 6:07] Kevin Lewis
The following are email conversations about deadlocks without rollbacks;

Ann Harrison wrote;
I have always considered that transaction control is the responsibility of the client.  A
deadlock, like a validation error, causes the current operation to fail and its changes to
be undone but does not cause the whole transaction to rollback.

James Day wrote;
Agree with Ann here.  For background on why InnoDB acts as it does, see the private
comments in:  http://bugs.mysql.com/bug.php?id=12308

Mark Matthews wrote;
Please make sure that you use a SQLState and error code that represents _deadlock_ with
no rollback, versus one w/ rollback, because some people "care". 
For example, the JDBC driver keys off of SQLStates that start with "40" to tell
applications that their _transaction_ was rolled back by the server. (InnoDB currently
returns an error code of ERR_LOCK_DEADLOCK, which resolves to "40001" as a SQLState).
If we're now going to have engines that _don't_ do rollback on deadlock automatically, we
_must_ use a different error code and sql state so that applications and drivers can tell
what actually happened without using magic.

Mark Matthews added;
I just want to make sure that we don't re-use the SQLState of 40001 (which is currently
mapped to ER_LOCK_DEADLOCK in sql_state.h) since that has a meaning, according to the SQL
Spec.
As far as "codes", I leave that to the server folks. I suppose you can't return
ER_LOCK_DEADLOCK, because that is hard-mapped to the SQLState of 40001.
We're kind of in a bind, because the "class" of "40" is "transaction rollback", so
technically we can't re-use that for Falcon's behavior, _but_ deadlock is in there,
because the subclass "001" is for "serialization failure".
I know that Oracle uses "61000" for deadlock (because it doesn't roll back the
transaction either, if I remember correctly), and "6...." is in the
vendor/implementation-specific range, so maybe we punt and do the same (or make up our
own vendor-specific code?)
Maybe we punt and ask PeterG to interpret the SQL Spec for us in this case (serialization
failure with no transaction rollback)?

Peter Gulutzan responded;
The SQL standard doesn't have an error for deadlock without rollback. You're right, class
'40' is inappropriate. Just glancing at Oracle documentation, I guess people would expect
the same for Oracle's '61000'.
DB2 Version 9.1 for z/OS has sqlstate = '57033' "Deadlock or timeout occurred without
automatic rollback". 
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.m...
This error can occur when an implicit rollback fails, that is, it is not an exact match
of MySQL's situation, but class '57' "Resource Not Available or Operator Intervention"
looks okay as a precedent.
[6 Sep 2007 18:26] Ingo Strüwing
I agree regarding 'int result = ...'. I do even propose to get rid of the initialization a
top of the function. 'result' is not used until it is unconditionallly set in the loop.
And there is no chance not to enter the loop.
[6 Sep 2007 18:29] Ingo Strüwing
Where is the patch for this bug? Don't know what to review. Is the state "Patch pending"
correct?
[19 Sep 2007 15:03] Hakan Kuecuekyilmaz
falcon_bug_28026.test passes now:

=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

main.falcon_bug_28026          [ pass ]           1245
-------------------------------------------------------
Stopping All Servers
All 1 tests were successful.