Bug #29332 | falcon deadlocks when running falcon_bug_28026.test | ||
---|---|---|---|
Submitted: | 25 Jun 2007 10:44 | Modified: | 19 Sep 2007 13:03 |
Reporter: | Sergey Vojtovich | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | OS: | Any | |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
[25 Jun 2007 10:44]
Sergey Vojtovich
[25 Jun 2007 11:56]
MySQL Verification Team
Thank you for the bug report. c:\dev\6.0>bin\mysqld-nt --standalone --console 070625 8:48:46 InnoDB: Started; log sequence number 0 46409 070625 8:48:46 [Note] Event Scheduler: Loaded 0 events 070625 8:48:47 [Note] bin\mysqld-nt: ready for connections. Version: '6.0.1-alpha-nt' socket: '' port: 3306 Source distribution Recovering database c:\dev\6.0\data\falcon_master.fts ... first recovery block is 1 last recovery block is 156 recovery read block is 155 Recovery complete Transactions: 10 committed, 0 rolled back, 0 active, 1 post-commit Activity on c:\dev\6.0\data\falcon_master.fts: 1191 fetches, 60 reads, 32 writes Activity on c:\dev\6.0\data\falcon_temporary.fts: 0 fetches, 1 reads, 1 writes Activity on c:\dev\6.0\data\falcon_user.fts: 20 fetches, 2 reads, 11 writes SerialLog: 0 reads, 10 writes, 3 transactions, 4 blocks, 1 windows Transactions: 0 committed, 0 rolled back, 2 active, 0 post-commit Activity on c:\dev\6.0\data\falcon_master.fts: 0 fetches, 0 reads, 2 writes Activity on c:\dev\6.0\data\falcon_user.fts: 10 fetches, 0 reads, 0 writes SerialLog: 0 reads, 1 writes, 1 transactions, 2 blocks, 1 windows Stalled threads Thread 8ec00b4 (9192) sleep=0, grant=0, locks=1, who 0, javaThread 0 prior=cccccccc, next=cccccccc, parent=0 Stalled synchronization objects: ------------------------------------ Transactions: 0 committed, 0 rolled back, 2 active, 0 post-commit SerialLog: 0 reads, 1 writes, 1 transactions, 3 blocks, 1 windows Exception: deadlock on table TEST.T1#P#P8 Exception: deadlock on table TEST.T1#P#P8 Transactions: 0 committed, 0 rolled back, 2 active, 0 post-commit SerialLog: 0 reads, 1 writes, 0 transactions, 0 blocks, 1 windows Transactions: 0 committed, 0 rolled back, 2 active, 0 post-commit
[28 Jun 2007 8:57]
Calvin Sun
At the end of the test, T1 still hangs. I believe it is correct since T2 has not released the lock on s1 = 9 yet. Suggest to complete the test case.
[28 Jul 2007 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[5 Sep 2007 1:36]
Kevin Lewis
This testcase hangs Falcon now because it expects Falcon to do what Innodb does. Innodb automatically does a rollback of any transaction that gets a deadlock. See storage\innobase\handler\ha_innodb.cc; line 581. So the testcase waits when it does not need to and does not commit when it should. However, Falcon is not actually returning a deadlock error all the way through the handler interface. The error gets interpreted as 'no record found' on the select for update that should have caused a deadlock. I can play with returning this deadlock error all the way to the server, but it might affect some other testcases. The other question for Ann and Jim is whether we want Falcon to automatically rollback any transaction that gets a deadlock like Innodb does in the code mentioned above. The error that is returned is; ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
[5 Sep 2007 19:26]
Kevin Lewis
It has been decided by the Falcon team that deadlocks will not cause the transaction to automatically abort. This is rude and presumptive and against the SQL standard. It also prevents wise database designers from designing transactions that can move forward when a deadlock is detected. I am currently testing a set of changes to Falcon that allow a deadlock detected during a call to fetchForUpdate to be returned to the calling engine. This also includes changes to falcon_bug_28026.test so that a deadlock code is expected immediately. In addition, I have added a new testcase, falcon_bug_29332, which tests this deadlock code using not only a partitioned table, but also an indexed table. This test expects the transaction to remain active after receiving the deadlock error. When these test cases use a partition table, they require a change to ha_partition.cc. In ha_partition::rnd_next(), there are two 'int result = ...' declarations. the second one receives the result of the call into the storage engine and if there is an error, it is lost when that 'result' goes out of scope. The required fix is to change; int result= file->rnd_next(buf); to; result= file->rnd_next(buf); Ingo and/or Mikael, please verify this change and I will push it to falcon-5.1.
[5 Sep 2007 22:01]
Kevin Lewis
Pushed to Falcon repository. Still need code review for the change to ha_partition.
[6 Sep 2007 4:07]
Kevin Lewis
The following are email conversations about deadlocks without rollbacks; Ann Harrison wrote; I have always considered that transaction control is the responsibility of the client. A deadlock, like a validation error, causes the current operation to fail and its changes to be undone but does not cause the whole transaction to rollback. James Day wrote; Agree with Ann here. For background on why InnoDB acts as it does, see the private comments in: http://bugs.mysql.com/bug.php?id=12308 Mark Matthews wrote; Please make sure that you use a SQLState and error code that represents _deadlock_ with no rollback, versus one w/ rollback, because some people "care". For example, the JDBC driver keys off of SQLStates that start with "40" to tell applications that their _transaction_ was rolled back by the server. (InnoDB currently returns an error code of ERR_LOCK_DEADLOCK, which resolves to "40001" as a SQLState). If we're now going to have engines that _don't_ do rollback on deadlock automatically, we _must_ use a different error code and sql state so that applications and drivers can tell what actually happened without using magic. Mark Matthews added; I just want to make sure that we don't re-use the SQLState of 40001 (which is currently mapped to ER_LOCK_DEADLOCK in sql_state.h) since that has a meaning, according to the SQL Spec. As far as "codes", I leave that to the server folks. I suppose you can't return ER_LOCK_DEADLOCK, because that is hard-mapped to the SQLState of 40001. We're kind of in a bind, because the "class" of "40" is "transaction rollback", so technically we can't re-use that for Falcon's behavior, _but_ deadlock is in there, because the subclass "001" is for "serialization failure". I know that Oracle uses "61000" for deadlock (because it doesn't roll back the transaction either, if I remember correctly), and "6...." is in the vendor/implementation-specific range, so maybe we punt and do the same (or make up our own vendor-specific code?) Maybe we punt and ask PeterG to interpret the SQL Spec for us in this case (serialization failure with no transaction rollback)? Peter Gulutzan responded; The SQL standard doesn't have an error for deadlock without rollback. You're right, class '40' is inappropriate. Just glancing at Oracle documentation, I guess people would expect the same for Oracle's '61000'. DB2 Version 9.1 for z/OS has sqlstate = '57033' "Deadlock or timeout occurred without automatic rollback". http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.m... This error can occur when an implicit rollback fails, that is, it is not an exact match of MySQL's situation, but class '57' "Resource Not Available or Operator Intervention" looks okay as a precedent.
[6 Sep 2007 16:26]
Ingo Strüwing
I agree regarding 'int result = ...'. I do even propose to get rid of the initialization a top of the function. 'result' is not used until it is unconditionallly set in the loop. And there is no chance not to enter the loop.
[6 Sep 2007 16:29]
Ingo Strüwing
Where is the patch for this bug? Don't know what to review. Is the state "Patch pending" correct?
[19 Sep 2007 13:03]
Hakan Küçükyılmaz
falcon_bug_28026.test passes now: ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- main.falcon_bug_28026 [ pass ] 1245 ------------------------------------------------------- Stopping All Servers All 1 tests were successful.