MySQL Bugs: #34174: Infinite loop checking rolled back record in select for update

Bug #34174	Infinite loop checking rolled back record in select for update
Submitted:	30 Jan 2008 19:43	Modified:	15 May 2009 17:06
Reporter:	Ann Harrison
Status:	Closed
Category:	MySQL Server: Falcon storage engine	Severity:	S3 (Non-critical)
Version:	6.0-falcon-team	OS:	Any
Assigned to:	Kevin Lewis	CPU Architecture:	Any
Tags:	F_ISOLATION

Description:
Under some circumstances, a rolled back record appears not
to be removed.  The code in Table::fetchForUpdate that checks
records goes into a loop around line 3387.

The script below runs T1's first actions alone, then
in another connection T2's actions.  T2 stalls on the
insert of (0,3) waiting for T1 to complete.  While it
is stalled, run T1's next set of actions.  When T1
rolls back, the T2 thread goes into the infinite loop.
To avoid hanging the machine, set a breakpoint on the break
statement that follows these cases in the switch on state:

			case WasActive:
			case RolledBack:
				break;

How to repeat:
T1:

set @@autocommit=0;
create database db62;
use db62;
drop table if exists x1;
create table x1 (x1 int primary key, x2 int) engine=falcon;
set transaction isolation level serializable;
start transaction;
insert into x1 values (0,0);

T2:

set @@autocommit=1;
use db62;
insert into x1 values (1,1);
insert into x1 values (0,3);
update x1 set x1 = 0, x2 = 5;
insert into x1 values (0,6);

T1:

update x1 set x1 = 1, x2 = 4;
rollback;

You can (and should) remove the
  set transaction isolation level serializable
statement from the script.  It is an artifact of
an older problem.

Thank you for the bug report.

Jim submitted the following patch.  I reviewed and tested it.

ChangeSet@1.2790, 2008-01-30 14:51:13-05:00, jas@rowvwade. +1 -0
  Clear RecordVersion::superceded bit when backing out
  a failed update.

  storage/falcon/Table.cpp@1.38, 2008-01-30 14:51:05-05:00, jas@rowvwade. +6 -0
    Clear RecordVersion::superceded bit when backing out
    a failed update.

diff -Nrup a/storage/falcon/Table.cpp b/storage/falcon/Table.cpp
--- a/storage/falcon/Table.cpp  2008-01-28 15:01:56 -06:00
+++ b/storage/falcon/Table.cpp  2008-01-30 13:51:05 -06:00
@@ -1189,6 +1189,9 @@ void Table::update(Transaction * transac

                if (record)
                        {
+                       if (record->priorVersion)
+                               record->priorVersion->setSuperceded(false);
+
                        if (record->state == recLock)
                                record->deleteData();

@@ -3034,6 +3037,9 @@ void Table::update(Transaction * transac

                if (record)
                        {
+                       if (record->priorVersion)
+                               record->priorVersion->setSuperceded(false);
+
                        if (record->state == recLock)
                                record->deleteData();
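
For readers without the Falcon tree at hand, here is a minimal sketch of what the patch appears to do, written as a toy model rather than the real code. The RecordVersion struct below is a stand-in with only two fields, and beginUpdate and backOutFailedUpdate are hypothetical helpers invented for illustration. It is also an assumption that an update marks the prior version superseded when it starts; the patch only shows the bit being cleared on the error path.

```cpp
#include <cassert>

// Toy stand-in for Falcon's RecordVersion (assumption: the real class has
// many more members; only what the patch touches is modeled here).
struct RecordVersion {
    RecordVersion* priorVersion = nullptr;
    bool superceded = false;  // spelling follows the identifier in the patch

    void setSuperceded(bool flag) { superceded = flag; }
};

// Hypothetical helper: chain a candidate version on top of `prior`.
// Assumption: the update path marks the prior version as superseded here.
inline void beginUpdate(RecordVersion& candidate, RecordVersion& prior) {
    candidate.priorVersion = &prior;
    prior.setSuperceded(true);
}

// Hypothetical helper mirroring the three lines added by the patch:
// when a failed update is backed out, the prior version becomes the
// current one again, so its superseded bit has to be cleared.
inline void backOutFailedUpdate(RecordVersion& candidate) {
    if (candidate.priorVersion)
        candidate.priorVersion->setSuperceded(false);
}
```

In this model, skipping the call to backOutFailedUpdate leaves the prior version flagged as superseded even though nothing supersedes it any longer, which is the kind of stale state the changeset comment describes.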

Test case for the fix is missing!

Patch is in mysql-6.0-release version 6.0.4

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43290

ChangeSet@1.2585, 2008-03-02 20:17:44-06:00, klewis@klewis-mysql. +3 -0
  Disable falcon_bug_34351_A & falcon_bug_34351_A for bug 34990
  Add testcase for Bug#34174

Pushed into 6.0.4-alpha

Noted in 6.0.4 changelog.

For Falcon, under some circumstances, a rolled back record could 
appear not to be removed.

Still fails from time to time with:

falcon_team.falcon_bug_34174   [ pass ]             17
falcon_team.falcon_bug_34174   [ pass ]             17
falcon_team.falcon_bug_34174   [ fail ]

mysqltest: At line 44: query 'UPDATE t1 SET f1 = 1, f2 = 4' failed with wrong errno 1205: 'Lock wait timeout exceeded; try restarting transaction', instead of 1213...

Putting this Short Description back to its original cause and setting to 'Documenting'.  The original infinite loop was fixed for this bug before it was reopened for the lock wait timeout.  Bug#41521 was later opened and fixed for that problem, so this bug should be closed.

According to pushbuild xref, the testcase for this bug was failing with a timeout quite often until a sleep was added to the test in mid January.  Since then, the test has failed only a few times.  I suggest increasing the sleep time.

A note has been added to the 6.0.11 changelog: 

With Falcon tables running concurrent transactions, some transactions may not be rolled back correctly, leading to an infinite loop.