MySQL Bugs: #41521: falcon_bug_34174 fails with lock wait timeout, sometimes

Bug #41521	falcon_bug_34174 fails with lock wait timeout, sometimes
Submitted:	16 Dec 2008 21:05	Modified:	7 May 2009 16:56
Reporter:	Hakan Küçükyılmaz	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S3 (Non-critical)
Version:	mysql-6.0-falcon	OS:	Any
Assigned to:	Hakan Küçükyılmaz	CPU Architecture:	Any
Tags:	F_TEST, sporadic, test failure

Description:
falcon_bug_34174 fails from time to time with:

falcon_team.falcon_bug_34174   [ pass ]             17
falcon_team.falcon_bug_34174   [ pass ]             17
falcon_team.falcon_bug_34174   [ fail ]

mysqltest: At line 44: query 'UPDATE t1 SET f1 = 1, f2 = 4' failed with wrong errno 1205:
'Lock wait timeout exceeded; try restarting transaction', instead of 1213...

How to repeat:
 ./mysql-test-run.pl  --force --skip-ndb --suite=falcon_team falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174 falcon_bug_34174

=======================================================

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

falcon_team.falcon_bug_34174   [ pass ]             19
falcon_team.falcon_bug_34174   [ pass ]             17
falcon_team.falcon_bug_34174   [ pass ]             16
falcon_team.falcon_bug_34174   [ pass ]             17
falcon_team.falcon_bug_34174   [ fail ]

mysqltest: At line 43: query 'UPDATE t1 SET f1 = 1, f2 = 4' failed with wrong errno 1205: 'Lock wait timeout exceeded; try restarting transaction', instead of 1213...

The result from queries just before the failure was:
*** Bug #34174 ***
SET @@storage_engine = 'Falcon';
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (f1 int primary key, f2 int);
# establishing conn1
# switching to default connection
START TRANSACTION;
INSERT INTO t1 VALUES (0,0);
# switching to conn1
START TRANSACTION;
SELECT * FROM t1;
f1      f2
INSERT INTO t1 VALUES (1,1);
INSERT INTO t1 VALUES (0,3);
# switching to default connection
SELECT * FROM t1;
f1      f2
0       0
UPDATE t1 SET f1 = 1, f2 = 4;
ERROR HY000: Lock wait timeout exceeded; try restarting transaction

More results from queries before failure can be found in /home/hakan/work/mysql/mysql-6.0-falcon-team-to-merge/mysql-test/var/log/falcon_bug_34174.log

Stopping All Servers
Restoring snapshot of databases
Resuming Tests

falcon_team.falcon_bug_34174   [ fail ]

mysqltest: At line 43: query 'UPDATE t1 SET f1 = 1, f2 = 4' failed with wrong errno 1205: 'Lock wait timeout exceeded; try restarting transaction', instead of 1213...

The result from queries just before the failure was:
*** Bug #34174 ***
SET @@storage_engine = 'Falcon';
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (f1 int primary key, f2 int);
# establishing conn1
# switching to default connection
START TRANSACTION;
INSERT INTO t1 VALUES (0,0);
# switching to conn1
START TRANSACTION;
SELECT * FROM t1;
f1      f2
INSERT INTO t1 VALUES (1,1);
INSERT INTO t1 VALUES (0,3);
# switching to default connection
SELECT * FROM t1;
f1      f2
0       0
UPDATE t1 SET f1 = 1, f2 = 4;
ERROR HY000: Lock wait timeout exceeded; try restarting transaction

More results from queries before failure can be found in /home/hakan/work/mysql/mysql-6.0-falcon-team-to-merge/mysql-test/var/log/falcon_bug_34174.log

Stopping All Servers
Restoring snapshot of databases
Resuming Tests

falcon_team.falcon_bug_34174   [ pass ]             18
^Cmysql-test-run: WARNING: got INT signal, cleaning up.....
Stopping All Servers

Could be related to Bug#34174.

Thank you for the report.

Verified as described on 64-bit Linux machine. I had to run test more than 100 times to repeat.

Possibly related to #41564: Falcon; incorrect lock wait timeouts can occur

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62041

2939 Kevin Lewis	2008-12-18
      Bug#41521 and Bug#41564.  
      Sometimes a call to  Thread::sleep(timeout, callersMutex)
      from SyncObject::wait() can wakeup before the lock has been 
      granted.  If it does this, the code was taking the thread 
      off the SyncObject queue and then calling the sleep again. 
      This time, the lock cannot be granted because the lock owner
      does not signal the waiting thread.  It is not queued.
      A lock wait timeout will then occur.
      The thread should only be taken off the queue if a lock wait 
      timeout really does occur.

Hakan wrote;
Unfortunately, your fix does not help. I see quite a speed up in the test execution, but the timeout happens from time to time just like it did before your push.

29 out of 155 runs failed:

...
...
falcon_team.falcon_bug_34174   [ pass ]              6
falcon_team.falcon_bug_34174   [ pass ]              4
falcon_team.falcon_bug_34174   [ fail ]

mysqltest: At line 43: query 'UPDATE t1 SET f1 = 1, f2 = 4' failed with
wrong errno 1205: 'Lock wait timeout exceeded; try restarting
transaction', instead of 1213...
------------------------------------------------------

I ( Kevin) could not reproduce this until I tried the test 500 times.  It finally failed on the 254th time.  So I am putting it back into 'In Progress' to take another look.

Svoj,  I found and fixed a logic problem when a thread waits for another.  It was for Bug #41564 Falcon; incorrect lock wait timeouts can occur.  I thought it fixed the problem in this bug, and I think it did, for the most part.  But there is still some small timing window in which a waiting thread does not get signalled.  The previous problem was caused by a wakeup before the lock was granted, in which the waiting thread took itself off the wait queue.  So the real wakeup was not able to signal that waiting thread.  With that fix, this error happens a lot less often.  

This bug will give you an opportunity to investigate the workings of the SyncObject.  Please contact me if you have any questions.

To start, put a break under this condition;
	if (!wokeup)
		{
		// A timeout occured.
		// Take this thread off the queue and throw an exception
in the if (timeout) section of SyncObject::wait().
See if the thread is on the wait queue or not.

I just noticed that this testcase does not contain a --real_sleep 1 command after the first --send.  We have found in the past that if you are going to use --send to set up a wait state between one transaction/thread and another, you must use a --real_sleep 1 following it so that the engine has time to actually process that send before the next command comes in.  This test is missing that sleep.  Since the test on Hakan's machine typically runs in 4 to 6 miliseconds, it is very likely that the second thread overtakes the first.  Once a --real_sleep 1 is added, the typical test time is over a second.  And it never fails.
. . .
falcon_team.falcon_bug_34174   [ pass ]           1140
falcon_team.falcon_bug_34174   [ pass ]           1188
falcon_team.falcon_bug_34174   [ pass ]           1047
------------------------------------------------------
Stopping All Servers
All 400 tests were successful.

--echo # switching to conn1
connection conn1;
START TRANSACTION;
SELECT * FROM t1;
INSERT INTO t1 VALUES (1,1);
--send INSERT INTO t1 VALUES (0,3)
--echo # switching to default connection
connection default;
--real_sleep=10
SELECT * FROM t1;
--error ER_LOCK_DEADLOCK
UPDATE t1 SET f1 = 1, f2 = 4;

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63052

2971 Kevin Lewis	2009-01-12
      Bug#41521 - Add a sleep after the --send INSERT to allow 
      this command to be processed by Falcon before the next
      command comes in wit a different thread.

Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:olav@sun.com-20090113103017-41jbad7qlvlwpwxh) (merge vers: 6.0.10-alpha) (pib:6)

According to pushbuild xref, the testcaase for this bug was failing with a timeout quite often until a sleep was added to the test in mid January.  Since then, the test has failed only a few times.  I suggest increasing the sleep time.  Then after some period of success, maybe we can move it to suite=falcon.

Internal/test fix. No changelog entry required.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/76634

2738 Kevin Lewis	2009-06-18
      Bug#41521 - Change the timing a bit more so that the first insert ALWAYS gets done before the second.