Bug #41521 | falcon_bug_34174 fails with lock wait timeout, sometimes | ||
---|---|---|---|
Submitted: | 16 Dec 2008 21:05 | Modified: | 7 May 2009 16:56 |
Reporter: | Hakan Küçükyılmaz | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | mysql-6.0-falcon | OS: | Any |
Assigned to: | Hakan Küçükyılmaz | CPU Architecture: | Any |
Tags: | F_TEST, sporadic, test failure |
[16 Dec 2008 21:05]
Hakan Küçükyılmaz
[16 Dec 2008 21:12]
Hakan Küçükyılmaz
Could be related to Bug#34174.
[16 Dec 2008 21:31]
Sveta Smirnova
Thank you for the report. Verified as described on 64-bit Linux machine. I had to run test more than 100 times to repeat.
[17 Dec 2008 21:19]
Kevin Lewis
Possibly related to #41564: Falcon; incorrect lock wait timeouts can occur
[19 Dec 2008 6:08]
Kevin Lewis
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/62041 2939 Kevin Lewis 2008-12-18 Bug#41521 and Bug#41564. Sometimes a call to Thread::sleep(timeout, callersMutex) from SyncObject::wait() can wakeup before the lock has been granted. If it does this, the code was taking the thread off the SyncObject queue and then calling the sleep again. This time, the lock cannot be granted because the lock owner does not signal the waiting thread. It is not queued. A lock wait timeout will then occur. The thread should only be taken off the queue if a lock wait timeout really does occur.
[12 Jan 2009 14:13]
Kevin Lewis
Hakan wrote; Unfortunately, your fix does not help. I see quite a speed up in the test execution, but the timeout happens from time to time just like it did before your push. 29 out of 155 runs failed: ... ... falcon_team.falcon_bug_34174 [ pass ] 6 falcon_team.falcon_bug_34174 [ pass ] 4 falcon_team.falcon_bug_34174 [ fail ] mysqltest: At line 43: query 'UPDATE t1 SET f1 = 1, f2 = 4' failed with wrong errno 1205: 'Lock wait timeout exceeded; try restarting transaction', instead of 1213... ------------------------------------------------------ I ( Kevin) could not reproduce this until I tried the test 500 times. It finally failed on the 254th time. So I am putting it back into 'In Progress' to take another look.
[12 Jan 2009 15:11]
Kevin Lewis
Svoj, I found and fixed a logic problem when a thread waits for another. It was for Bug #41564 Falcon; incorrect lock wait timeouts can occur. I thought it fixed the problem in this bug, and I think it did, for the most part. But there is still some small timing window in which a waiting thread does not get signalled. The previous problem was caused by a wakeup before the lock was granted, in which the waiting thread took itself off the wait queue. So the real wakeup was not able to signal that waiting thread. With that fix, this error happens a lot less often. This bug will give you an opportunity to investigate the workings of the SyncObject. Please contact me if you have any questions. To start, put a break under this condition; if (!wokeup) { // A timeout occured. // Take this thread off the queue and throw an exception in the if (timeout) section of SyncObject::wait(). See if the thread is on the wait queue or not.
[13 Jan 2009 3:17]
Kevin Lewis
I just noticed that this testcase does not contain a --real_sleep 1 command after the first --send. We have found in the past that if you are going to use --send to set up a wait state between one transaction/thread and another, you must use a --real_sleep 1 following it so that the engine has time to actually process that send before the next command comes in. This test is missing that sleep. Since the test on Hakan's machine typically runs in 4 to 6 miliseconds, it is very likely that the second thread overtakes the first. Once a --real_sleep 1 is added, the typical test time is over a second. And it never fails. . . . falcon_team.falcon_bug_34174 [ pass ] 1140 falcon_team.falcon_bug_34174 [ pass ] 1188 falcon_team.falcon_bug_34174 [ pass ] 1047 ------------------------------------------------------ Stopping All Servers All 400 tests were successful. --echo # switching to conn1 connection conn1; START TRANSACTION; SELECT * FROM t1; INSERT INTO t1 VALUES (1,1); --send INSERT INTO t1 VALUES (0,3) --echo # switching to default connection connection default; --real_sleep=10 SELECT * FROM t1; --error ER_LOCK_DEADLOCK UPDATE t1 SET f1 = 1, f2 = 4;
[13 Jan 2009 3:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/63052 2971 Kevin Lewis 2009-01-12 Bug#41521 - Add a sleep after the --send INSERT to allow this command to be processed by Falcon before the next command comes in wit a different thread.
[13 Feb 2009 7:24]
Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:olav@sun.com-20090113103017-41jbad7qlvlwpwxh) (merge vers: 6.0.10-alpha) (pib:6)
[19 Mar 2009 12:43]
Kevin Lewis
According to pushbuild xref, the testcaase for this bug was failing with a timeout quite often until a sleep was added to the test in mid January. Since then, the test has failed only a few times. I suggest increasing the sleep time. Then after some period of success, maybe we can move it to suite=falcon.
[7 May 2009 16:56]
MC Brown
Internal/test fix. No changelog entry required.
[19 Jun 2009 4:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/76634 2738 Kevin Lewis 2009-06-18 Bug#41521 - Change the timing a bit more so that the first insert ALWAYS gets done before the second.