Bug #41564 Falcon; incorrect lock wait timeouts can occur
Submitted: 17 Dec 2008 18:35 Modified: 19 Dec 2008 6:03
Reporter: Kevin Lewis Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Falcon storage engine Severity:S3 (Non-critical)
Version:6.0.9 OS:Any
Assigned to: Kevin Lewis
Tags: F_SYNCHRONIZATION
Triage: Needs Triage: D3 (Medium)

[17 Dec 2008 18:35] Kevin Lewis
Description:
While working on Bug#41194 (Falcon durability should occur earlier in the commit process), I found this bug in SyncObject::wait().  My first patch for that bug created a race condition in which the client thread unlocks an exclusive lock on syncIsActive at about the same time as the gopher thread tries to get a shared lock on it.  I noticed that periodically, the lock request would timeout without being signaled.  This is the same thing as a timeout on a waitForTransaction(), or a wait lock timeout.  So that first patch created a condition in which this incorrect timeout can occur more frequently.

The timeout is incorrect because after the wait, it is obvious that syncIsActive had been unlocked.  Somehow, the waiting thread was not signaled.
 

How to repeat:
Found in alpha testing.  It would be exhibited in extra lock wait timeout errors during a stress test.

Suggested fix:
The problem occurs because of an inadvertent wakeup.  In other words, a waiting thread is woke up inadvertently, not by the event being signaled (this is on Windows).  I'm not sure if premature wakeups can occur on Linux which uses a PTHREAD implementation for waiting. But it can apparently happen on Windows events. 

When this occurs, SyncObject::wait(), in the section  if (timeout){}, will take the thread off the queue and then wait again since thread->lockGranted is not yet set.  This time, it waits the full timeout. The event it is waiting on is not signaled when the lock is granted because this waiting thread is no longer on the queue. 

The code to take this thread off the waiting queue should be done only if it already waited the timeout and an exception is about to be thrown by SyncObject::timedout().
[17 Dec 2008 19:37] Hakan Küçükyılmaz
lock wait timeout is seen in Bug#41521, too.
[19 Dec 2008 5:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62041

2939 Kevin Lewis	2008-12-18
      Bug#41521 and Bug#41564.  
      Sometimes a call to  Thread::sleep(timeout, callersMutex)
      from SyncObject::wait() can wakeup before the lock has been 
      granted.  If it does this, the code was taking the thread 
      off the SyncObject queue and then calling the sleep again. 
      This time, the lock cannot be granted because the lock owner
      does not signal the waiting thread.  It is not queued.
      A lock wait timeout will then occur.
      The thread should only be taken off the queue if a lock wait 
      timeout really does occur.
[19 Dec 2008 6:03] Kevin Lewis
Since this bug has no testcase, and I think it would cause the same affect as what is seem in Bug#41521,  I will call it a duplicate of that one, which has a testcase.
[19 Dec 2008 18:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62151

2948 Kevin Lewis	2008-12-19
      Bug#41521 and Bug#41564.  
      Sometimes a call to  Thread::sleep(timeout, callersMutex)
      from SyncObject::wait() can wakeup before the lock has been 
      granted.  If it does this, the code was taking the thread 
      off the SyncObject queue and then calling the sleep again. 
      This time, the lock cannot be granted because the lock owner
      does not signal the waiting thread.  It is not queued.
      A lock wait timeout will then occur.
      The thread should only be taken off the queue if a lock wait 
      timeout really does occur.
[13 Feb 2009 7:23] Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:klewis@mysql.com-20081219183049-0h8nib65189lh4lr) (merge vers: 6.0.9-alpha) (pib:6)