Bug #39897 lock_multi fails in pushbuild: timeout waiting for processlist
Submitted: 7 Oct 2008 8:13 Modified: 17 Feb 17:04
Reporter: Sven Sandberg
Status: Closed
Category:Tests: Server Severity:S2 (Serious)
Version:5.1, 6.0 OS:Any
Assigned to: Davi Arnaut Target Version:5.1
Tags: sporadic, test failure, pushbuild
Triage: Triaged: D3 (Medium)

[7 Oct 2008 8:13] Sven Sandberg
Description:
main.lock_multi fails in two different ways on pushbuild. These failures are similar to,
but not the same as, BUG#34311, and they occurred after BUG#34311 was closed.

-------- FAILURE 1 --------
main.lock_multi                [ fail ]

--- /data0/pushbuild/pb2/pb/mysql-6.0-opt/205/mysql-6.0.6-alpha-pb205/mysql-test/r/lock_multi.result	2008-05-04 22:28:47.000000000 +0300
+++ /data0/pushbuild/pb2/pb/mysql-6.0-opt/205/mysql-6.0.6-alpha-pb205/mysql-test/r/lock_multi.reject	2008-05-04 23:05:48.246348823 +0300
@@ -102,6 +102,8 @@
 lock table t1 read;
 update t1 set i= 10;;
 select * from t1;;
+Timeout in wait_condition.inc for select count(*) = 1 from information_schema.processlist
+where state = "Table lock" and info = "select * from t1"
 kill query ID;
 i
 ERROR 70100: Query execution was interrupted

mysqltest: Result content mismatch

Stopping All Servers
Restoring snapshot of databases
-------- END FAILURE 1 --------

-------- FAILURE 2 --------
main.lock_multi                [ fail ]

--- /data0/pushbuild/pb1-3/pb/mysql-6.0-runtime/261/mysql-6.0.6-alpha-pb261/mysql-test/r/lock_multi.result	2008-04-21 00:39:45.000000000 +0300
+++ /data0/pushbuild/pb1-3/pb/mysql-6.0-runtime/261/mysql-6.0.6-alpha-pb261/mysql-test/r/lock_multi.reject	2008-04-21 01:06:40.000000000 +0300
@@ -162,5 +162,9 @@
 connection: flush
 flush tables with read lock;;
 connection: default
+Timeout in wait_condition.inc for select count(*) = 1 from information_schema.processlist
+where state = "Flushing tables"
 flush tables;
+Timeout in wait_condition.inc for select count(*) = 1 from information_schema.processlist
+where state = "Flushing tables"
 drop table t1;

mysqltest: Result length mismatch

Stopping All Servers
Restoring snapshot of databases
-------- END FAILURE 2 --------
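Both failures surface at the test's polling step: judging from the "Timeout in wait_condition.inc" lines above, the test repeatedly checks a query against information_schema.processlist and gives up once a time limit expires. The C function below is a hypothetical analogue of that bounded-polling pattern, not the mysqltest implementation; the names `wait_condition` and `condition_fn`, and the `max_tries`/`interval_ms` parameters, are assumptions, since this report does not show the include file's actual retry count or interval.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the awaited condition
   (for example "one thread shows state X in the processlist") holds. */
typedef bool (*condition_fn)(void *ctx);

/* Hypothetical analogue of a bounded wait: evaluate cond up to max_tries
   times, sleeping interval_ms between attempts. Returns true as soon as
   the condition holds; otherwise prints a timeout line and returns false. */
bool wait_condition(condition_fn cond, void *ctx, const char *desc,
                    int max_tries, long interval_ms)
{
    struct timespec delay = { interval_ms / 1000,
                              (interval_ms % 1000) * 1000000L };

    for (int i = 0; i < max_tries; i++) {
        if (cond(ctx))
            return true;
        nanosleep(&delay, NULL);
    }
    printf("Timeout in wait_condition.inc for %s\n", desc);
    return false;
}
```

Because the wait is bounded, a thread state that the test never observes in the expected form shows up as a "Timeout in wait_condition.inc" line in the .reject file, as in both failures above, rather than as a hung test.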

How to repeat:
xref: http://tinyurl.com/4cne8s

Example of failure 1:
https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=bzr_mysql-5.1-rpl&order=59
vm-win2003-64-b/n_mix

Example of failure 2:
https://intranet.mysql.com/secure/pushbuild/showpush.pl?dir=mysql-6.0-build&order=172
sles10-ia64-a-1/n_stm
[16 Dec 2008 11:15] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/61740

2809 Davi Arnaut	2008-12-16
      Bug#39897: lock_multi fails in pushbuild: timeout waiting for processlist
      
      The problem is that relying on the "Table lock" thread state to
      detect that a thread is waiting on a lock is race prone. The "Table
      lock" state change happens before the thread actually tries to grab
      a lock on a table.
      
      The solution is to introduce a new "Waiting for lock" state that is
      set only when a thread is actually going to wait for a lock. The state
      change happens after the thread fails to grab the lock (because it
      is owned by another thread) and proceeds to wait on a condition.
[15 Jan 15:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/63362

2967 Davi Arnaut	2009-01-15
      Bug#39897: lock_multi fails in pushbuild: timeout waiting for processlist
      
      The problem is that relying on the "Table lock" thread state in
      its current position to detect that a thread is waiting on a lock
      is race prone. The "Table lock" state change happens before the
      thread actually tries to grab a lock on a table.
      
      The solution is to move the "Table lock" state so that it is set
      only when a thread is actually going to wait for a lock. The state
      change happens after the thread fails to grab the lock (because it
      is owned by another thread) and proceeds to wait on a condition.
[10 Feb 12:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/65726

3039 Davi Arnaut	2009-01-15
      Bug#39897: lock_multi fails in pushbuild: timeout waiting for processlist
      
      The problem is that relying on the "Table lock" thread state in
      its current position to detect that a thread is waiting on a lock
      is race prone. The "Table lock" state change happens before the
      thread actually tries to grab a lock on a table.
      
      The solution is to move the "Table lock" state so that it is set
      only when a thread is actually going to wait for a lock. The state
      change happens after the thread fails to grab the lock (because it
      is owned by another thread) and proceeds to wait on a condition.
      modified:
        mysys/thr_lock.c
        sql/lock.cc
[10 Feb 16:00] Davi Arnaut
Queued to 6.0-bugteam
[14 Feb 14:00] Bugs System
Pushed into 6.0.10-alpha (revid:matthias.leich@sun.com-20090212211028-y72faag15q3z3szy)
(version source revid:matthias.leich@sun.com-20090212211028-y72faag15q3z3szy) (merge vers:
6.0.10-alpha) (pib:6)
[14 Feb 18:06] Paul DuBois
Am I correct in thinking that this adds a new SHOW PROCESSLIST state that should be listed
(with a definition) in the manual?
[14 Feb 19:49] Davi Arnaut
The "Table lock" state already existed. The difference is that this state is now only set
when the calling thread must wait for the lock to become available (and remains set for
the duration of the wait).
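The behavior described in the comment above can be illustrated with a minimal C sketch using POSIX threads. It is not the actual code from mysys/thr_lock.c or sql/lock.cc: the names `struct table_lock`, `struct thread_info`, `table_lock_acquire`, `table_lock_release`, and `is_waiting_for_lock` are hypothetical, and `proc_info` merely stands in for the thread's SHOW PROCESSLIST state. The waiter publishes "Table lock" only after finding the lock owned by another thread, keeps it set for the whole wait, and clears it once the lock is granted.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table lock: an owner flag protected by a mutex, plus a
   condition variable that waiting threads sleep on. */
struct table_lock {
    pthread_mutex_t mutex;
    pthread_cond_t  released;
    bool            owned;
};

/* Hypothetical thread descriptor; proc_info mimics the processlist
   State column and is only read or written under lock->mutex here. */
struct thread_info {
    const char *proc_info;
};

void table_lock_acquire(struct table_lock *lock, struct thread_info *thd)
{
    pthread_mutex_lock(&lock->mutex);
    while (lock->owned) {
        /* Reached only when another thread owns the lock: publish the
           wait state right before blocking, and keep it while waiting. */
        thd->proc_info = "Table lock";
        pthread_cond_wait(&lock->released, &lock->mutex);
    }
    /* Lock granted, with or without waiting. This sketch simply clears
       the state rather than restoring any earlier one. */
    thd->proc_info = NULL;
    lock->owned = true;
    pthread_mutex_unlock(&lock->mutex);
}

void table_lock_release(struct table_lock *lock)
{
    pthread_mutex_lock(&lock->mutex);
    lock->owned = false;
    pthread_cond_broadcast(&lock->released);
    pthread_mutex_unlock(&lock->mutex);
}

/* Observer side: reads the state under the same mutex the waiter holds
   from setting "Table lock" until pthread_cond_wait releases it. */
bool is_waiting_for_lock(struct table_lock *lock, struct thread_info *thd)
{
    pthread_mutex_lock(&lock->mutex);
    bool waiting = thd->proc_info != NULL &&
                   strcmp(thd->proc_info, "Table lock") == 0;
    pthread_mutex_unlock(&lock->mutex);
    return waiting;
}
```

Because the state is written while holding the mutex that the waiter hands to `pthread_cond_wait`, an observer that reads it under that mutex sees "Table lock" only while the waiter is blocked on the lock, never in the window before the acquisition attempt.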
[16 Feb 2:27] Paul DuBois
Noted in 6.0.10 changelog.

Threads were set to the "Table lock" state in such a way that use of
this state by other threads to check for a lock wait was subject to a
race condition.