MySQL Bugs: #46151: falcon_bug_34174 fails with lock wait timeout when pool-of-threads is enabled

Bug #46151	falcon_bug_34174 fails with lock wait timeout when pool-of-threads is enabled
Submitted:	13 Jul 2009 15:12	Modified:	26 May 2010 17:53
Reporter:	John Embretsen	Email Updates:
Status:	Unsupported	Impact on me:	None
Category:	MySQL Server: Falcon storage engine	Severity:	S3 (Non-critical)
Version:	6.0.12-bzr	OS:	Solaris
Assigned to:	Olav Sandstå	CPU Architecture:	Any
Tags:	F_ERROR, pushbuild

Description:
The test falcon_bug_34174 (currently in the falcon_team suite) fails with wrong error number when the pool-of-threads feature is enabled. This has so far (as far as I know) only been seen on Solaris platforms.

-----------------

falcon_team.falcon_bug_34174             [ fail ]
        Test ended at 2009-07-13 13:03:27

CURRENT_TEST: falcon_team.falcon_bug_34174
mysqltest: At line 46: query 'UPDATE t1 SET f1 = 1, f2 = 4' failed with wrong errno 1205: 'Lock wait timeout exceeded; try restarting transaction', instead of 1213...

The result from queries just before the failure was:
< snip >
START TRANSACTION;
INSERT INTO t1 VALUES (0,0);
SELECT * FROM t1;
f1      f2
0       0
# switching to conn1
START TRANSACTION;
SELECT * FROM t1;
f1      f2
INSERT INTO t1 VALUES (1,1);
INSERT INTO t1 VALUES (0,3);
# switching to default connection
SELECT * FROM t1;
f1      f2
0       0

-----------------

See also Bug#41521 (falcon_bug_34174 fails with lock wait timeout, sometimes) for previous issues and work related to this test case. 

The number of test failure occurrences related to Bug#41521 were reduced by introducing longer "real" sleep times after sending statements to the server from the test client. However, increasing these sleep times does not seem to help in this case.

It seems that the following options to MTR are required in order to reproduce this failure at this time:

 --mysqld=--thread-handling=pool-of-threads
     (Tells the server to use a pool of threads to manage incoming client requests, instead of a single thread per connection. This was a new feature in MySQL 6.0 alpha.)

 --ps-protocol
     (Tells the mysqltest client to use the prepared-statement protocol for communication.)

How to repeat:
Compile MySQL w/Falcon with libevent enabled (--with-libevent).

perl mysql-test-run.pl --ps-protocol --mysqld=--thread-handling=pool-of-threads --suite=falcon_team falcon_bug_34174 --repeat=20

Assigning to Olav since it seems to only occur on Solaris.  If you can catch this in a debugger, please check if the UPDATE thread that should return as a deadlock is somehow using a different thread, caused by   --thread-handling=pool-of-threads, such that Falcon cannot recognize that there is a closed loop of threads waiting on each other.  Also, check if the previous call by conn1, --send INSERT INTO t1 VALUES (0,3), is still waiting as it should be for the transaction by the default thread to commit.

In the thread stack dump, only thread 35 is waiting.  It has just hit the wait lock timeout.  If the other thread had been waiting like it should have there would have been a deadlock detected.  So we need to debug why the conn1 thread does not wait.

When I try to run the test in a debugger with breakpoints in SQLError.cpp it rarely fails with LOCK_TIMEOUT. 

But, putting in an ASSERT or print statement in SQLError.cpp constructors triggering on DEADLOCK error code shows that a deadlock actually occurs before the LOCK_WAIT timeout occurs, when the test fails. I don't know why the deadlock is not propagated so that it is visible to clients or in the error log.

Also, once in a while the test actually passes with the same options on the same test box, meaning we get the deadlock that we expect to get and not a lock wait timeout, even with thread pooling enabled. Full stack trace for that case is available here:

https://intranet.mysql.com/secure/paste/displaypaste.php?codeid=8397