MySQL Bugs: #37994: InnoDb waits for lock when DELETE from a table previously in a interrupted trans

Bug #37994	InnoDb waits for lock when DELETE from a table previously in a interrupted trans
Submitted:	9 Jul 2008 14:46	Modified:	24 Jan 2009 8:51
Reporter:	Fredrik Wallroth	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.0.51a-community	OS:	Any (seen on windows server 2003 and linux)
Assigned to:		CPU Architecture:	Any

Description:
An insert that runs in a transaction is interrupted by network problems just before commit.

Afterwards the following command:
DELETE FROM test;

(where test is table beeing inserted into) times out with the following error message:
"Lock wait timeout exceeded; try restarting transaction"

* It does not matter if the delete command is run from another machine not previously connected to the database server.
* It does not help to have a false where clause on the delete query either.
* Selects against the table works fine.
* The problems seem to linger until the database is restarted, but it also seems to go away by itself if waiting for several hours.

Workarounds:
* Don't use InnoDb.
* Don't use transactions.
* Restart database server every time the problem occurs.
* Drop the and recreate the table every time the problem occurs.

How to repeat:
Follow the steps to reproduce bug 37991:
http://bugs.mysql.com/37991

After the last step, open up a db client, connect to the database and enter the query:

delete from test;

Expected results:
* The query executes immediately and deletes 0 rows (since the table shall be empty).
* The query is not waiting for any locks from an insert running on a connection that has been broken.

Actual results:
* The query hangs for about 50 seconds (the timeout param amount of time) and the fails with:
"Lock wait timeout exceeded; try restarting transaction"

Suggested fix:
Rollback transactions from broken connections.

If this is "as designed" and the broken connection (and it's transaction) is suppesed to be kept alive very long, this is instead an error of the Connector/NET where it is not possible get hold of and reset the "broken connection" which seem to remain in the pool after network problems has occured during commit. See related bug 37991: http://bugs.mysql.com/37991.

This looks more like a .net bug.

Can someone please test again, with code that contains the fix for http://bugs.mysql.com/bug.php?id=37991?

That fix addresses issues with the driver code, and not InnoDB.  It seems this related problem may also not be InnoDB-specific.

I'v now tested this on the connector v 5.2.3.0 and get exactly the same problem.

(It would be strange if a fix in the connecter that fixes something that occurs after the network connection to the db server has been broken could affect the server in any way).

Fredrik,

this looks like normal behavior. I think what happens is this:

1. client connects to MySQL, adds rows to the table. The client gets locks on the new records.

2. before the client commits the transaction (and thus releases the locks) the connection between MySQL server and client breaks down.

3. new client connects, tries to delete all records from table.

4. client gives up after waiting unsuccessfully for the necessary locks.

The MySQL server has two measures to cope with that kind of problem:

wait_timeout: if a client is inactive for longer than wait_timeout seconds, the connection is closed. Locks will be released, uncommitted transactions rolled back. This timeout defaults to 8 hours which is probably longer than necessary for most users. Timeout can be set globally and also per session.

TCP keepalive: the MySQL server enables TCP keepalive on the socket. This will make the TCP stack send probe packets after a certain inactivity period and spot the broken detection after not receiving answers. Timeout parameters are taken from the operating system and are typically way too relaxed. Initial timeout is in the order of 2 hours. Can be tuned in the Windows registry or in Linux sysfs.

You can check the above theory by running SHOW INNODB STATUS while the client is waiting for the locks. It should show the delete waiting for locks and another transaction holding the locks. SHOW PROCESSLIST will show the respective client connections. If the dead connection is KILLed, the delete will succeed.

Thank you Axel, we will investigate your proposals which probably is what we need to handle our problem!

BR

Fredrik Wallroth

Please, inform about any results of your investigation.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".