MySQL Bugs: #52377: MDL locks taken by failing statements not automatically removed

Bug #52377	MDL locks taken by failing statements not automatically removed
Submitted:	25 Mar 2010 20:15	Modified:	24 Jun 2010 10:43
Reporter:	Matthias Leich	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Locking	Severity:	S4 (Feature request)
Version:	5.5.4-m3,5.6.99-m4	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	mdl

Description:
My test:
--------
-- source include/have_innodb.inc

# Simplified test derived from partition_innodb_semi_consistent.test

CREATE TABLE t1 (a INTEGER, b INTEGER) ENGINE = InnoDB;
INSERT INTO t1 VALUES (1,1);
COMMIT;

--echo # Switch to connection con1
connect (con1,localhost,root,,);
connection con1;

BEGIN;
UPDATE t1 SET b = 12 WHERE a = 1;

--echo # Switch to connection con2
connect (con2,localhost,root,,);
connection con2;
SET SESSION AUTOCOMMIT = 0;

--error ER_LOCK_WAIT_TIMEOUT
UPDATE t1 SET b = 21 WHERE a = 1;

# If I have here a ROLLBACK than the effect disappears.
# ROLLBACK;

--echo # Switch to connection con1
connection con1;
# ROLLBACK;

# SELECT * FROM t1 FOR UPDATE does not get blocked
# CHECK TABLE t1              does not get blocked
# TRUNCATE , REPAIR TABLE gets also blocked
send DROP TABLE t1;

connection default;
let $stmt_part=
FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE STATE = 'Waiting for table' AND INFO = 'DROP TABLE t1';
let $wait_timeout= 5;
let $wait_condition= SELECT COUNT(*) = 1 $stmt_part;

--echo # Wait till DROP TABLE is blocked but not more than $wait_timeout seconds
--source include/wait_condition.inc

# Show if the DROP TABLE is blocked
eval SELECT STATE,INFO $stmt_part;

Result on 5.5.4-m3-debug-log
mysql-trunk , revno: 3122 , 2010-03-24
--------------------------------------
CREATE TABLE t1 (a INTEGER, b INTEGER) ENGINE = InnoDB;
INSERT INTO t1 VALUES (1,1);
COMMIT;
# Switch to connection con1
BEGIN;
UPDATE t1 SET b = 12 WHERE a = 1;
# Switch to connection con2
SET SESSION AUTOCOMMIT = 0;
UPDATE t1 SET b = 21 WHERE a = 1;
ERROR HY000: Lock wait timeout exceeded; try restarting transaction
# Switch to connection con1
DROP TABLE t1;
# Wait till DROP TABLE is blocked but not more than 5 seconds
SELECT STATE,INFO FROM INFORMATION_SCHEMA.PROCESSLIST
WHERE STATE = 'Waiting for table' AND INFO = 'DROP TABLE t1';
STATE	INFO
Waiting for table	DROP TABLE t1

   This means that the UPDATE which failed because of
   lock timeout did not remove all locks caused by this
   statement.

5.1.45 (mysql-5.1-bugteam does not show this effect.

1. I am not 100% sure if this effect is a bug because
   the manual does not describe the situation I met.
   a) refman/5.5/en/innodb-parameters.html
      about innodb_lock_wait_timeout
      ...

How to repeat:
See above

Suggested fix:
I am not 100% sure if this effect is a bug because
the manual does not describe the situation I met.
- refman/5.5/en/innodb-parameters.html
  about innodb_lock_wait_timeout
  ...
  When a lock wait timeout occurs, the current statement is
  not executed.
  How can a statement fail if an execution is not tried?
  IMHO it is meant that any changes to data by the statement
  are reverted, something like a statement rollback.
  But what happens with locks set during the statement
  execution before the waiting for the lock started?
- refman/5.5/en/innodb-deadlock-detection.html
  ...
  However, if just a single SQL statement is rolled back
  as a result of an error, some of the locks set by the
  statement may be preserved.
  ...
  Does this apply here, though we have no deadlock?
- MySQL >= 5.5 contains metadata locking
  Could it be that MDL has to be improved?

Thank you for the report.

According to http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html this is not a bug:

A transaction that tries to access a row that is locked by another InnoDB transaction will hang for at most this many seconds before issuing the following error:

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

When a lock wait timeout occurs, the current statement is not executed. The current transaction is not rolled back. (To have the entire transaction roll back, start the server with the --innodb_rollback_on_timeout option, available as of MySQL 5.1.15. See also Section 13.6.12, “InnoDB Error Handling”.)

Hi Sveta,
sorry, I have some problems to understand your
answer but maybe my report was not clear enough.

The effect I am focused on is the hanging DROP TABLE t1.

We have here MySQL >= 5.5 which has metadata locking (MDL).
A MDL lock held by some session prevents that another
session could drop the object (table,...), change it's
layout or similar.
MDL should prevent a lot unfortunate effects known
from MySQL <= 5,1.

con1:
   BEGIN a transaction
   do something
con2:
   BEGIN a transaction
   do something on table t1 which
   - collides with a lock held by con1
     => Get error message after innodb_lock_wait_timeout
   or
   - violates a constraint
     => Get error message because of duplicate key or similar
con1:
   ROLLBACK;
   DROP TABLE t1;
    ==> hangs "endless" till lock_wait_timeout is exceeded
        ERROR HY000: Lock wait timeout exceeded; try restarting transaction

   In case of MySQL 5.1 the "DROP TABLE t1" would have success.

Experiments + discussion with Jon Olav point to
the following reason:
===============================================
The action of con2 on the table t1 causes that in an
early phase of the statement processing a Metadata lock
(MDL) on t1 is set.

And this MDL lock does not get automatically removed
when it finally turns out that the statement cannot
have success.
I have the impression that row-level locks taken by a
failing statement are at least mostly removed at it's end.

The DROP TABLE t1 of con1 fails because of the MDL
lock held by con2.

IMHO from the
- SQL standard point of view the behaviour I met is
  not wrong
- system throughput/performance point of view it would
  have an advantage if MDL locks caused by a failing
  statement get automatically removed at statement end
  but I am unsure if all consequences fit to what the
  SQL standard says about ISOLATION LEVELs etc.
- user POV my scenario is probably rather rare

So what to do with this bug report? I do not know.
The behaviour I met
- is intentional    => no bug
- is drastic different than in MySQL 5.1 but not documented
  within the 5.5 manual => documentation bug
- could be probably improved but the overall
  win is rather low => change request ?

Regards

Matthias

Matthias,

thank you for the feedback.

Your arguments sounds reasonable and I did mistake during verification: missed con2 doesn't really use table t1, just tries to use it, Setting bug to "Verified", so runtime team will decide how to properly handle this problem.

The bug needs to be split into two.
We indeed need to document the effect of transactional locks on failing statements. This effect is intentional, since in with replication logging enabled, failed statements are still written to the binary log, and metadata locks protect consistency of the binary log. We need to keep these locks for both, statement and row based replication, since DDL is always replicated as SQL.

The second part of the bug is a feature/extention/performance request, to automatically remove metadata locks on failed statements if the binary log is off. In theory, we could remove metadata locks whenever we rollback a statement that only modifies transactional tables, and all effects of the statements
are being rolled back as well.

Matthias, could you please morph this report into two reports as described above and request their triage? You could perhaps reuse the bug id of this report for one of the issues.

Waiting for feedback from the reporter.

The missing documentation part of this bug report is
now moved to
Bug#54767 Please document the effect of metadata locks
          taken by failing statements

After moving the missing documentation part of this bugs out
the remaining stuff is a feature/extention/performance request.

For documentation part (Bug#54767)

Added to http://dev.mysql.com/doc/refman/5.5/en/metadata-locking.html:

        If the server acquires metadata locks for a statement that is
        syntactically valid but fails during execution, it does not
        release the locks early. Lock release is still deferred to the
        end of the transaction because the failed statement is written
        to the binary log and the locks protect log consistency.

Setting to to be fixed later -- has no significant performance impact.