Bug #71017 mysqldump creates useless metadata locks
Submitted: 26 Nov 2013 18:29 Modified: 14 Jan 2014 18:33
Reporter: Eric Bergen (OCA)
Status: Closed
Category: MySQL Server: DDL  Severity: S3 (Non-critical)
Version: 5.5  OS: Any
Assigned to:  CPU Architecture: Any
Tags: mysqldump

[26 Nov 2013 18:29] Eric Bergen
Description:
When running mysqldump --single-transaction, metadata locks are acquired as mysqldump reaches each table. These locks are not freed until mysqldump has finished dumping all of the tables. Any other DDL operation attempted on a table which mysqldump has finished dumping will block all queries from accessing that table until the original MDL is freed.

This is incredibly disruptive for a dump mode that is not supposed to create long-lived locks. The locks in this case are also especially useless because mysqldump will never need to revisit a table it has finished dumping.

How to repeat:
I will emulate mysqldump with client connection 1 here:
con1> begin;
Query OK, 0 rows affected (0.00 sec)

con1> select * from t;
+----+---------+
| t  | foo     |
+----+---------+
|  1 | bar     |
|  2 | bar     |
|  3 | bar     |
|  4 | bar     |
|  5 | bar     |
|  6 | bar     |
|  7 | NULL    |
|  8 | NULL    |
|  9 | NULL    |
| 10 | NULL    |
| 11 | NULL    |
| 12 | NULL    |
| 13 | NULL    |
| 14 | NULL    |
| 15 | bar     |
| 16 | asdfasd |
| 17 | asdfasd |
| 18 | asdfasd |
| 19 | asdfasd |
| 20 | asdfasd |
| 21 | asdfasd |
| 22 | 3hasde  |
+----+---------+
22 rows in set (0.00 sec)
* leave transaction open *

con2> optimize table t;
*hang*

con3> select * from t;
*hang*

In the old world the select on connection 3 would succeed. In the new world of MDL it hangs until mysqldump is finished.
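
A quick way to confirm that both hangs are metadata-lock waits (a sketch; the "Waiting for table metadata lock" thread state exists in MySQL 5.5 and later):

con4> show processlist;
-- con2 (OPTIMIZE TABLE t) and con3 (SELECT * FROM t) both show
-- State: Waiting for table metadata lock,
-- while con1 sits idle holding the shared MDL on t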

Suggested fix:
Provide a way to start a transaction that does not create MDLs. Possibly another option to START TRANSACTION.
[26 Nov 2013 18:33] Domas Mituzas
REGRESSION! REGRESSION!
[26 Nov 2013 19:03] Davi Arnaut
Seems like the problem is that MySQL does not yet have a crystal ball feature. This way it could unlock tables that it knows won't be used again.
[26 Nov 2013 19:05] Harrison Fisk
Why do we need to hold the MDL for read-only statements?
[26 Nov 2013 19:38] Davi Arnaut
> Why do we need to hold the MDL for read-only statements?

To prevent table structure changes while the statement is running, but this has always been the case.

For read-only transactions the locks could be released at the end of each statement if gutting the transaction is not an issue.
[26 Nov 2013 19:47] Davi Arnaut
For what it's worth, metadata locks are released on rollback to savepoint. So, I think, if you grab a savepoint before accessing each table (and roll back afterwards), you would have something that resembles --single-transaction but without transaction-wide metadata locks.
[26 Nov 2013 20:08] Domas Mituzas
Davi, that is why I wrote in http://dom.as/2013/11/26/how-mysql-engineering-broke-the-backups/ about the production implications. Yeah, sure, crystal ball, blah blah, but in this case major functionality gets broken, and when such a change gets introduced, one needs to think about how major functionality will be affected by it.

Will large scans be accessing tables again? Probably not. Do we have to add all those /*!55../ clauses around? Maybe. Should we allow expressing "do not lock for the transaction duration"? Absolutely, as it is a regression to end up with the behavior we have now.
[26 Nov 2013 20:16] Davi Arnaut
It seems to me that the thing is that you guys are 4 years late to the party.
[26 Nov 2013 20:19] Eric Bergen
I've known about this bug for years; I just never bothered to report it because my history of getting Oracle to fix bugs is not great. I filed a more serious bug that results in empty tables in backups, and that wasn't even fixed:
http://bugs.mysql.com/bug.php?id=28432
[26 Nov 2013 20:20] Marko Mäkelä
As far as I understand, you are hitting a priority inversion bug with meta-data locks (MDL).

As noted, the meta-data locks were introduced in MySQL 5.5. InnoDB relies on them for non-locking reads (MVCC). Anything else is also protected by InnoDB internal table locks.

Bug#67286 describes another scenario, involving online ALTER TABLE, which can acquire an exclusive MDL at two phases: at the start (prepare) and at the very end (commit).

Basically, you have a long-running transaction that is holding a shared meta-data lock on the table. This shared lock is preventing any DDL or table definition changes (such as OPTIMIZE TABLE or ALTER TABLE).

The problem is: while the DDL thread is trying to acquire an exclusive MDL on the table, it also prevents any further threads from acquiring an MDL on the table. So any other DML on the table will be blocked. Basically, we have two connections "conspiring" to deny service to other connections.

This problem could be worked around or fixed in two ways:

(1) As suggested, release the MDL earlier. This could be a special statement issued by mysqldump. (We do not want normal multi-statement DML operations to be hit by this. They should acquire each MDL only once per transaction.)

(2) Change the behaviour of the MDL upgrade, possibly by a configuration option. If we allow further non-conflicting MDL acquisition during the exclusive MDL request, the DDL thread could starve, which could be frustrating. Perhaps we should not allow such starvation at the end of ALTER TABLE.
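
On servers new enough to have performance_schema MDL instrumentation (5.7.3 and later; not the 5.5 server in this report), the "conspiring" connections can be observed directly. A sketch:

UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME = 'wait/lock/metadata/sql/mdl';  -- off by default in 5.7

SELECT object_schema, object_name, lock_type, lock_status
  FROM performance_schema.metadata_locks
 WHERE object_name = 't';
-- among the expected rows: con1 SHARED_READ GRANTED,
-- con2 EXCLUSIVE PENDING, con3 SHARED_READ PENDING
-- (queued behind the pending exclusive request)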
[26 Nov 2013 20:28] Marko Mäkelä
Eric, I am sorry for the false claim that was made in 2010 on Bug#28432.
Do you think that the bug is still not fixed in MySQL 5.6?

It should also work with CREATE TABLE. The old transaction should refuse to
read a table that was created after the transaction started.
[26 Nov 2013 20:28] Harrison Fisk
It seems like read-only transactions shouldn't need to hold the MDL for the entire duration.  So enhancing START TRANSACTION READ ONLY to not hold MDL and then having mysqldump or mydumper use that command seems like a good solution.
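
For reference, the statement itself already exists (READ ONLY was added as a transaction characteristic in 5.6.5); what is proposed here is new MDL semantics for it. A sketch of the intended usage, assuming such a change:

START TRANSACTION WITH CONSISTENT SNAPSHOT, READ ONLY;
-- today this still holds each table's shared MDL until commit;
-- the proposal is for such read-only transactions to release the MDL
-- at the end of each statement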
[26 Nov 2013 20:35] Davi Arnaut
Like Marko said, the issue is that shared locks are not granted if an exclusive lock is pending. Another option is to have low-priority exclusive locks that do not prevent shared locks from being granted.
[26 Nov 2013 20:42] Davi Arnaut
Harrison, a downside of doing that for read-only transactions is that the transaction would not be properly isolated even in repeatable-read.
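
A sketch of that isolation hazard, assuming the MDL were released at the end of each statement:

con1> START TRANSACTION WITH CONSISTENT SNAPSHOT;
con1> SELECT * FROM t;                 -- ok; suppose t's MDL is now released
con2> ALTER TABLE t ADD COLUMN c INT;  -- no longer blocked
con1> SELECT * FROM t;
-- ERROR 1412 (HY000): Table definition has changed, please retry transaction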
[27 Nov 2013 8:56] Dmitry Lenev
Hello Eric, All!

First of all, I would like to point out that running "mysqldump
--single-transaction" and any DDL concurrently is a bad idea
and doesn't work in the general case. This is clearly documented at:

http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html#option_mysqldump_single-transaction

===
While a --single-transaction dump is in process, to ensure a valid dump
file (correct table contents and binary log coordinates), no other
connection should use the following statements: ALTER TABLE, CREATE TABLE,
DROP TABLE, RENAME TABLE, TRUNCATE TABLE. A consistent read is not isolated
from those statements, so use of them on a table to be dumped can cause the
SELECT that is performed by mysqldump to retrieve the table contents to
obtain incorrect contents or fail.
===

Unfortunately, this is true not only for 5.1, but for 5.5 and 5.6 as well.
The main difference is that in 5.6 mysqldump is more likely to fail with an
error instead of dumping incorrect contents.

Still, this is a valid documentation bug which needs to be fixed.

Taking the above into account, if "mysqldump --single-transaction" manages
to work with concurrent DDL in 5.1/5.5/5.6, it is either mostly due to pure
luck, or due to incredible skill in choosing the time for the concurrent DDL
to run, and/or significant effort spent on validating the backup after it
was dumped (but these are not officially supported or recommended scenarios
anyway).

Now, one needs to realize that OPTIMIZE TABLE, which you give as an example,
is for an InnoDB table just an alias for ALTER TABLE. This is also documented at:

http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html

==
For InnoDB tables, OPTIMIZE TABLE is mapped to ALTER TABLE, which rebuilds
the table to update index statistics and free unused space in the clustered
index.
==
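
This mapping is also visible in the client output; for an InnoDB table it looks something like:

mysql> OPTIMIZE TABLE t;
-- Msg_type: note,   Msg_text: Table does not support optimize,
--                             doing recreate + analyze instead
-- Msg_type: status, Msg_text: OK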

From the above it is clear that we are discussing a change in behavior in a
scenario which is explicitly discouraged and documented as broken in 5.1, and
which should be documented as discouraged and broken all the way up to 5.6.

In my opinion, it sounds like a stretch to call this change a regression.

Now let us take a look at possible ways to alleviate this issue:

1) As Davi Arnaut has correctly pointed out, releasing the MDL after a table
   has been used is not an option in the case of a generic transaction, as it
   can easily break serializability/repeatable-read rules.

2) Implementing an option for releasing the MDL after a table has been used
   in the case of transactions issued by "mysqldump" is certainly possible.
   But such a step won't solve the problem with concurrent
   "mysqldump --single-transaction" and DDL at all. Even with this
   option used, one can still get errors/broken backups in this scenario.
   So this scenario has to be kept discouraged/documented as not
   working/not supported in the general case.

3) The ideal solution to the problem would be changing the server code to
   take a snapshot of the data dictionary at the start of the transaction
   issued by mysqldump and to use this snapshot for table access during the
   whole duration of the transaction. This solution guarantees backup
   correctness and consistency on one hand and allows concurrent DDL on
   the other.
   Unfortunately, we are pretty far from implementing this solution, as
   it requires such major changes as MVCC support in the data dictionary
   and table caches/DDL code which support multiple versions of tables
   being active.

4) The more realistic solution, achievable in the middle term, to the
   problem of "mysqldump --single-transaction" + concurrent DDL is to
   acquire shared metadata locks on all tables to be dumped at the same
   time as we create the consistency point for the transaction (i.e. around
   the "START TRANSACTION WITH CONSISTENT SNAPSHOT" statement) and to keep
   the lock on each table until we have dumped it. This should provide
   correct backups, but will keep concurrent DDL blocked, for each table,
   for half of the mysqldump run time on average.
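
A rough client-side approximation of the lock-acquisition half of 4) (the per-table early release would still need server support; db.t1 and db.t2 are illustrative names):

START TRANSACTION WITH CONSISTENT SNAPSHOT;
-- touch every table once, so its shared MDL is taken at the snapshot
-- point rather than whenever the dump reaches it:
SELECT 1 FROM db.t1 LIMIT 0;
SELECT 1 FROM db.t2 LIMIT 0;
-- ... dump the tables as usual within the same transaction ...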

I will file a separate bug report for the documentation issue I have
mentioned earlier.

Implementing items 3) or 4) can be treated as valid feature requests
against mysqldump and MySQL server, in my opinion. So I am verifying
this report as a feature request.
[27 Nov 2013 12:39] Dmitry Lenev
http://bugs.mysql.com/bug.php?id=71025 is the documentation issue that I have created.
[27 Nov 2013 15:34] Mark Callaghan
Dmitry,

It is unrealistic to suggest that we avoid DDL when there are long-running SELECT statements. And note that long-running SELECTs come from logical backup and ETL. What this becomes is a request for many MySQL deployments to either:
1) Don't do ETL or logical backup
or
2) Don't do DDL

Can we get a solution rather than a reference to documentation that justifies the less-than-stellar behavior?
[27 Nov 2013 16:59] Eric Bergen
I used OPTIMIZE TABLE in the example because it made it easier to create the bug report. This applies to all kinds of DDL statements, even ones that don't rebuild the table. The actual DDL that spawned the investigation for this bug report was DROP TRIGGER.

It also applies to long-running queries, but it is especially bad for mysqldump because mysqldump will never revisit a table, and because it is a common workload that people understand. The way the locking works now, you basically have to choose between running DDL and running a long SELECT query.

As for the solutions:
1) I'm fine with this being an option we can set when establishing the snapshot. That way it can be set for mysqldump and applications that use similar behavior.

2) Sure, this will make backups error out when a table that mysqldump hasn't dumped yet is modified. It is perfectly fine for a table that mysqldump has already dumped, because it will never revisit that table. Having to restart a backup is preferable to having a server fall over under a query pile-up where every connection is waiting for a table.

3) This one sounds nice. A future where attempting to drop a trigger doesn't DoS the server would be welcome.

4) I'm not sure this prevents the worst-case query pile-up scenario: won't an incoming DDL statement still attempt to acquire an exclusive MDL, causing other queries to pile up behind it?
[27 Nov 2013 18:11] Dmitry Lenev
Hi Mark!

I am not trying to say that there is no issue and that the current behaviour is perfect.

What I have tried to say is that:

- The issue with "mysqldump --single-transaction" + concurrent DDL existed
  before MDL, maybe in a slightly different form.
- I don't see any quick fix for this issue which is acceptable for generic use.

BTW, long-running SELECTs and concurrent DDL are a slightly different problem.

It is really either one or the other at the moment.

This problem exists even in 5.1, and there won't be any nice solution to it until we implement a data dictionary and table caches with multi-versioning support and/or support for fully non-blocking ALTER TABLE.

I guess some people would prefer DDL to abort conflicting long-running SELECTs (or transactions in the general case) instead of waiting for them. That is a valid feature request as well.

Hi Eric!

OK, I see your point about preferring aborted mysqldump attempts to a query pile-up.

Unfortunately, it is not that simple with 2).

Let us assume that we simply changed the current mysqldump code to release locks right after each table is dumped in --single-transaction mode.

We also accept that from time to time our dump attempts will fail with ER_TABLE_DEF_CHANGED/ER_NO_SUCH_TABLE errors due to concurrent DDL.

Still, this leaves at least two issues which might cause even a successfully finished dump to be inconsistent:

I) The SELECT from I_S which is used to construct the list of tables to be
   dumped doesn't have the same view of the data dictionary as the consistent
   snapshot transaction. So there is a chance that, due to concurrent DROP
   TABLE statements, this list won't include some tables which should be
   visible to the transaction and thus should be present in the dump.
   This issue can be solved by selecting from I_S before releasing the FLUSH
   TABLES WITH READ LOCK lock acquired by mysqldump in order to create a
   validity point with the binary log (see the sketch at the end of this
   comment). I am not sure that there is an easy way to solve this problem
   in the absence of FLUSH TABLES WITH READ LOCK.

   Could you please clarify how important the case when FLUSH TABLES WITH
   READ LOCK is not used with --single-transaction is to you?

II) At the moment a concurrent RENAME TABLE won't cause ER_TABLE_DEF_CHANGED
    in some cases but will lead to incorrect (inconsistent) data being
    returned. My colleagues from the InnoDB team should clarify whether it
    is easy to fix this issue.

Regarding 4): yes, it won't prevent a pile-up if there is concurrent DDL, but for some people this might be an acceptable trade-off against a failing mysqldump.
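
A sketch of the ordering described in I), assuming FLUSH TABLES WITH READ LOCK is in use (as with --master-data) and that the table list is gathered before the global lock is released:

FLUSH TABLES WITH READ LOCK;
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SHOW MASTER STATUS;  -- binlog validity point
SELECT table_name FROM information_schema.tables
 WHERE table_schema = 'db';  -- build the table list under the global lock
UNLOCK TABLES;
-- ... dump each table within the snapshot transaction ...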
[27 Nov 2013 18:21] Harrison Fisk
I did a test based on Davi's comment but it doesn't seem to work:

> For what it's worth, metadata locks are released on rollback to savepoint. So, I think, if you grab a  savepoint before accessing each table (and rolling back afterwards), you would have something that resembles --single-transaction but without transaction-wide metadata locks.

I did:

conn1:
BEGIN;
SAVEPOINT mysqldump;
SELECT * FROM tbl;
ROLLBACK TO SAVEPOINT mysqldump;

conn2:
OPTIMIZE TABLE tbl;
-- this hung
-- so conn1 still holds the MDL even after doing the rollback to the savepoint
[27 Nov 2013 19:32] Dmitry Lenev
Hi Harrison!

Try disabling binary logging in the connection in which you execute ROLLBACK TO SAVEPOINT (for the duration of this specific transaction). AFAIR this should allow ROLLBACK TO SAVEPOINT to release metadata locks.
[27 Nov 2013 19:48] Davi Arnaut
You also need something to hold the read view (internal snapshot) open, which can be through a table or "WITH CONSISTENT SNAPSHOT".

Example: https://gist.github.com/darnaut/7682045
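
A minimal sketch of such a session, combining the pieces from the comments above (SET sql_log_bin requires the SUPER privilege; "sp" is an arbitrary savepoint name):

SET SESSION sql_log_bin = 0;                 -- so ROLLBACK TO SAVEPOINT can release MDL
START TRANSACTION WITH CONSISTENT SNAPSHOT;  -- holds the read view open
SAVEPOINT sp;
SELECT * FROM tbl;                           -- dump one table
ROLLBACK TO SAVEPOINT sp;                    -- releases tbl's MDL
-- repeat SAVEPOINT / SELECT / ROLLBACK TO SAVEPOINT for each table
COMMIT;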
[27 Nov 2013 22:15] Justin Swanhart
Would LOW_PRIORITY_MDL be possible? This would be similar to LOW_PRIORITY_UPDATES, in that MDL locks would block writes but not reads. As soon as the MDL lock is released, the write locks proceed (and any read locks waiting on write locks).
[28 Nov 2013 3:51] Harrison Fisk
I wrote a diff that wraps each table dump in SAVEPOINT/ROLLBACKs, and sets sql_log_bin=0 on the connection.  From my early testing, this seems to work well enough and gets us back to the same MySQL 5.1 concurrency allowances.
[6 Dec 2013 9:22] Dmitry Lenev
Hello All!

After internal discussion we have decided that we need to change the
mysqldump code to release metadata locks after dumping tables when the
--single-transaction option is used.

As mentioned above, this is not sufficient for making "mysqldump
--single-transaction" completely safe in the presence of concurrent DDL
because of:

a) The fact that in the presence of concurrent DDL "mysqldump
   --single-transaction" still might abort due to ER_NO_SUCH_TABLE
   or ER_TABLE_DEF_CHANGED errors (but as far as I understand this
   is an acceptable trade-off for the concurrency increase for some people).

b) The two issues marked as I) and II) described above.

But at least it should bring us back to the 5.1 concurrency situation.

So I am changing this report from "feature request" back to "normal bug".
[10 Dec 2013 12:51] Marko Mäkelä
Generally, concurrency problems can be solved by locking or by multi-versioning.
If we had a multi-versioned data dictionary, each transaction would see the table definitions as they were when their read view was created. That would (among other things) mean that the likes of DROP INDEX, DROP TABLE, TRUNCATE TABLE or a table-rebuilding ALTER TABLE would be unable to free the space immediately. A purge-like operation would collect the garbage when all read views that could see the old version of the table definition have been closed.

I wonder if any database system has successfully followed this approach.
[10 Dec 2013 13:04] Peter Laursen
In MySQL, tables are basically *files*, I think. This was not a problem with MyISAM, as it was a deliberate design decision at the time and MyISAM did not support transactions - but even with InnoDB (with or without innodb_file_per_table) it still seems to be the case, as far as I can understand.

And isn't http://bugs.mysql.com/bug.php?id=71110 an 'incarnation' of the same problem, basically?

Peter
(not a MySQL/Oracle person)
[23 Dec 2013 11:47] Dmitry Lenev
Hello All!

Please note that I have reported the problem with concurrent RENAME TABLE mentioned above as bug #71214 "WITH CONSISTENT SNAPSHOT doesn't isolate from concurrent RENAME TABLE".

Also, I have reported two other cases in which concurrent DDL can cause problems for consistent snapshot transactions/mysqldump --single-transaction.
These are bug #71215 "WITH CONSISTENT SNAPSHOT doesn't isolate from some concurrent ALTER TABLEs" and bug #71216 "WITH CONSISTENT SNAPSHOT doesn't isolate from concurrent DDL on stored programs".
[14 Jan 2014 18:33] Paul DuBois
Noted in 5.6.16, 5.7.4 changelogs.

mysqldump --single-transaction acquired metadata locks for each
dumped table but did not release them until the dump operation
finished. Consequently, other DDL operations on a dumped table
blocked even after the table itself had been dumped. mysqldump now
attempts to release metadata locks earlier.
[3 Feb 2014 11:57] Laurynas Biveinis
5.6$ bzr log -r 5716
------------------------------------------------------------
revno: 5716
committer: Praveenkumar Hulakund <praveenkumar.hulakund@oracle.com>
branch nick: mysql_5_6
timestamp: Sat 2013-12-28 22:08:40 +0530
message:
  Bug#17862905: MYSQLDUMP CREATES USELESS METADATA LOCKS
  
  Problem Description:
  --------------------
  While taking a backup using the "mysqldump" tool with the
  "--single-transaction" option, an MDL lock is acquired on each table
  dumped. But these locks are not released until the backup operation
  is completed. Any concurrent DDL operation on those tables will be
  blocked until the backup operation is completed. Moreover, such blocked
  DDL operations will also block other concurrent DML operations
  (since DDL has priority over DML), creating a pile-up.
  
  Note that officially mysqldump --single-transaction is documented as
  not working/not supported in the presence of concurrent DDL statements.
  But it might work for some people in some scenarios, and before 5.5,
  due to the absence of MDL, the combination of "mysqldump
  --single-transaction" and concurrent DDL didn't create a pile-up of
  DML statements.
  
  Analysis:
  --------------------
  "mysqldump" start transaction with consistent snapshot and sets
  isolation level to "repeatable read" when it is used with option
  "--single-transaction".  Data of all the tables is dumped using
  SELECT statement as part of this transaction. MDL lock SR is
  taken on all these tables and held till the transaction is 
  committed or rolled back. Any other incompatible MDL lock request
  on these tables will wait until timeout or "mysqldump" operation
  is completed.
  
  As result concurrent DDL operations on the tables which were dumped
  will be blocked till the end of dump or timeout.
  
  Note that once a table is dumped it won't be used again by "mysqldump".
  This fact, and the fact that "mysqldump --single-transaction" produces
  a backup with its validity point at the start of the transaction (and
  also retrieves the binlog position for the backup at the start of the
  transaction), mean that "mysqldump --single-transaction" can release
  the MDL on tables which it has already dumped without introducing more
  consistency problems.
  
  Fix:
  --------------------
  To make "mysqldump --single-transaction" release locks once the
  dumping of a table is done, the mysqldump client code was modified
  to dump table data after setting a savepoint. Once the dumping of
  the table data is over, code was added to roll back to the savepoint
  that was set. Rolling back to the savepoint releases the MDL lock
  acquired for the table.
  
  But as of now, on rollback to a savepoint, MDL locks are
  not released if the binlog is on. This logic exists to avoid
  tables being dropped before the rollback-to-savepoint event is
  written to the binlog. However, even when the binlog handlerton
  can clear its cache and can safely roll back to the savepoint
  without writing a rollback-to-savepoint event, we were still not
  releasing the MDL locks.
  
  This is fixed by introducing a new handlerton function,
  savepoint_rollback_can_release_mdl. We call this function to
  check with each storage engine participating in the transaction
  whether it is safe to release MDL after rollback to a savepoint.
  Metadata locks are released only if all the storage engines
  agree that it is a safe thing to do.
  
  1) For the InnoDB storage engine, this handlerton function allows
     release of MDL locks if the transaction has not acquired any
     InnoDB locks.
  
  2) For the binlog, this handlerton function allows release of MDL
     locks if rollback to the savepoint will completely remove any
     traces of the transaction from the cache.
  
  3) Absence of this method for a storage engine means it is not
     safe to release MDL locks.
  
  Note that this patch doesn't make "mysqldump --single-transaction"
  safe in the general case in the presence of concurrent DDL, nor does
  it make it officially supported in this case. It just avoids the
  problem of unnecessary concurrent DDL blocking and the associated
  DML query pile-up in some specific cases where it might work.
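
In terms of the statements sent to the server, the patched mysqldump --single-transaction roughly does the following (a sketch; the savepoint name used by the actual client code may differ):

SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SAVEPOINT sp;
SELECT * FROM t1;           -- dump the first table's rows
ROLLBACK TO SAVEPOINT sp;   -- release t1's MDL if all engines agree
SELECT * FROM t2;           -- the savepoint survives the rollback,
ROLLBACK TO SAVEPOINT sp;   -- so it can be reused for each table
COMMIT;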