Bug #71017 mysqldump creates useless metadata locks
Submitted: 26 Nov 2013 18:29 Modified: 14 Jan 2014 18:33
Reporter: Eric Bergen (OCA)
Status: Closed
Category: MySQL Server: DDL  Severity: S3 (Non-critical)
Version: 5.5  OS: Any
Assigned to:  CPU Architecture: Any
Tags: mysqldump

[26 Nov 2013 18:29] Eric Bergen
Description:
When running mysqldump --single-transaction, metadata locks are acquired as mysqldump reaches each table. These locks are not freed until mysqldump has finished dumping all of the tables. Any other DDL operation attempted on a table which mysqldump has finished dumping will block all queries from accessing that table until the original MDL is freed.

This is incredibly disruptive for a dump mode that is not supposed to create long-lived locks. The locks in this case are also especially useless because mysqldump will never need to revisit a table it has finished dumping.

How to repeat:
I will emulate mysqldump with client connection 1 here:
con1> begin;
Query OK, 0 rows affected (0.00 sec)

con1> select * from t;
+----+---------+
| t  | foo     |
+----+---------+
|  1 | bar     |
|  2 | bar     |
|  3 | bar     |
|  4 | bar     |
|  5 | bar     |
|  6 | bar     |
|  7 | NULL    |
|  8 | NULL    |
|  9 | NULL    |
| 10 | NULL    |
| 11 | NULL    |
| 12 | NULL    |
| 13 | NULL    |
| 14 | NULL    |
| 15 | bar     |
| 16 | asdfasd |
| 17 | asdfasd |
| 18 | asdfasd |
| 19 | asdfasd |
| 20 | asdfasd |
| 21 | asdfasd |
| 22 | 3hasde  |
+----+---------+
22 rows in set (0.00 sec)
* leave transaction open *

con2> optimize table t;
*hang*

con3> select * from t;
*hang*

In the old world the select on connection 3 would succeed. In the new world of MDL it hangs until mysqldump is finished.
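
A quick way to confirm that both hangs are metadata-lock waits (a sketch; the "Waiting for table metadata lock" thread state exists in MySQL 5.5 and later):

con4> show processlist;
-- con2 (OPTIMIZE TABLE t) and con3 (SELECT * FROM t) both show
-- State: Waiting for table metadata lock,
-- while con1 sits idle holding the shared MDL on t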

Suggested fix:
Provide a way to start a transaction that does not create MDLs. Possibly another option to START TRANSACTION.
[26 Nov 2013 18:33] Domas Mituzas
REGRESSION! REGRESSION!
[26 Nov 2013 19:03] Davi Arnaut
Seems like the problem is that MySQL does not yet have a crystal ball feature. This way it could unlock tables that it knows won't be used again.
[26 Nov 2013 19:05] Harrison Fisk
Why do we need to hold the MDL for read-only statements?
[26 Nov 2013 19:38] Davi Arnaut
> Why do we need to hold the MDL for read-only statements?

To prevent table structure changes while the statement is running, but this has always been the case.

For read-only transactions the locks could be released at the end of each statement if gutting the transaction is not an issue.
[26 Nov 2013 19:47] Davi Arnaut
For what it's worth, metadata locks are released on rollback to savepoint. So, I think, if you grab a savepoint before accessing each table (and roll back afterwards), you would have something that resembles --single-transaction but without transaction-wide metadata locks.
[26 Nov 2013 20:08] Domas Mituzas
Davi, that is why I wrote in http://dom.as/2013/11/26/how-mysql-engineering-broke-the-backups/ about the production implications. Yeah, sure, crystal ball, blah blah, but in this case major functionality gets broken, and when such a change gets introduced, one needs to think about how major functionality will be affected by it.

Will large scans be accessing tables again? Probably not. Do we have to add all those /*!55../ clauses around? Maybe. Should we allow expressing "do not lock for the transaction duration"? Absolutely, as it is a regression to end up with the behavior we have now.
[26 Nov 2013 20:16] Davi Arnaut
It seems to me that the thing is that you guys are 4 years late to the party.
[26 Nov 2013 20:19] Eric Bergen
I've known about this bug for years; I just never bothered to report it because my history of getting Oracle to fix bugs is not great. I filed a more serious bug that results in empty tables in backups, and that wasn't even fixed:
http://bugs.mysql.com/bug.php?id=28432
[26 Nov 2013 20:20] Marko Mäkelä
As far as I understand, you are hitting a priority inversion bug with meta-data locks (MDL).

As noted, the meta-data locks were introduced in MySQL 5.5. InnoDB relies on them for non-locking reads (MVCC). Anything else is also protected by InnoDB internal table locks.

Bug#67286 describes another scenario, involving online ALTER TABLE, which can acquire an exclusive MDL at two phases: at the start (prepare) and at the very end (commit).

Basically, you have a long-running transaction that is holding a shared meta-data lock on the table. This shared lock is preventing any DDL or table definition changes (such as OPTIMIZE TABLE or ALTER TABLE).

The problem is: while the DDL thread is trying to acquire an exclusive MDL on the table, it also prevents any further threads from acquiring an MDL on the table. So any other DML on the table will be blocked. Basically, we have two connections "conspiring" to deny service to other connections.

This problem could be worked around or fixed in two ways:

(1) As suggested, release the MDL earlier. This could be a special statement issued by mysqldump. (We do not want normal multi-statement DML operations to be hit by this. They should acquire each MDL only once per transaction.)

(2) Change the behaviour of the MDL upgrade, possibly by a configuration option. If we allow further non-conflicting MDL acquisition during the exclusive MDL request, the DDL thread could starve, which could be frustrating. Perhaps we should not allow such starvation at the end of ALTER TABLE.
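
On servers new enough to have performance_schema MDL instrumentation (5.7.3 and later; not the 5.5 server in this report), the "conspiring" connections can be observed directly. A sketch:

UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME = 'wait/lock/metadata/sql/mdl';  -- off by default in 5.7

SELECT object_schema, object_name, lock_type, lock_status
  FROM performance_schema.metadata_locks
 WHERE object_name = 't';
-- among the expected rows: con1 SHARED_READ GRANTED,
-- con2 EXCLUSIVE PENDING, con3 SHARED_READ PENDING
-- (queued behind the pending exclusive request)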
[26 Nov 2013 20:28] Marko Mäkelä
Eric, I am sorry for the false claim that was made in 2010 on Bug#28432.
Do you think that the bug is still not fixed in MySQL 5.6?

It should also work with CREATE TABLE. The old transaction should refuse to
read a table that was created after the transaction started.
[26 Nov 2013 20:28] Harrison Fisk
It seems like read-only transactions shouldn't need to hold the MDL for the entire duration.  So enhancing START TRANSACTION READ ONLY to not hold MDL and then having mysqldump or mydumper use that command seems like a good solution.
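
For reference, the statement itself already exists (READ ONLY was added as a transaction characteristic in 5.6.5); what is proposed here is new MDL semantics for it. A sketch of the intended usage, assuming such a change:

START TRANSACTION WITH CONSISTENT SNAPSHOT, READ ONLY;
-- today this still holds each table's shared MDL until commit;
-- the proposal is for such read-only transactions to release the MDL
-- at the end of each statement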
[26 Nov 2013 20:35] Davi Arnaut
Like Marko said, the issue is that shared locks are not granted if an exclusive lock is pending. Another option is to have low-priority exclusive locks that do not prevent shared locks from being granted.
[26 Nov 2013 20:42] Davi Arnaut
Harrison, a downside of doing that for read-only transactions is that the transaction would not be properly isolated even in repeatable-read.
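
A sketch of that isolation hazard, assuming the MDL were released at the end of each statement:

con1> START TRANSACTION WITH CONSISTENT SNAPSHOT;
con1> SELECT * FROM t;                 -- ok; suppose t's MDL is now released
con2> ALTER TABLE t ADD COLUMN c INT;  -- no longer blocked
con1> SELECT * FROM t;
-- ERROR 1412 (HY000): Table definition has changed, please retry transaction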
[27 Nov 2013 8:56] Dmitry Lenev
Hello Eric, All!

First of all, I would like to point out that running "mysqldump
--single-transaction" and any DDL concurrently is a bad idea
and doesn't work in the general case. This is clearly documented at:

http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html#option_mysqldump_single-transaction

===
While a --single-transaction dump is in process, to ensure a valid dump
file (correct table contents and binary log coordinates), no other
connection should use the following statements: ALTER TABLE, CREATE TABLE,
DROP TABLE, RENAME TABLE, TRUNCATE TABLE. A consistent read is not isolated
from those statements, so use of them on a table to be dumped can cause the
SELECT that is performed by mysqldump to retrieve the table contents to
obtain incorrect contents or fail.
===

Unfortunately, this is true not only for 5.1, but for 5.5 and 5.6 as well.
The main difference is that in 5.6 mysqldump is more likely to fail with an
error instead of dumping incorrect contents.

Still, this is a valid documentation bug which needs to be fixed.

Taking the above into account, if "mysqldump --single-transaction" manages
to work with concurrent DDL in 5.1/5.5/5.6, it is either mostly due to pure
luck, or due to incredible skill in choosing the time for the concurrent DDL
to run, and/or significant effort spent on validating the backup after it
was dumped (but these are not officially supported or recommended scenarios
anyway).

Now, one needs to realize that OPTIMIZE TABLE, which you give as an example,
is for an InnoDB table just an alias for ALTER TABLE. This is also documented at:

http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html

==
For InnoDB tables, OPTIMIZE TABLE is mapped to ALTER TABLE, which rebuilds
the table to update index statistics and free unused space in the clustered
index.
==
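
This mapping is also visible in the client output; for an InnoDB table it looks something like:

mysql> OPTIMIZE TABLE t;
-- Msg_type: note,   Msg_text: Table does not support optimize,
--                             doing recreate + analyze instead
-- Msg_type: status, Msg_text: OK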

From the above it is clear that we are discussing a change in behavior in a
scenario which is explicitly discouraged and documented as broken in 5.1, and
which should be documented as discouraged and broken all the way up to 5.6.

In my opinion, it sounds like a stretch to call this change a regression.

Now let us take a look at possible ways to alleviate this issue:

1) As Davi Arnaut has correctly pointed out, releasing the MDL after a table
   has been used is not an option in the case of a generic transaction, as it
   can easily break serializability/repeatable-read rules.

2) Implementing an option for releasing the MDL after a table has been used
   in the case of transactions issued by "mysqldump" is certainly possible.
   But such a step won't solve the problem with concurrent
   "mysqldump --single-transaction" and DDL at all. Even with this
   option used, one can still get errors/broken backups in this scenario.
   So this scenario has to be kept discouraged/documented as not
   working/not supported in the general case.

3) The ideal solution to the problem would be changing the server code to
   take a snapshot of the data dictionary at the start of the transaction
   issued by mysqldump and to use this snapshot for table access during the
   whole duration of the transaction. This solution guarantees backup
   correctness and consistency on one hand and allows concurrent DDL on
   the other.
   Unfortunately, we are pretty far from implementing this solution, as
   it requires such major changes as MVCC support in the data dictionary
   and table caches/DDL code which support multiple versions of tables
   being active.

4) The more realistic solution, achievable in the middle term, to the
   problem of "mysqldump --single-transaction" + concurrent DDL is to
   acquire shared metadata locks on all tables to be dumped at the same
   time as we create the consistency point for the transaction (i.e. around
   the "START TRANSACTION WITH CONSISTENT SNAPSHOT" statement) and to keep
   the lock on each table until we have dumped it. This should provide
   correct backups, but will keep concurrent DDL blocked, for each table,
   for half of the mysqldump run time on average.
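
A rough client-side approximation of the lock-acquisition half of 4) (the per-table early release would still need server support; db.t1 and db.t2 are illustrative names):

START TRANSACTION WITH CONSISTENT SNAPSHOT;
-- touch every table once, so its shared MDL is taken at the snapshot
-- point rather than whenever the dump reaches it:
SELECT 1 FROM db.t1 LIMIT 0;
SELECT 1 FROM db.t2 LIMIT 0;
-- ... dump the tables as usual within the same transaction ...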

I will file a separate bug report for the documentation issue I have
mentioned earlier.

Implementing items 3) or 4) can be treated as valid feature requests
against mysqldump and MySQL server, in my opinion. So I am verifying
this report as a feature request.
[27 Nov 2013 12:39] Dmitry Lenev
http://bugs.mysql.com/bug.php?id=71025 is the documentation issue that I have created.
[27 Nov 2013 15:34] Mark Callaghan
Dmitry,

It is unrealistic to suggest that we avoid DDL when there are long-running SELECT statements. And note that long-running SELECTs come from logical backup and ETL. What this becomes is a request for many MySQL deployments to either:
1) Don't do ETL or logical backup
or
2) Don't do DDL

Can we get a solution rather than a reference to documentation that justifies the less-than-stellar behavior?
[27 Nov 2013 16:59] Eric Bergen
I used OPTIMIZE TABLE in the example because it made it easier to create the bug report. This applies to all kinds of DDL statements, even ones that don't rebuild the table. The actual DDL that spawned the investigation for this bug report was DROP TRIGGER.

It also applies to long-running queries, but it is especially bad for mysqldump because mysqldump will never revisit a table, and because it is a common workload that people understand. The way the locking works now, you basically have to choose between running DDL and running a long SELECT query.

As for the solutions:
1) I'm fine with this being an option we can set when establishing the snapshot. That way it can be set for mysqldump and applications that use similar behavior.

2) Sure, this will make backups error out when a table that mysqldump hasn't dumped yet is modified. It is perfectly fine for a table that mysqldump has already dumped, because it will never revisit that table. Having to restart a backup is preferable to having a server fall over under a query pile-up where every connection is waiting for a table.

3) This one sounds nice. A future where attempting to drop a trigger doesn't DoS the server would be welcome.

4) I'm not sure this prevents the worst-case query pile-up scenario: won't an incoming DDL statement still attempt to acquire an exclusive MDL, causing other queries to pile up behind it?
[27 Nov 2013 18:11] Dmitry Lenev
Hi Mark!

I am not trying to say that there is no issue and that the current behaviour is perfect.

What I have tried to say is that:

- The issue with "mysqldump --single-transaction" + concurrent DDL existed
  before MDL, maybe in a slightly different form.
- I don't see any quick fix for this issue which is acceptable for generic use.

BTW, long-running SELECTs and concurrent DDL are a slightly different problem.

It is really either one or the other at the moment.

This problem exists even in 5.1, and there won't be any nice solution to it until we implement a data dictionary and table caches with multi-versioning support and/or support for fully non-blocking ALTER TABLE.

I guess some people would prefer DDL to abort conflicting long-running SELECTs (or transactions in the general case) instead of waiting for them. That is a valid feature request as well.

Hi Eric!

OK, I see your point about preferring aborted mysqldump attempts to a query pile-up.

Unfortunately, it is not that simple with 2).

Let us assume that we simply changed the current mysqldump code to release locks right after each table is dumped in --single-transaction mode.

We also accept that from time to time our dump attempts will fail with ER_TABLE_DEF_CHANGED/ER_NO_SUCH_TABLE errors due to concurrent DDL.

Still, this leaves at least two issues which might cause even a successfully finished dump to be inconsistent:

I) The SELECT from I_S which is used to construct the list of tables to be
   dumped doesn't have the same view of the data dictionary as the consistent
   snapshot transaction. So there is a chance that, due to concurrent DROP
   TABLE statements, this list won't include some tables which should be
   visible to the transaction and thus should be present in the dump.
   This issue can be solved by selecting from I_S before releasing the FLUSH
   TABLES WITH READ LOCK lock acquired by mysqldump in order to create a
   validity point with the binary log (see the sketch at the end of this
   comment). I am not sure that there is an easy way to solve this problem
   in the absence of FLUSH TABLES WITH READ LOCK.

   Could you please clarify how important the case when FLUSH TABLES WITH
   READ LOCK is not used with --single-transaction is to you?

II) At the moment a concurrent RENAME TABLE won't cause ER_TABLE_DEF_CHANGED
    in some cases but will lead to incorrect (inconsistent) data being
    returned. My colleagues from the InnoDB team should clarify whether it
    is easy to fix this issue.

Regarding 4): yes, it won't prevent a pile-up if there is concurrent DDL, but for some people this might be an acceptable trade-off against a failing mysqldump.
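
A sketch of the ordering described in I), assuming FLUSH TABLES WITH READ LOCK is in use (as with --master-data) and that the table list is gathered before the global lock is released:

FLUSH TABLES WITH READ LOCK;
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SHOW MASTER STATUS;  -- binlog validity point
SELECT table_name FROM information_schema.tables
 WHERE table_schema = 'db';  -- build the table list under the global lock
UNLOCK TABLES;
-- ... dump each table within the snapshot transaction ...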
[27 Nov 2013 18:21] Harrison Fisk
I did a test based on Davi's comment but it doesn't seem to work:

> For what it's worth, metadata locks are released on rollback to savepoint. So, I think, if you grab a  savepoint before accessing each table (and rolling back afterwards), you would have something that resembles --single-transaction but without transaction-wide metadata locks.

I did:

conn1:
BEGIN;
SAVEPOINT mysqldump;
SELECT * FROM tbl;
ROLLBACK TO SAVEPOINT mysqldump;

conn2:
OPTIMIZE TABLE tbl;
-- this hung
-- so conn1 still holds the MDL even after doing the rollback to the savepoint
[27 Nov 2013 19:32] Dmitry Lenev
Hi Harrison!

Try disabling binary logging in the connection in which you execute ROLLBACK TO SAVEPOINT (for the duration of this specific transaction). AFAIR this should allow ROLLBACK TO SAVEPOINT to release metadata locks.
[27 Nov 2013 19:48] Davi Arnaut
You also need something to hold the read view (internal snapshot) open, which can be through a table or "WITH CONSISTENT SNAPSHOT".

Example: https://gist.github.com/darnaut/7682045
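
A minimal sketch of such a session, combining the pieces from the comments above (SET sql_log_bin requires the SUPER privilege; "sp" is an arbitrary savepoint name):

SET SESSION sql_log_bin = 0;                 -- so ROLLBACK TO SAVEPOINT can release MDL
START TRANSACTION WITH CONSISTENT SNAPSHOT;  -- holds the read view open
SAVEPOINT sp;
SELECT * FROM tbl;                           -- dump one table
ROLLBACK TO SAVEPOINT sp;                    -- releases tbl's MDL
-- repeat SAVEPOINT / SELECT / ROLLBACK TO SAVEPOINT for each table
COMMIT;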
[27 Nov 2013 22:15] Justin Swanhart
Would LOW_PRIORITY_MDL be possible? This would be similar to LOW_PRIORITY_UPDATES, in that MDL locks would block writes but not reads. As soon as the MDL lock is released, the write locks proceed (and any read locks waiting on write locks).
[28 Nov 2013 3:51] Harrison Fisk
I wrote a diff that wraps each table dump in SAVEPOINT/ROLLBACKs, and sets sql_log_bin=0 on the connection.  From my early testing, this seems to work well enough and gets us back to the same MySQL 5.1 concurrency allowances.
[6 Dec 2013 9:22] Dmitry Lenev
Hello All!

After internal discussion we have decided that we need to change the
mysqldump code to release metadata locks after dumping tables when the
--single-transaction option is used.

As mentioned above, this is not sufficient for making "mysqldump
--single-transaction" completely safe in the presence of concurrent DDL
because of:

a) The fact that in the presence of concurrent DDL "mysqldump
   --single-transaction" still might abort due to ER_NO_SUCH_TABLE
   or ER_TABLE_DEF_CHANGED errors (but as far as I understand this
   is an acceptable trade-off for the concurrency increase for some people).

b) The two issues marked as I) and II) described above.

But at least it should bring us back to the 5.1 concurrency situation.

So I am changing this report from "feature request" back to "normal bug".
[10 Dec 2013 12:51] Marko Mäkelä
Generally, concurrency problems can be solved by locking or by multi-versioning.
If we had a multi-versioned data dictionary, each transaction would see the table definitions as they were when their read view was created. That would (among other things) mean that the likes of DROP INDEX, DROP TABLE, TRUNCATE TABLE or a table-rebuilding ALTER TABLE would be unable to free the space immediately. A purge-like operation would collect the garbage when all read views that could see the old version of the table definition have been closed.

I wonder if any database system has successfully followed this approach.
[10 Dec 2013 13:04] Peter Laursen
In MySQL, tables are basically *files*, I think. This was not a problem with MyISAM, as it was a deliberate design decision at the time and MyISAM did not support transactions - but even with InnoDB (with or without innodb_file_per_table) it still seems to be the case, as far as I can understand.

And isn't http://bugs.mysql.com/bug.php?id=71110 an 'incarnation' of the same problem, basically?

Peter
(not a MySQL/Oracle person)
[23 Dec 2013 11:47] Dmitry Lenev
Hello All!

Please note that I have reported the problem with concurrent RENAME TABLE mentioned above as bug #71214 "WITH CONSISTENT SNAPSHOT doesn't isolate from concurrent RENAME TABLE".

Also, I have reported two other cases in which concurrent DDL can cause problems for consistent snapshot transactions/mysqldump --single-transaction.
These are bug #71215 "WITH CONSISTENT SNAPSHOT doesn't isolate from some concurrent ALTER TABLEs" and bug #71216 "WITH CONSISTENT SNAPSHOT doesn't isolate from concurrent DDL on stored programs".
[14 Jan 2014 18:33] Paul DuBois
Noted in 5.6.16, 5.7.4 changelogs.

mysqldump --single-transaction acquired metadata locks for each
dumped table but did not release them until the dump operation
finished. Consequently, other DDL operations on a dumped table
blocked even after the table itself had been dumped. mysqldump now
attempts to release metadata locks earlier.
[3 Feb 2014 11:57] Laurynas Biveinis
5.6$ bzr log -r 5716
------------------------------------------------------------
revno: 5716
committer: Praveenkumar Hulakund <praveenkumar.hulakund@oracle.com>
branch nick: mysql_5_6
timestamp: Sat 2013-12-28 22:08:40 +0530
message:
  Bug#17862905: MYSQLDUMP CREATES USELESS METADATA LOCKS
  
  Problem Description:
  --------------------
  While taking a backup using the "mysqldump" tool with the
  "--single-transaction" option, an MDL lock is acquired on each table
  dumped. But these locks are not released until the backup operation
  is completed. Any concurrent DDL operation on those tables will be
  blocked until the backup operation is completed. Moreover, such blocked
  DDL operations will also block other concurrent DML operations
  (since DDL has priority over DML), creating a pile-up.
  
  Note that officially mysqldump --single-transaction is documented as
  not working/not supported in the presence of concurrent DDL statements.
  But it might work for some people in some scenarios, and before 5.5,
  due to the absence of MDL, the combination of "mysqldump
  --single-transaction" and concurrent DDL didn't create a pile-up of
  DML statements.
  
  Analysis:
  --------------------
  "mysqldump" start transaction with consistent snapshot and sets
  isolation level to "repeatable read" when it is used with option
  "--single-transaction".  Data of all the tables is dumped using
  SELECT statement as part of this transaction. MDL lock SR is
  taken on all these tables and held till the transaction is 
  committed or rolled back. Any other incompatible MDL lock request
  on these tables will wait until timeout or "mysqldump" operation
  is completed.
  
  As result concurrent DDL operations on the tables which were dumped
  will be blocked till the end of dump or timeout.
  
  Note that once a table is dumped it won't be used again by "mysqldump".
  This fact, and the fact that "mysqldump --single-transaction" produces
  a backup with its validity point at the start of the transaction (and
  also retrieves the binlog position for the backup at the start of the
  transaction), mean that "mysqldump --single-transaction" can release
  the MDL on tables which it has already dumped without introducing more
  consistency problems.
  
  Fix:
  --------------------
  To make "mysqldump --single-transaction" release locks once the
  dumping of a table is done, the mysqldump client code was modified
  to dump table data after setting a savepoint. Once the dumping of
  the table data is over, code was added to roll back to the savepoint
  that was set. Rolling back to the savepoint releases the MDL lock
  acquired for the table.
  
  But as of now, on rollback to a savepoint, MDL locks are
  not released if the binlog is on. This logic exists to avoid
  tables being dropped before the rollback-to-savepoint event is
  written to the binlog. However, even when the binlog handlerton
  can clear its cache and can safely roll back to the savepoint
  without writing a rollback-to-savepoint event, we were still not
  releasing the MDL locks.
  
  This is fixed by introducing a new handlerton function,
  savepoint_rollback_can_release_mdl. We call this function to
  check with each storage engine participating in the transaction
  whether it is safe to release MDL after rollback to a savepoint.
  Metadata locks are released only if all the storage engines
  agree that it is a safe thing to do.
  
  1) For the InnoDB storage engine, this handlerton function allows
     release of MDL locks if the transaction has not acquired any
     InnoDB locks.
  
  2) For the binlog, this handlerton function allows release of MDL
     locks if rollback to the savepoint will completely remove any
     traces of the transaction from the cache.
  
  3) Absence of this method for a storage engine means it is not
     safe to release MDL locks.
  
  Note that this patch doesn't make "mysqldump --single-transaction"
  safe in the general case in the presence of concurrent DDL, nor does
  it make it officially supported in this case. It just avoids the
  problem of unnecessary concurrent DDL blocking and the associated
  DML query pile-up in some specific cases where it might work.
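
In terms of the statements sent to the server, the patched mysqldump --single-transaction roughly does the following (a sketch; the savepoint name used by the actual client code may differ):

SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SAVEPOINT sp;
SELECT * FROM t1;           -- dump the first table's rows
ROLLBACK TO SAVEPOINT sp;   -- release t1's MDL if all engines agree
SELECT * FROM t2;           -- the savepoint survives the rollback,
ROLLBACK TO SAVEPOINT sp;   -- so it can be reused for each table
COMMIT;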