| Bug #119451 | The schema's MDL lock is held for too long after opening the table | ||
|---|---|---|---|
| Submitted: | 24 Nov 9:50 | Modified: | 24 Nov 10:47 |
| Reporter: | Xizhe Zhang (OCA) | Email Updates: | |
| Status: | Open | Impact on me: | |
| Category: | MySQL Server: Locking | Severity: | S2 (Serious) |
| Version: | 8.0.44 | OS: | Any |
| Assigned to: | CPU Architecture: | Any | |
[24 Nov 9:54]
Xizhe Zhang
Hello, Verification Team, this is my patch. (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: fix_bug_119451.diff (application/octet-stream, text), 6.86 KiB.
[24 Nov 10:01]
Xizhe Zhang
I forgot to remove some irrelevant content from the diff file I just uploaded. Please use this submission. (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: fix_bug_119451.diff (application/octet-stream, text), 6.21 KiB.
[24 Nov 10:47]
Xizhe Zhang
Here I will explain the modifications I made to the test cases in my bugfix.
- `mysql-test/t/drop.test`: There is no need to wait for SCHEMA's MDL Lock. This is a revert to bug #24510948.
- `mysql-test/suite/perfschema/r/mdl_func.result`: There is no need to hold SCHEMA's MDL Lock after opening table. This is a revert to bug #24510948.
- `mysql-test/t/schema.test`: There are two modifications:
- There is no need to wait for SCHEMA's MDL Lock. This is a revert to bug #24510948.
- In the test case of Bug #24510948, the `INSERT` statement fails due to MDL deadlock detection. I will explain why the deadlock occurred later.
Below is the execution order of Bug#24510948's test case:
1. `INSERT` thread acquires TABLE's MDL SHARED_WRITE LOCK (in `open_table_get_mdl_lock()`)
2. `INSERT` thread acquires SCHEMA's MDL IX Lock, and is about to open the table (in `get_table_share()`)
3. `ALTER TABLE` thread acquires SCHEMA's MDL IX Lock and TABLE's MDL SHARED_UPGRADABLE Lock (in `lock_table_names()`)
4. `DROP DATABASE` thread waits for SCHEMA's MDL X Lock, which block subsequent requests for the SCHEMA's MDL lock
5. `INSERT` thread releases SCHEMA's MDL IX Lock after opening table
6. `ALTER TABLE` thread is wake up, and then it need to wait for upgrading TABLE's MDL Lock (in `mysql_inplace_alter_table()`)
7. `INSERT` thread wait SCHEMA's MDL IX Lock, Dead Lock! (in `Dictionary_client::acquire()`)
Fortunately, MDL locks have deadlock detection, so this won't hang the SCHEMA. The reason is that when DML opens the table, it acquires the TABLE's Lock first, and then acquires the SCHEMA's Lock, regardless of whether it's the first time the table is opened. Even without this bug fix, there's still a possibility of a dead lock occurring.

Description: When a table is first opened, the `get_table_share` function requires holding an MDL lock of level IX on the SCHEMA. For DML statements, this lock will only be released after the transaction ends! If this transaction is never committed, and another session executes a statement like `CREATE/DROP DATABASE`, then all subsequent accesses to this SCHEMA will be blocked! ```cpp TABLE_SHARE *get_table_share(THD *thd, const char *db, const char *table_name, const char *key, size_t key_length, bool open_view, bool open_secondary) { ... for (;;) { auto it = table_def_cache->find(string(key, key_length)); if (it == table_def_cache->end()) { ... if (dd::mdl_lock_schema(thd, db, MDL_TRANSACTION)) { // acquire the lock mysql_mutex_lock(&LOCK_open); return nullptr; } ... } ... } ... } ``` How to repeat: I wrote a test case, and when you run it, you'll find that `conn2` is blocked, waiting for the MDL X lock on SCHEMA: create database if not exists db1; create table db1.t1(a int); --connect(conn1, localhost, root) --echo [connection1] begin; insert into db1.t1 values(2); --connect(conn2, localhost, root) --echo [connection2] --disable_warnings create database if not exists db1; --enable_warnings --connection conn1 --echo [connection1] commit; --disconnect conn1 --disconnect conn2 --connection default drop database db1; Suggested fix: After analyzing the issue, I found that it was introduced by Bug #24510948. Prior to this, the MDL lock on the SCHEMA acquired when opening a table would be released within the function and would not be held for a long time. Bug #24510948 itself solved a deadlock problem with three participants: - `ALTER TABLE` thread: held SCHEMA's MDL IX Lock, waiting for `INSERT` thread to finish opening the table - `DROP DATABASE` thread: waiting for SCHEMA's MDL X Lock, which block subsequent requests for the SCHEMA's MDL lock - `INSERT` thread: waiting for SCHEMA's MDL IX Lock Therefore, Bug #24510948 makes the MDL lock acquired earlier within the `get_table_share` function. The fix itself is fine, but it significantly increases the lock holding time. In my opinion, the lock holding time should remain consistent with before, and the lock should be released before the function returns to avoid "hang" the entire SCHEMA.