Bug #116289 MySQL Online DDLs Cause Metadata Deadlocks on Active Transactions
Submitted: 3 Oct 2024 12:06 Modified: 3 Oct 2024 16:04
Reporter: Daniel Valchev Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: DDL Severity:S5 (Performance)
Version:8.0.36, 9.0.1 OS:Any
Assigned to: CPU Architecture:Any
Tags: DDL, deadlock, metadata deadlock, Online DDL

[3 Oct 2024 12:06] Daniel Valchev
Description:
During online DDL operations (such as adding a column), deadlocks can occur when a running transaction holds a metadata lock on a table and a schema change operation requests an exclusive metadata lock on the same table. These deadlocks are not acceptable for systems that require high availability during schema migrations, as active transactions may be aborted, causing application disruptions.

These deadlocks are not logged in the MySQL error log because they are related to metadata and not InnoDB transaction-level deadlocks.

Impact
- Online DDL operations can cause downtime or transaction aborts if active transactions are using the affected tables. Active transactions may be aborted unexpectedly during DDL operations, causing disruptions to users and loss of transaction progress.
- Since metadata deadlocks are not logged, they are harder to diagnose, potentially leading to unexpected application behavior.

How to repeat:
# Scenario 1: Simple Table with DDL and Transaction

CREATE TABLE `test_table` (
  `id` int,
  `col_a` int
) ENGINE=InnoDB;

INSERT INTO `test_table` (`id`, `col_a`) VALUES (1, 1);

-- Session 1
-- acquires SHARED_READ metadata lock on test_table
START TRANSACTION;

SELECT * FROM test_table WHERE id = 1;

-- Session 2
-- waits for EXCLUSIVE metadata lock on test_table
ALTER TABLE `test_table` ADD `new_column` VARCHAR(255) NULL;

-- Session 1
-- tries to acquire EXCLUSIVE metadata lock on test_table, resulting in deadlock
UPDATE `test_table` SET `test_table`.`col_a` = 2 where `id` = 1;

=> ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

# Scenario 2: Foreign Key Relationship with DDL and Transaction

CREATE TABLE `test_table` (
  `id` int PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE `test_table_b` (
  `id` int,
  `fk_id` int,
  `col_a` int,
  FOREIGN KEY (`fk_id`) REFERENCES test_table(`id`)
) ENGINE=InnoDB;

INSERT INTO `test_table` (`id`) VALUES (1);
INSERT INTO `test_table_b` (`id`, `col_a`, `fk_id`) VALUES (1, 1, 1);

-- Session 1
-- acquires SHARED_READ metadata lock on test_table_b
START TRANSACTION;

SELECT * FROM test_table_b WHERE id = 1;

-- Session 2
-- waits for EXCLUSIVE metadata lock on test_table_b because of foreign key
ALTER TABLE `test_table` ADD `new_column` VARCHAR(255) NULL;

-- Session 1
-- tries to acquire EXCLUSIVE metadata lock on test_table_b, resulting in deadlock
UPDATE `test_table_b` SET `test_table_b`.`col_a` = 2 where `id` = 1;

=> ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
[3 Oct 2024 12:38] MySQL Verification Team
Hi Mr. Valchev,

Thank you for your bug report.

What you are reporting is expected behaviour.

Let us inform your why have we introduced Metadata locks. Before MDL locks  were introduced, because DDL operations on InnoDB caused very severe corruptions from the concurrent DML operations.

There was a very large number of bugs because DDL operations were not protected, so that DML operations might cause crashes or very heavy corruption of data.

Since we introduced MDLs, all those problems were solved.

So, if you wish that we abolish MDLs , you should design a system that will prevent DMLs from corrupting InnoDB tables.

We have also designed and implemented a number of different locks that enable parallel execution of DDLs and DMLs, but all those have certain restrictions, because restrictions are necessary for the safety of your data.

You can read this entire chapter to learn more:

https://dev.mysql.com/doc/refman/9.0/en/innodb-online-ddl.html

Not a bug.
[3 Oct 2024 16:04] Daniel Valchev
Hi!

Thank you for your explanation regarding the purpose of MDLs and their role in preventing data corruption. I appreciate the steps taken to ensure data integrity. However, the issue at hand isn’t about the existence of MDLs but about their current implementation leading to application disruptions.

I have reviewed the documentation at this link, but I did not find specific references or details that describe deadlocks occurring during online DDL operations. Could you please point me to the section that explains this behavior in more detail?

While I understand that Online DDL has its limitations, the issue here is that there appears to be no alternative to safely alter the database schema without aborting active transactions. The need to abort transactions is disruptive, and in cases where schema changes are required (such as during migrations or upgrades), it becomes challenging to maintain service reliability.

One possible solution, I imagine, could be to prioritize granting locks to DML operations in active transactions, allowing these transactions to finish before granting the exclusive lock required by DDL operations. This approach would let DDL statements wait for active transactions to complete, reducing the chances of transaction aborts and avoiding service disruptions. 

Can you suggest any alternative approaches using MySQL to alter the schema so that it does not lead to aborted transactions?
[4 Oct 2024 9:20] MySQL Verification Team
Hi Mr. Valchev,

Thank you for your e-mail ......

Unfortunately, we have tried various alternatives , but the current one is the only one that truly works and protects DDL operations .......

Regarding the various locking methods for DDL's, this is one of the chapters on the subject:

https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html

Also, it is supposed that a reporter reads all relevant sections of our Reference Manual, before a bug report is filed.

Not a bug.