Bug #107574 | MTR deadlocks when preserving commit order and changing read_only. | ||
---|---|---|---|
Submitted: | 15 Jun 2022 21:07 | Modified: | 28 Dec 2022 16:00 |
Reporter: | Jean-François Gagné | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.7.38, 5.7.40 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[15 Jun 2022 21:07]
Jean-François Gagné
[17 Jun 2022 6:23]
MySQL Verification Team
Hello Jean-François, Thank you for the report and feedback. Regards, Umesh
[17 Jun 2022 8:29]
Sven Sandberg
Posted by developer: Thanks for the bug report. This was fixed in 8.0.23, in WL#13574: "Include MDL and ACL locks in MTS deadlock detection infra-structure". I don't know if there is a request to backport, but please note that this worklog was large and complex, and the code it touches has diverged significantly between 5.7 and 8.0, so it seems too risky to backport. I can't think of a way to fix this in 5.7.
[17 Jun 2022 14:35]
Jean-François Gagné
Thanks for the feedback, Umesh and Sven. I did not receive emails about this bug being updated. Is mailing broken?
[17 Jun 2022 15:01]
MySQL Verification Team
Hello Jean-François, >> I did not receive emails about this bug being updated, is mailing broken? I'll follow up with the concerned team and get back to you (most likely next week). Thank you. Sincerely, Umesh
[26 Jun 2022 18:22]
Margaret Fisher
Posted by developer: Added bug number to changelog entry for WL #13574:

For a multithreaded replica (where slave_parallel_workers is greater than 0), setting slave_preserve_commit_order=1 ensures that transactions are executed and committed on the replica in the same order as they appear in the replica's relay log. Each executing worker thread waits until all previous transactions are committed before committing. If a worker thread fails to execute a transaction because a possible deadlock was detected, or because the transaction's execution time exceeded a relevant wait timeout, it automatically retries the number of times specified by slave_transaction_retries before stopping with an error. Transactions with a non-temporary error are not retried.

The replication applier on a multithreaded replica has always handled data access deadlocks that were identified by the storage engines involved. However, some other types of lock were not detected by the replication applier, such as locks involving access control lists (ACLs) or metadata locking (for example, FLUSH TABLES WITH READ LOCK statements). This could lead to three-actor deadlocks with the commit order locking, which could not be resolved by the replication applier, and caused replication to hang indefinitely.

From MySQL 8.0.23, deadlock handling on multithreaded replicas that preserve the commit order has been enhanced to mitigate these types of deadlocks. The deadlocks are not specifically resolved by the replication applier, but the applier is aware of them and initiates automatic retries for the transaction, rather than hanging. If the retries are exhausted, replication stops in a controlled manner so that the deadlock can be resolved manually.
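
For readers who want the retry rule from the changelog entry above in concrete form, here is a toy Python model. It is not MySQL code and says nothing about how the server implements this: `TemporaryError`, `PermanentError`, and `apply_with_retries` are hypothetical names invented for the sketch, and `max_retries` only stands in for slave_transaction_retries. It assumes one initial attempt followed by at most `max_retries` retries.

```python
class TemporaryError(Exception):
    """Hypothetical stand-in for a detected deadlock or a lock wait timeout."""


class PermanentError(Exception):
    """Hypothetical stand-in for a non-temporary error, which is never retried."""


def apply_with_retries(apply_transaction, max_retries):
    """Run apply_transaction once, retrying on TemporaryError only.

    max_retries plays the role of slave_transaction_retries in this toy model.
    Returns the number of retries that were needed before success.
    Raises RuntimeError when the retries are exhausted (the applier "stops").
    A PermanentError propagates immediately, without any retry.
    """
    retries = 0
    while True:
        try:
            apply_transaction()
            return retries
        except TemporaryError:
            if retries >= max_retries:
                # Controlled stop: the deadlock has to be resolved manually.
                raise RuntimeError("retries exhausted, applier stops")
            retries += 1
```

In this model, a transaction that hits a temporary error twice and then succeeds returns 2, while one that always deadlocks ends in a controlled stop instead of hanging, which is the behavior the changelog describes from 8.0.23 onward.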
[28 Dec 2022 13:58]
Jean-François Gagné
This bug's status has been changed to Closed, but that is a little misleading. From what I understand, this bug is fixed in 8.0.23, but not in 5.7. I would have expected this bug to stay open until fixed in 5.7 (note that the bug has been reported as affecting 5.7, not 8.0).
[28 Dec 2022 16:00]
Jean-François Gagné
Adding affected version 5.7.40 (I just confirmed it is also affected, using the test case in "How to repeat").