Bug #89229 | FLUSH PRIVILEGES may cause MTS deadlock | ||
---|---|---|---|
Submitted: | 15 Jan 2018 3:18 | ||
Reporter: | Libing Song | | |
Status: | Verified | | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.7 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[15 Jan 2018 3:18]
Libing Song
[15 Mar 2021 9:53]
WANG GUANGYOU
I hit this issue too.
[20 Aug 2022 10:04]
Tsubasa Tanaka
I ran into this issue in 8.0.19, and I can reproduce it easily by setting binlog_group_commit_sync_delay = 1000000. So far I have not been able to reproduce it on 8.0.28 and later.
[1 Sep 2022 4:40]
Tsubasa Tanaka
Is it fixed by WL#13574? https://github.com/mysql/mysql-server/commit/a038ae423e6c3ae474e764d64e68dcf0ab4ea676
[16 Sep 2022 14:02]
Sven Sandberg
Posted by developer:

There are two parts to this bug:

1. The source server does not determine that FLUSH PRIVILEGES conflicts with GRANT, because FLUSH PRIVILEGES releases a conflicting lock early (which violates two-phase locking). Therefore, it marks the two statements as non-conflicting in the binary log, and the replica is able to execute them in parallel, which leads to a deadlock when replica-preserve-commit-order is used.

2. Up until 8.0.23/WL#13574, replicas were unable to detect and resolve deadlocks where one of the parties of the deadlock was a worker thread waiting for preceding workers to commit (according to replica-preserve-commit-order). This was fixed in 8.0.23/WL#13574, so such deadlocks are now detected. The more recent transaction (GRANT in this case) is forced to roll back, which unblocks the older transaction (FLUSH in this case) so that it can proceed, and then the GRANT statement is retried according to replica_transaction_retries.

The test case in the 'how to repeat' section shows only problem 1 (and it is implicit that it results in problem 2). So I'd say the described bug is not fixed, although the defect has been mitigated. From the replication perspective, it would still be better if FLUSH PRIVILEGES (and all other statements) followed two-phase locking, so that retries are not necessary.

So, let's keep this bug open even if the symptoms are less severe now.
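The interaction described above can be sketched as a toy wait-for graph. This is an illustrative model only, not MySQL's implementation: the function names, the `see_commit_order` flag, and the assumption that FLUSH PRIVILEGES is the one waiting on a lock held by GRANT are all hypothetical, chosen to match the comment's statement that rolling back GRANT unblocks FLUSH.

```python
def find_cycle(wait_for):
    """Return the transactions forming a cycle in the wait-for graph, or None."""
    visited, stack = set(), []

    def dfs(node):
        if node in stack:                      # back edge: a cycle closes here
            return stack[stack.index(node):]
        if node in visited:
            return None
        visited.add(node)
        stack.append(node)
        for blocker in wait_for.get(node, ()):
            cycle = dfs(blocker)
            if cycle:
                return cycle
        stack.pop()
        return None

    for start in list(wait_for):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


def choose_victim(lock_waits, commit_order_waits, sequence, see_commit_order):
    """Pick the transaction to roll back, or None if no deadlock is visible.

    see_commit_order=False models a detector that only knows about lock waits
    (the pre-8.0.23 behaviour described in the comment); True also includes
    the "wait for preceding worker to commit" edges.
    """
    graph = {txn: set(holders) for txn, holders in lock_waits.items()}
    if see_commit_order:
        for waiter, predecessor in commit_order_waits.items():
            graph.setdefault(waiter, set()).add(predecessor)
    cycle = find_cycle(graph)
    if cycle is None:
        return None
    # Roll back the more recent transaction (highest sequence number).
    return max(cycle, key=sequence.__getitem__)


# Hypothetical scenario: both statements were scheduled in parallel because
# the binary log marked them as non-conflicting.
sequence = {"FLUSH PRIVILEGES": 1, "GRANT": 2}
lock_waits = {"FLUSH PRIVILEGES": {"GRANT"}}          # assumed lock direction
commit_order_waits = {"GRANT": "FLUSH PRIVILEGES"}    # GRANT must commit second

pre_fix_victim = choose_victim(lock_waits, commit_order_waits, sequence, False)
post_fix_victim = choose_victim(lock_waits, commit_order_waits, sequence, True)
```

In this model, the lock-only detector sees no cycle and both workers wait forever, while the detector that also sees the commit-order wait finds the cycle and selects GRANT for rollback and retry. Neither outcome removes the root cause in part 1: the two statements are still scheduled in parallel.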