MySQL Bugs: #107574: MTR deadlocks when preserving commit order and changing read

Bug #107574	MTR deadlocks when preserving commit order and changing read_only.
Submitted:	15 Jun 2022 21:07	Modified:	28 Dec 2022 16:00
Reporter:	Jean-François Gagné	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.7.38, 5.7.40	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Hi,

when changing the read_only global variable on a replica running with MTR and with preserving commit order, replication might deadlock.  See How to repeat for details.

I am reporting this for MySQL 5.7.38, I was not able to reproduce for 8.0.29.

I am filling this bug as S2 / Serious.  One could claim that it is minor because changing read_only on a replicas is not a commun, but I would disagree.  In my environment, I am storing configuration, state and logging information on all my MySQL nodes (including replicas which I run as super_read_only).  When updating this data, I am disabling binlogging for the session and temporarily setting super_readonly to 0 (setting it back to 1 after, which is also triggering the deadlock).  As I am updating this data as often as ever minute, this deadlock is triggered quite often.

My understanding of this deadlock is that when T1 and T2 are executing in parallel, if setting read_only when T2 is waiting for T1 to commit, T2 prevents setting read_only to complete and T1 will block (when ready to commit) on waiting for read_only to complete.

There is an easy "self-healing" for this deadlock: killing the T1 transaction.  This will stop the SQL Thread, but will allow setting read_only to complete.  For resuming replication, we then need to run START SLAVE SQL_THREAD.  We could also kill the "SET GLOBAL read_only = 1", but we then need to run it again, which might deadlock again (the other way allows more predictable progress and not failing the "SET GLOBAL read_only = 1", but both work).

Many thanks for looking into this, Jean-François Gagné

How to repeat:
# Create a sandbox with dbdeployer.
dbdeployer deploy replication mysql_5.7.38 -c log_bin -c log_slave_updates \
  -c "slave_parallel_type = LOGICAL_CLOCK" -c "slave_parallel_workers = 2" -c "slave_preserve_commit_order = 1"

# Initialize the sandbox.
# We use SBR dans commit delay to trigger T2 waiting for T1 to commit.
# More about this trick in https://jfg-mysql.blogspot.com/2022/05/triggering-replication-lag-for-testing-a-script.htm...
./m <<< "
  CREATE DATABASE test_jfg;
  CREATE TABLE test_jfg.t(id INT AUTO_INCREMENT PRIMARY KEY, v DOUBLE, d DATETIME);
  INSERT INTO test_jfg.t (v, d) SELECT 1, NOW() FROM information_schema.INNODB_METRICS LIMIT 2;
  SET GLOBAL binlog_format = STATEMENT;
  SET GLOBAL binlog_group_commit_sync_delay = 50000;
"

# Generate a flow of transaction where T2 will wait for T1 to commit. 
while date; do
  ./m test_jfg <<< "UPDATE t SET v = SLEEP(0.2), d = now() WHERE id = 1" &
  ./m test_jfg <<< "DO sleep(0.2); UPDATE t SET v = SLEEP(0.01), d = now() WHERE id = 2" &
  wait
done

# While above is running, change the read_only status of the s1 replica in a loop.
# At some point, this will block and we then have a deadlock.
while date; do ./s1 <<< "SET GLOBAL read_only = 1; SET GLOBAL read_only = 0;"; done

# When above blocks, we can look at what is executing and confirm the deadlock.
./s1 -e "SHOW PROCESSLIST"
+----+-------------+-----------+------+---------+------+---------------------------------------------+--------------------------+
| Id | User        | Host      | db   | Command | Time | State                                       | Info                     |
+----+-------------+-----------+------+---------+------+---------------------------------------------+--------------------------+
|  4 | system user |           | NULL | Connect | 1094 | Waiting for master to send event            | NULL                     |
|  5 | system user |           | NULL | Connect |   25 | Waiting for dependent transaction to commit | NULL                     |
|  6 | system user |           | NULL | Connect |   25 | Waiting for commit lock                     | NULL                     |
|  7 | system user |           | NULL | Connect |   25 | Waiting for preceding transaction to commit | NULL                     |
| 20 | msandbox    | localhost | NULL | Query   |   25 | Waiting for commit lock                     | SET GLOBAL read_only = 1 |
| 22 | msandbox    | localhost | NULL | Query   |    0 | starting                                    | show processlist         |
+----+-------------+-----------+------+---------+------+---------------------------------------------+--------------------------+

# Once we have the deadlock, we can solve it with below (needs running START SLAVE SQL_THREAD after).
./s1 <<< "SHOW PROCESSLIST" | awk '$0 ~ "system user.*Waiting for commit lock"{print "KILL " $1}' | ./s1

# We can also solve the deadlock with below, but this will make the SET GLOBAL fail.
./s1 <<< "SHOW PROCESSLIST" | awk '$0 ~ "SET GLOBAL read_only = 1"{print "KILL " $1}' | ./s1

Hello Jean-François,

Thank you for the report and feedback.

regards,
Umesh

Posted by developer:
 
Thanks for the bug report. This was fixed in 8.0.23, in
WL#13574: Include MDL and ACL locks in MTS deadlock detection infra-structure

I don't know if there is a request to backport,
but please note that this worklog was large and complex,
and the code it touches has diverged significantly between 5.7 and 8.0,
so it seems too risky to backport.
I can't think of a way to fix this in 5.7.

Thanks for the feedback Umesh and Sven.

I did not receive emails about this bug being updated, is mailing broken ?

Hello Jean-François,

>>I did not receive emails about this bug being updated, is mailing broken ?

I'll follow up with concern team and get back to you(most likely next week). Thank you.

Sincerely,
Umesh

Posted by developer:
 
Added bug number to changelog entry for WL #13574:

        For a multithreaded replica (where slave_parallel_workers is
        greater than 0), setting slave_preserve_commit_order=1
        ensures that transactions are executed and committed on the
        replica in the same order as they appear in the replica's relay
        log. Each executing worker thread waits until all previous
        transactions are committed before committing. If a worker thread
        fails to execute a transaction because a possible deadlock was
        detected, or because the transaction's execution time exceeded a
        relevant wait timeout, it automatically retries the number of
        times specified by slave_transaction_retries
        before stopping with an error. Transactions with a non-temporary
        error are not retried.

        The replication applier on a multithreaded replica has always
        handled data access deadlocks that were identified by the
        storage engines involved. However, some other types of lock were
        not detected by the replication applier, such as locks involving
        access control lists (ACLs) or metadata locking (for example,
        FLUSH TABLES WITH READ LOCK statements). This
        could lead to three-actor deadlocks with the commit order
        locking, which could not be resolved by the replication applier,
        and caused replication to hang indefinitely. From MySQL 8.0.23,
        deadlock handling on multithreaded replicas that preserve the
        commit order has been enhanced to mitigate these types of
        deadlocks. The deadlocks are not specifically resolved by the
        replication applier, but the applier is aware of them and
        initiates automatic retries for the transaction, rather than
        hanging. If the retries are exhausted, replication stops in a
        controlled manner so that the deadlock can be resolved manually.

This bug status has be changed to Closed, but it is a little miss-leading.

From what I understand, this bug is fixed in 8.0.23, but not in 5.7.  I would have expected that this bug stays opened until fixed in 5.7 (note that the bug has been reported as affecting 5.7, not 8.0).

Adding affected version 5.7.40 (I just confirmed it is also affected with the test-case in How to repeat).