MySQL Bugs: #115730: Deadlock between OPTIMIZE TABLE and XA transaction in slave

Bug #115730	Deadlock between OPTIMIZE TABLE and XA transaction in slave
Submitted:	31 Jul 2024 4:32	Modified:	31 Jul 2024 6:45
Reporter:	yifei syc (OCA)	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	8.0.39	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Optimize/repair/analyze table will write binlog after mysql_admin_table func, which leads to ordered commit without mdl locks.

source codes are like:
bool Sql_cmd_optimize_table::execute(THD *thd) {
  Table_ref *first_table = thd->lex->query_block->get_table_list();
  bool res = true;
  DBUG_TRACE;

  if (check_table_access(thd, SELECT_ACL | INSERT_ACL, first_table, false,
                         UINT_MAX, false))
    goto error; /* purecov: inspected */
  thd->enable_slow_log = opt_log_slow_admin_statements;
  res = (specialflag & SPECIAL_NO_NEW_FUNC)
            ? mysql_recreate_table(thd, first_table, true)
            : mysql_admin_table(thd, first_table, &thd->lex->check_opt,
                                "optimize", TL_WRITE, true, false, 0, nullptr,
                                &handler::ha_optimize, 0, m_alter_info, true);
  /* ! we write after unlocking the table */
  if (!res && !thd->lex->no_write_to_binlog) {
    /*
      Presumably, OPTIMIZE and binlog writing doesn't require synchronization
    */
    res = write_bin_log(thd, true, thd->query().str, thd->query().length);
  }
  thd->lex->query_block->m_table_list.first = first_table;
  thd->lex->query_tables = first_table;

error:
  return res;
}

The comments seem to say that it is designed to not sync mdl and writing binlog.

However, such design lead to replication apply stopped on slave.
Dead lock case is:
In master:
1. Thread A do optimize table t1, after releasing mdl locks, it start writing binlog.(Let's say TrxA)
2. Meanwhile, Thread B start XA transaction and do DML on table t1. Because thread A has released MDL locks, thread B won't be blocked.
3. Thread B do XA PREPARED(TrxB.1). THREAD A and B have chance to go into the same group in ordered_commit.
4. Thread B do XA COMMIT(TrxB.2).
In binlog, TrxA and TrxB.1 could be executed parallel. TrxB.2 is behind TrxA and TrxB.1.

In slave:
1. TrxB.1 starts first, holding MDL shared lock.
2. TrxA starts, acquiring MDL exclusive lock, and waiting TrxB done.
The MDL shared lock held by TrxB.1 will be released after TrxB.2 is done.
However, TrxB.2 is blocked by TrxA, because of seq_num of TrxA is smaller than last_committed of TrxB.2.
so finally, deadlock between TrxA and TrxB.

How to repeat:
I wrote a testcase to repeat the deadlock.

--source include/have_debug.inc
--source include/not_group_replication_plugin.inc
--source include/master-slave.inc

--connect (conn1_1,127.0.0.1,root,,test,$MASTER_MYPORT)
--connect (conn1_2,127.0.0.1,root,,test,$MASTER_MYPORT)
--connect (conn2,127.0.0.1,root,,test,$SLAVE_MYPORT)

--connection conn1_1

CREATE DATABASE apply_stop;
CREATE TABLE apply_stop.t1 (id int);
insert into apply_stop.t1 values (1);

SET DEBUG_SYNC='bgc_before_flush_stage SIGNAL s1 WAIT_FOR s2';
# optimize/repair/analyze table have the same problem.
--send optimize table apply_stop.t1;

--connection conn1_2

SET DEBUG_SYNC='now wait_for s1';
SET DEBUG_SYNC='bgc_before_flush_stage SIGNAL s2';
xa begin 'x1';
insert into apply_stop.t1 values (2);
xa end 'x1';
xa prepare 'x1';

--connection conn1_1

--reap
--sleep 5
# xa commit will be blocked by ddl on followers.
# so followers will never catch up. 
xa commit 'x1';

# sync will never achieve.
--sync_slave_with_master

DROP DATABASE apply_stop;

Hello Yichang Song,

Thank you for the report and test case.

regards,
Umesh