Bug #93761 optimize table cause the slave MTS deadlock!
Submitted: 29 Dec 2018 2:45 Modified: 9 Mar 23:35
Reporter: yoga yoga Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.7.22 OS:Red Hat (6.5)
Assigned to: CPU Architecture:Any
Tags: MTS, optimize table

[29 Dec 2018 2:45] yoga yoga
Description:
version: mysql5.7.22
master-slave
Our replication configuration is as follows:
enforce_gtid_consistency=ON
gtid_mode=ON
slave_parallel_workers=8
slave_parallel_type=LOGICAL_CLOCK
slave_preserve_commit_order=1
binlog_order_commits=1
log_slave_updates=ON

we find the slave sql thread does not worker, the show processlist is:
mysql> show processlist;
+----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+
| Id | User        | Host      | db   | Command | Time | State                                       | Info                          |
+----+-------------+-----------+------+---------+------+---------------------------------------------+-------------------------------+
| 37 | root        | localhost | NULL | Query   |    0 | starting                                    | show processlist              |
| 38 | system user |           | NULL | Connect | 3044 | Waiting for master to send event            | NULL                          |
| 39 | system user |           | NULL | Connect | 2695 | Waiting for dependent transaction to commit | NULL                          |
| 40 | system user |           |      | Connect | 2749 | Waiting for table metadata lock             | optimize table sbtest.sbtest1 |
| 41 | system user |           | NULL | Connect | 2652 | Waiting for preceding transaction to commit | NULL                          |
| 42 | system user |           | NULL | Connect | 2652 | Waiting for preceding transaction to commit | NULL 

[analyse]
the master:
we find the optimize table binlog has the same last_commited num with the same table dml sql.
example:
last_committed=8101     sequence_number=8102          optimize table sbtest.sbtest1
last_committed=8101     sequence_number=8103
last_committed=8101     sequence_number=8104
last_committed=8101     sequence_number=8105
last_committed=8101     sequence_number=8106
last_committed=8101     sequence_number=8107
last_committed=8101     sequence_number=8108
last_committed=8101     sequence_number=8109
last_committed=8101     sequence_number=8110          update the sbtest.sbtest1   
the slave:
the seq_num in [8102, 8110] can work in parallel, but if the follow happens, it will deadlock:
[sequence_number=8110, wait for the 8102 to do first, and alse have the mdl_share lock of table sbtest.sbtest1]
Commit_order_manager::wait_for_its_turn
MYSQL_BIN_LOG::ordered_commit
MYSQL_BIN_LOG::commit
[sequence_number=8102, wait for the mdl_share of sbtest.sbtest1 upgraded to mdl_exclusive lock]
MDL_context::upgrade_shared_lock (this=0x10ff60a8, mdl_ticket=0x13db7a90, new_type=MDL_EXCLUSIVE,lock_wait_timeout=31536000)
wait_while_table_is_used (thd=0x10ff6000, table=0x132ac020, function=HA_EXTRA_PREPARE_FOR_RENAME)
mysql_inplace_alter_table
mysql_alter_table

[questions]
1. how to fix?
2. we verify alter table sbtest.sbtest1 engine=innodb does not has this problem, why?
3. other online ddl has the same problems?

How to repeat:
1: we should have the same master-slave mysql.

2: do the sysbench test in the master.
(you should prepare the 16 tables of sbtest before)
   ./sysbench --test=/home/sysbench/db/oltp.lua --oltp-table-size=10000000 --mysql-table-engine=innodb --mysql-user=root  --mysql-password=123456 --mysql-port=3306 --mysql-host=127.0.0.1 --mysql-db=sbtest --oltp-tables-count=16 --max-requests=0 --max-time=20000 --num-threads=50 --report-interval=10 run

3:exec the follow shell in the master:
#!/bin/bash
while true
do
  echo 'optimize table sbtest.sbtest1;'
  mysql -uroot -p123456 -e'optimize table sbtest.sbtest1;'
  sleep 1
  echo 'optimize table sbtest.sbtest2;'
  mysql -uroot -p123456 -e'optimize table sbtest.sbtest2;'
  sleep 2
done

4: we execute show processlist in slave to find whether the MTS deadlock has happened?
[21 Jan 6:58] Jean-François Gagné
I suggest the following workaround: replace your optimize table with an ALTER TABLE ENGINE INNODB.  Another work-arount could be to disable slave_preserve_commit_order, but you might not want to do that.

It is not actually the optimize table that cause the deadlock, it is a combinasion of enabling slave_preserve_commit_order and of the optimize not being considered as a barrier when generating the logical clocks (last_committed and sequence_number), but I think an ALTER TABLE is.

Long story made short, DDL should not be run in parallel on slaves.  When written to the binlogs, a DDL (including an ALTER TABLE or a CREATE TABLE) should force all following last_committed to be greater or equal to the sequence_number of this DDL.  Think of it this way: you cannot run an INSER in a TABLE before the CREATE TABLE is run.
[7 Mar 4:14] Bogdan Kecman
Hi,

Not a bug, a design limitation maybe but a bug - no. 
Jean-François already explained why (Thanks!!). 

Thanks
Bogdan

p.s. this type of issues are handled differently (more safely) in 8.x due to DDL's being transactional
[9 Mar 23:35] Bogdan Kecman
Hi,

I discussed this a little bit with colleagues and I change my mind, I think this actually is a bug. Probbly only that's not going to be fixed since 8.x fixes it and to fix in 5.7 we'd need a change too big to allow to enter into 5.7 but still I'll verify it.

all best
Bogdan