Bug #93649 STOP SLAVE SQL_THREAD deadlocks if done while holding LOCK INSTANCE FOR BACKUP
Submitted: 18 Dec 2018 3:23
Reporter: Sergei Glushchenko (OCA) Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:8.0.13 OS:Any
Assigned to: Bogdan Kecman CPU Architecture:Any

[18 Dec 2018 3:23] Sergei Glushchenko
Description:
STOP SLAVE SQL_THREAD being executed while holding LOCK INSTANCE FOR BACKUP may hang forever. It turns out that sql thread is waiting on backup lock and STOP SLAVE will wait for sql thread to complete current statement which results in deadlock.

How to repeat:
I put together simple MTR test case (you may need to run it couple of times to get the deadlock):

--source include/master-slave.inc

--connection master

let $n=10;

while ($n)
{
  eval CREATE TABLE t$n (a INT);
  dec $n;
}

--connection slave

LOCK INSTANCE FOR BACKUP;
STOP SLAVE SQL_THREAD;
START SLAVE SQL_THREAD;
UNLOCK INSTANCE;

--connection master

let $n=10;

while ($n)
{
  eval DROP TABLE t$n;
  dec $n;
}

--source include/sync_slave_sql_with_master.inc
--source include/rpl_end.inc

If you check slave status, it is 'Slave_SQL_Running_State: Waiting for backup lock'. And

mysql> show processlist;
+----+-----------------+-----------------+------+---------+------+----------------------------------+-------------------------+-----------+---------------+
| Id | User            | Host            | db   | Command | Time | State                            | Info                    | Rows_sent | Rows_examined |
+----+-----------------+-----------------+------+---------+------+----------------------------------+-------------------------+-----------+---------------+
|  4 | event_scheduler | localhost       | NULL | Daemon  |  318 | Waiting on empty queue           | NULL                    |         0 |             0 |
|  9 | root            | localhost:61541 | test | Sleep   |  317 |                                  | NULL                    |         1 |             0 |
| 10 | root            | localhost:61542 | test | Sleep   |  317 |                                  | NULL                    |         1 |             0 |
| 11 | system user     |                 | NULL | Connect |  317 | Waiting for master to send event | NULL                    |         0 |             0 |
| 12 | system user     |                 | test | Query   |  317 | Waiting for backup lock          | CREATE TABLE t3 (a INT) |         0 |             0 |
| 13 | root            | localhost:61548 | test | Query   |  316 | Killing slave                    | STOP SLAVE SQL_THREAD   |         0 |             0 |
| 14 | root            | localhost:61549 | test | Sleep   |  317 |                                  | NULL                    |         1 |             0 |
| 15 | root            | localhost       | NULL | Query   |    0 | starting                         | show processlist        |         0 |             0 |
+----+-----------------+-----------------+------+---------+------+----------------------------------+-------------------------+-----------+---------------+

Suggested fix:
If you replace LOCK INSTANCE FOR BACKUP with FLUSH TABLES WITH READ LOCK, you'll get following error: "Can't execute the given command because you have active locked tables or an active transaction". Consider making STOP SLAVE to fail with the same error in case of LOCK INSTANCE FOR BACKUP too.