Bug #95249 stop slave permanently blocked
Submitted: 5 May 2019 8:11 Modified: 6 May 2019 4:51
Reporter: Wei Zhao (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:mysql-5.7 OS:Any
Assigned to: CPU Architecture:Any
Tags: blocked, Contribution, stop slave

[5 May 2019 8:11] Wei Zhao
Description:
The 'stop slave' command can be blocked under some circumstances permanently, in which existing 'rpl_stop_slave_timeout' is unable to make 'stop slave' return in time. This issue sometimes can cause serious problems, as is in our case.

The issue could happen when slave breaks from master and the end of the relay log has a partially written transaction's binlogs. then the slave is started again and receives a 'stop slave' command. And it doesn't happen every time, it happens only randomly.

the crucial call stacks is as below:

Thread 99 (Thread 0x7f7143d28700 (LWP 10413)):
#0  0x00007f71455ecab2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000eed610 in ?? ()
#2  0x0000000000eed801 in terminate_slave_threads(Master_info*, int, unsigned long, bool) ()
#3  0x0000000000ef3ff9 in stop_slave(THD*, Master_info*, bool, bool, bool*) ()
#4  0x0000000000ef4372 in stop_slave(THD*) ()
#5  0x0000000000ef44d4 in stop_slave_cmd(THD*) ()
#6  0x0000000000cf1aa5 in mysql_execute_command(THD*, bool) ()
#7  0x0000000000cf4d1d in mysql_parse(THD*, Parser_state*) ()
#8  0x0000000000cf6173 in dispatch_command(THD*, COM_DATA const*, enum_server_command) ()
#9  0x0000000000cf6db0 in do_command(THD*) ()
#10 0x0000000000ddc9c4 in handle_connection ()
#11 0x000000000125bdcd in pfs_spawn_thread ()
#12 0x00007f71455e8df3 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f714409c1bd in clone () from /lib64/libc.so.6

Thread 66 (Thread 0x7f7118137700 (LWP 20397)):
#0  0x00007f71455eacb1 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x0000000000eee248 in mts_checkpoint_routine(Relay_log_info*, unsigned long long, bool, bool) ()
#2  0x0000000000f11194 in Mts_submode_logical_clock::wait_for_workers_to_finish(Relay_log_info*, Slave_worker*) ()
#3  0x0000000000f1214e in Mts_submode_logical_clock::schedule_next_event(Relay_log_info*, Log_event*) ()
#4  0x0000000000ea8e98 in schedule_next_event(Log_event*, Relay_log_info*) ()
#5  0x0000000000eb21d7 in Log_event::get_slave_worker(Relay_log_info*) ()
#6  0x0000000000eb5bf0 in Log_event::apply_event(Relay_log_info*) ()
#7  0x0000000000ef269a in apply_event_and_update_pos(Log_event**, THD*, Relay_log_info*) ()
#8  0x0000000000ef85f1 in ?? ()
#9  0x0000000000efe0b6 in handle_slave_sql ()
#10 0x000000000125bdcd in pfs_spawn_thread ()
#11 0x00007f71455e8df3 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f714409c1bd in clone () from /lib64/libc.so.6
Thread 62 (Thread 0x7f7078ba7700 (LWP 20401)):
#0  0x00007f71455ec705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000f12a1b in Commit_order_manager::wait_for_its_turn(Slave_worker*, bool) ()
#2  0x0000000000ed460d in MYSQL_BIN_LOG::ordered_commit(THD*, bool, bool) ()
#3  0x0000000000ed520e in MYSQL_BIN_LOG::commit(THD*, bool) ()
#4  0x00000000007f7e4a in ha_commit_trans(THD*, bool, bool) ()
#5  0x0000000000db2919 in trans_commit(THD*) ()
#6  0x0000000000ea87c9 in Xid_log_event::do_commit(THD*) ()
#7  0x0000000000ea86c4 in Xid_apply_log_event::do_apply_event_worker(Slave_worker*) ()
#8  0x0000000000f10862 in slave_worker_exec_job_group(Slave_worker*, Relay_log_info*) ()
#9  0x0000000000ef4a1b in handle_slave_worker ()
#10 0x000000000125bdcd in pfs_spawn_thread ()
#11 0x00007f71455e8df3 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f714409c1bd in clone () from /lib64/libc.so.6

How to repeat:
The way to reproduce such a situation is to set up a master-slave replication, run a while with heavy load, and then kill the slave process, in order to produce a relay log that has a partial transaction at its end. If the relay log is complete, one can truncate the last few bytes from the relay log file, using mysqlbinlog to dump the relay log to find an offset to truncate to. 

Then, start the slave mysqld but only do 'start slave sql_thread', don't start the IO thread. then issue a 'stop slave' command, then *sometimes* one could reproduce the issue --- it only happens rarely.

Suggested fix:
I'm attaching my patch which fixed the issue. I'm also attaching the callstack got when the issue happened. Apart from above permanent block, I also located another place which could cause a similar permanent block of 'stop slave' command, which  is also fixed in my patch.
[5 May 2019 8:12] Wei Zhao
this patch fixes the bug

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: stop-slave-blocked.2.diff (application/octet-stream, text), 3.42 KiB.

[5 May 2019 8:13] Wei Zhao
full callstack of the issue

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: stop-slave-blocked-callstack.txt (text/plain), 104.66 KiB.

[6 May 2019 4:51] MySQL Verification Team
Hello Wei Zhao,

Thank you for the report and contribution.

regards,
Umesh
[24 Sep 7:48] Venkatesh Prasad Venugopal
Looks like all the below bugs are duplicates of each other.

Bug #87796 Commit_order_manager can't terminate MTS worker properly when deadlock happens
Bug #89247 Deadlock with MTS when slave_preserve_commit_order = ON.
Bug #95249 stop slave permanently blocked
Bug #99440 Threads of MTS Slave randomly stuck

Here is the root cause for the bug.

When --slave-preserve-commit-order is enabled on slave and if the
waiting thread has locked the rows which are needed by the thread
executing the previous transaction(as per their order in the relay log),
then the innodb deadlock detection algorithm detects the deadlock
between workers and will ask the waiting thread to rollback (only if its
sequence number is lesser than that of the waiting thread).

When this happens, the waiting thread wakes up from the cond_wait(SPCO)
and it gets to know that it was asked to rollback by its preceding
transaction as it was holding a lock that is needed by the other
transaction to progress. It then rolls back its transaction so that the
the preceding transaction can be committed and retries the transaction.

The above logic sometimes caused the worker thread to miss the signals
resulting in the replica server to hang. One of such hang is mentioned
below.

Consider a replica server which is configured with
slave_parallel_workers=3, slave_parallel_type=LOGICAL_CLOCK,
slave_preserve_commit_order=1 and slave_transaction_retries=0. When MTS
is enabled, it is quite possible that workers execute out of order
causing the below state.

Worker 1 - Processing the events of Transaction T1
Worker 2 - Executed Transaction T2 and is waiting for T1 to commit.
Worker 3 - Processing the events of Transaction T3

1. If T1 and T2 are modifying same rows in InnodB, then the worker 1
   detects deadlock and asks worker 2 to rollback by signalling.
2. Worker 2 wakes up from the cond_wait. It gets to know that it was
   asked to roll back by the other transaction and returns with an
   error.
3. Worker 2 rolls back the transaction and comes to the retry part of
   the code and checks the value of slave_transaction_retries. Since it
   is 0, it returns from the handle_slave_worker loop and enters the
   error handling part.
4. As part of error handling, Worker 2 notifies the co-ordinator that it
   is exiting, and then calls report_rollback() function to unregister
   itself from the SPCO queue.
5. While executing report_rollback(), Worker 2 will again enter
   wait_for_its_turn(). But before entering the wait, it checks the
   commit_order_deadlock flag. Since the flag is already set, Worker2
   immediately returns from the function with error.
6. Co-ordinator thread gets this information and sets the
   rli->abort_slave=1 to stop replication and waits till all workers
   exit.
7. Worker 2 exits. There is no worker 2 from here onwards.

   Now the status is,
   Worker 1 - Processing the events of Transaction T1
   Worker 2 - Not running.
   Worker 3 - Processing the events of Transaction T3

8. Now the worker 1 proceeds and executes the transaction and enters the
   Commit_order_manager::wait_for_its_turn.
9. Worker 1 finds out that the previous worker(Worker 2) failed because
   of an error.
10. Worker 1 signals next transaction/worker to proceed.
11. Worker 3 executes the transaction and enters the
    Commit_order_manager::wait_for_its_turn.
12. Worker 1 rolls back the transaction and eventually exits.
13. There will be no one to signal Worker 3 and thus waits forever.

This resulted in a system hang as the co-ordinator thread will be
waiting for the worker thread to finish and the worker thread will be
waiting for the signal to proceed with the commit.

mysql> show processlist; 
+----+-------------+-----------------+------+---------+------+-----------------------------------------+
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | 
+----+-------------+-----------------+------+---------+------+-----------------------------------------+
| 2 | root | localhost:55708 | test | Query | 0 | starting | show processlist | 0 | 0 | 
| 3 | system user | | NULL | Connect | 107 | Waiting for master to send event | NULL | 0 | 0 | 
| 4 | system user | | NULL | Connect | 77 | Waiting for workers to exit | NULL | 0 | 0 | 
| 7 | system user | | NULL | Connect | 84 | Waiting for preceding transaction to commit | NULL | 0 | 0 | 
+----+-------------+-----------------+------+---------+------+-----------------------------------------+
[24 Sep 7:54] Venkatesh Prasad Venugopal
Contributed patch on top of 5.7.31

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: 0001-MTS-deadlock-with-slave_preserve_commit_order.patch (text/x-patch), 37.06 KiB.

[24 Sep 8:03] MySQL Verification Team
Thank you, Venu for the contribution.

regards,
Umesh
[29 Sep 5:21] Venkatesh Prasad Venugopal
Contributed patch on top of 8.0.21

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: 0001-MTS-deadlock-with-slave_preserve_commit_order-8.0.patch (text/x-patch), 37.04 KiB.

[29 Sep 5:37] MySQL Verification Team
Thank you, Venu for the contribution.

regards,
Umesh