Bug #105243 Execute "mysql.server stop" command deadlock after using "start slave" command
Submitted: 16 Oct 2021 9:09    Modified: 19 Oct 2021 8:27
Reporter: sheng wei (OCA)
Status: Verified
Category: MySQL Server: Replication    Severity: S3 (Non-critical)
Version: 8.0.25    OS: Any
Assigned to:    CPU Architecture: Any

[16 Oct 2021 9:09] sheng wei
Description:
If the slave executes the "start slave" command and the "Global_THD_manager::add_thd" call in the "handle_slave_worker" function has not yet completed, then executing "mysql.server stop" produces the following deadlock:
Thread 1:
handle_slave_sql -> slave_start_workers -> slave_start_single_worker
hold:mysql_mutex_assert_owner(&rli->run_lock);
wait:mysql_cond_wait(&w->jobs_cond, &w->jobs_lock);

Thread 2:
signal_hand -> close_connections -> Global_THD_manager::do_for_all_thd -> for (int i = 0; i < NUM_PARTITIONS; i++)
hold:LOCK_thd_list
wait:mysql_mutex_lock(killing_thd->current_mutex)
note: current_mutex is mi->rli->run_lock. It is assigned in function "start_slave_thread" by thd->ENTER_COND(start_cond, cond_lock, &stage_waiting_for_slave_thread_to_start, &saved_stage);

Thread 3:
handle_slave_worker -> Global_THD_manager::add_thd 
hold:w->jobs_cond (not yet signaled; Thread 1 is waiting on it)
wait:LOCK_thd_list
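
The three threads above form a circular wait. As a rough illustration only (this is not MySQL code), the sketch below models the report's hold/wait pairs as a wait-for graph and checks for a cycle. The names `in_wait_cycle` and `reported_wait_graph` are hypothetical, and the graph assumes each thread is blocked on exactly one other thread, as described in the report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// waits_for[i] is the index of the thread that thread i is blocked on,
// or -1 if thread i is not blocked.
// Returns true if following the wait-for edges from `start` leads back to it.
bool in_wait_cycle(const std::vector<int> &waits_for, int start) {
  assert(start >= 0 && static_cast<std::size_t>(start) < waits_for.size());
  int cur = waits_for[start];
  // At most size() hops are needed to either stop or return to `start`.
  for (std::size_t hops = 0; hops < waits_for.size() && cur != -1; ++hops) {
    if (cur == start) return true;
    cur = waits_for[cur];
  }
  return false;
}

// Wait-for edges taken from the report (index 0, 1, 2 = Thread 1, 2, 3).
std::vector<int> reported_wait_graph() {
  return {
      2,  // Thread 1 waits on w->jobs_cond, which Thread 3 has not signaled
      0,  // Thread 2 waits on run_lock, which Thread 1 holds
      1,  // Thread 3 waits on LOCK_thd_list, which Thread 2 holds
  };
}
```

Calling `in_wait_cycle(reported_wait_graph(), 0)` reports a cycle. If Thread 1 were not blocked (its entry set to -1), no thread would be part of a cycle.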

How to repeat:
In function "handle_slave_worker", add "my_sleep(60000000)" just before "thd_manager->add_thd(thd)". While the thread is sleeping in my_sleep(60000000), execute the "mysql.server stop" command.

Suggested fix:
In function "handle_slave_worker", move thd_manager->add_thd(thd) to after the point where w->jobs_cond is signaled, i.e. signal w->jobs_cond before calling thd_manager->add_thd(thd).
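
To make the proposed ordering concrete, here is a minimal standalone model using the C++ standard library. It is not a patch against the server and has not been checked against the MySQL source. `Model`, `worker`, `stop_server`, and `run_reordered_model` are hypothetical names, and the mutexes only play the roles of rli->run_lock, w->jobs_lock, and LOCK_thd_list from the description. The worker signals the condition first and only then takes the list lock.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the server objects named in the report.
struct Model {
  std::mutex run_lock;                // plays rli->run_lock
  std::mutex jobs_lock;               // plays w->jobs_lock
  std::condition_variable jobs_cond;  // plays w->jobs_cond
  std::mutex thd_list_lock;           // plays LOCK_thd_list
  bool worker_started = false;        // guarded by jobs_lock
  bool shutdown_done = false;         // written by the stop thread only
  std::vector<int> thd_list;          // guarded by thd_list_lock
};

// Worker with the proposed ordering: signal first, register afterwards.
void worker(Model &m) {
  {
    std::lock_guard<std::mutex> jobs(m.jobs_lock);
    m.worker_started = true;
  }
  m.jobs_cond.notify_one();
  std::lock_guard<std::mutex> list(m.thd_list_lock);  // may wait for the stop thread
  m.thd_list.push_back(1);                            // plays add_thd(thd)
}

// Stop path: holds the list lock while taking run_lock, like Thread 2.
void stop_server(Model &m) {
  std::lock_guard<std::mutex> list(m.thd_list_lock);
  std::lock_guard<std::mutex> run(m.run_lock);
  m.shutdown_done = true;
}

// Plays Thread 1. Returns true once all three parties have finished.
bool run_reordered_model() {
  Model m;
  std::unique_lock<std::mutex> run(m.run_lock);  // held across the wait
  std::thread stopper(stop_server, std::ref(m));
  std::thread w(worker, std::ref(m));
  {
    std::unique_lock<std::mutex> jobs(m.jobs_lock);
    m.jobs_cond.wait(jobs, [&m] { return m.worker_started; });
  }
  run.unlock();  // the start-up wait is over, so the stop path can proceed
  stopper.join();
  w.join();
  return m.worker_started && m.shutdown_done && m.thd_list.size() == 1;
}
```

In this model the worker never needs the list lock before signaling, so the thread holding run_lock always wakes up and releases it. If the two steps in `worker` were swapped back, the model could hang in the same way as the three threads in the description. Build with thread support enabled (for example `-pthread` on GCC or Clang).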

Do you have any other suggestions? Is there any problem with this modification? Thanks.
[18 Oct 2021 2:47] sheng wei
How to repeat (as originally written):
In function "handle_slave_worker", add "my_sleep(60000000)" just before "thd_manager->add_thd(thd)". While the thread is sleeping in my_sleep(60000000), execute the "mysql.server stop" command.

Modified to: In function "handle_slave_sql", add "my_sleep(60000000)" just before "thd_manager->add_thd(thd)". While the thread is sleeping in my_sleep(60000000), execute the "mysql.server stop" command.
[19 Oct 2021 7:52] MySQL Verification Team
Hi,

Do you have a way of reproducing this without changing the code or debugging it? I don't see how this would fail in a real-life situation.

Thanks
[19 Oct 2021 8:27] sheng wei
This problem occurred in a real production environment and was found by examining the thread stacks. It was not found simply by reading the code.
Thanks