Bug #105243 Execute "mysql.server stop" command deadlock after using "start slave" command
Submitted: 16 Oct 2021 9:09
Modified: 19 Oct 2021 8:27
Reporter: sheng wei (OCA)
Status: Verified
Category: MySQL Server: Replication
Severity: S3 (Non-critical)
Version: 8.0.25
OS: Any
Assigned to:
CPU Architecture: Any

[16 Oct 2021 9:09] sheng wei
If the slave executes the "start slave" command and the "Global_THD_manager::add_thd" call in the "handle_slave_worker" function has not yet completed, then executing "mysql.server stop" produces the following deadlock:
Thread 1:
handle_slave_sql -> slave_start_workers -> slave_start_single_worker
wait: mysql_cond_wait(&w->jobs_cond, &w->jobs_lock);

Thread 2:
signal_hand -> close_connections -> Global_THD_manager::do_for_all_thd -> for (int i = 0; i < NUM_PARTITIONS; i++)
note: current_mutex is mi->rli->run_lock, assigned in function "start_slave_thread" by thd->ENTER_COND(start_cond, cond_lock, &stage_waiting_for_slave_thread_to_start, &saved_stage);

Thread 3:
handle_slave_worker -> Global_THD_manager::add_thd 

How to repeat:
In function "handle_slave_worker", add "my_sleep(60000000)" immediately before "thd_manager->add_thd(thd)". While the thread is sleeping in my_sleep(60000000), execute the "mysql.server stop" command.

Suggested fix:
In function "handle_slave_worker", move thd_manager->add_thd(thd) to after the point where w->jobs_cond is signaled, so that w->jobs_cond is sent before thd_manager->add_thd(thd) runs.

Do you have any other suggestions? Is there a problem with this modification? Thanks.
[18 Oct 2021 2:47] sheng wei
How to repeat:
In function "handle_slave_worker", add "my_sleep(60000000)" immediately before "thd_manager->add_thd(thd)". While the thread is sleeping in my_sleep(60000000), execute the "mysql.server stop" command.

Modified: In function "handle_slave_sql", add "my_sleep(60000000)" immediately before "thd_manager->add_thd(thd)". While the thread is sleeping in my_sleep(60000000), execute the "mysql.server stop" command.
[19 Oct 2021 7:52] MySQL Verification Team

Do you have a way of reproducing this without changing the code or debugging it? I don't see how this would fail in a real-life situation.

[19 Oct 2021 8:27] sheng wei
This problem occurred in a real production environment and was found by inspecting the thread stacks. It was not found simply by reading the code.