Bug #67879 Slave deadlock caused by stop slave, show slave status and global read lock
Submitted: 11 Dec 2012 22:59
Modified: 20 Feb 2013 5:14
Reporter: Yoshinori Matsunobu (OCA)
Status: Closed
Category: MySQL Server: Replication
Severity: S2 (Serious)
Version: 5.1, 5.5, 5.6
OS: Any
Assigned to:
CPU Architecture: Any
Triage: Needs Triage: D2 (Serious)

[11 Dec 2012 22:59] Yoshinori Matsunobu
Description:
See "How to repeat" below.

How to repeat:
Client 1
slave> FLUSH TABLES WITH READ LOCK;

Client 2
master> insert into t1 values (100, 1, 1); # any updating statement
 -> slave sql thread waits for global read lock

Client 3
slave> stop slave sql_thread;
 -> This hangs waiting on stop_cond. stop_cond is broadcast when the SQL thread terminates, which won't happen until client 1 releases the global read lock.

Client 1
slave> show slave status;
 -> This hangs on the LOCK_active_mi mutex. LOCK_active_mi is held by client 3, but client 3 is (indirectly) waiting for client 1. Deadlock.

KILL connection didn't work.
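
The steps above form a cycle of waits. As a minimal sketch (this is a Python model, not MySQL code; the node labels and the `find_cycle` helper are hypothetical names chosen for illustration), the wait-for relationships can be written down as a graph and checked for a cycle:

```python
# Wait-for graph for the scenario above: each key waits for its value.
# Node names are illustrative labels, not MySQL identifiers.
WAITS_FOR = {
    # SHOW SLAVE STATUS needs LOCK_active_mi, which STOP SLAVE holds.
    "client1 (SHOW SLAVE STATUS)": "client3 (STOP SLAVE)",
    # STOP SLAVE waits for stop_cond, signalled when the SQL thread exits.
    "client3 (STOP SLAVE)": "slave SQL thread",
    # The SQL thread waits for the global read lock taken by client 1.
    "slave SQL thread": "client1 (SHOW SLAVE STATUS)",
}


def find_cycle(graph, start):
    """Follow wait-for edges from `start`; return the cycle as a list, or None."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = graph.get(node)
    if node is None:
        return None  # the chain ended, so nobody waits forever
    return path[path.index(node):]


cycle = find_cycle(WAITS_FOR, "slave SQL thread")
print(" -> ".join(cycle + [cycle[0]]))
```

Each participant waits on a different primitive (a mutex, a condition variable, a metadata lock), which may be why no single subsystem notices the cycle.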

Suggested fix:
I think either of the following should be supported.

- Timeouts on show slave status / stop slave
- Accepting KILL commands
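
To illustrate the timeout idea only: a minimal Python sketch, assuming a plain lock as a stand-in for LOCK_active_mi. The function name, the messages, and the `timeout_seconds` parameter are hypothetical and do not correspond to anything in the server.

```python
import threading

# Stand-in for LOCK_active_mi; this models the idea, it is not server code.
lock_active_mi = threading.Lock()


def show_slave_status(timeout_seconds):
    """Try to take the lock; report an error instead of hanging forever."""
    if not lock_active_mi.acquire(timeout=timeout_seconds):
        return "ERROR: timed out waiting for LOCK_active_mi"
    try:
        return "OK: status rows would be produced here"
    finally:
        lock_active_mi.release()


# Simulate STOP SLAVE holding the lock while it waits for the SQL thread.
lock_active_mi.acquire()
blocked_result = show_slave_status(timeout_seconds=0.1)
lock_active_mi.release()

# With the lock free, the same call succeeds.
free_result = show_slave_status(timeout_seconds=0.1)
print(blocked_result)
print(free_result)
```

With a bounded wait, the session holding the global read lock gets control back and can still issue UNLOCK TABLES, which breaks the cycle.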
[11 Dec 2012 23:04] Domas Mituzas
also not deadlocking would be a good fix too!

(btw, this was hit on 5.1)
[11 Dec 2012 23:13] Yoshinori Matsunobu
Yes, this repeated on 5.1.65, too.
[12 Dec 2012 20:23] Sveta Smirnova
Thank you for the report.

Verified as described.
[28 Jan 2013 13:18] Erlend Dahl
This bug has been fixed in the (internal) 5.7.0 milestone.
[20 Feb 2013 5:14] Yoshinori Matsunobu
Any chance to be pushed into 5.6.x as well?
[23 May 2013 18:09] Shane Bester
Info from 5.6.11.

+---------+------+----------------------------------+-----------------------------+
| Command | Time | State                            | Info                        |
+---------+------+----------------------------------+-----------------------------+
| Connect |  176 | Waiting for master to send event | NULL                        |
| Connect |  118 | Waiting for global read lock     | insert into t1 values (100) |
| Query   |  102 | Killing slave                    | stop slave sql_thread       |
| Query   |   86 | init                             | show slave status           |
| Query   |    0 | init                             | show processlist            |
+---------+------+----------------------------------+-----------------------------+

Thread stacks:
insert:
 	kernel32.dll!_SleepConditionVariableCS@12()  + 0x21 bytes	
 	mysqld.exe!pthread_cond_timedwait
 	mysqld.exe!MDL_wait::timed_wait
 	mysqld.exe!MDL_context::acquire_lock
 	mysqld.exe!open_table
 	mysqld.exe!open_and_process_table
 	mysqld.exe!open_tables
 	mysqld.exe!open_normal_and_derived_tables
 	mysqld.exe!mysql_insert
 	mysqld.exe!mysql_execute_command
 	mysqld.exe!mysql_parse
 	mysqld.exe!Query_log_event::do_apply_event
 	mysqld.exe!Query_log_event::do_apply_event
 	mysqld.exe!Log_event::apply_event
 	mysqld.exe!apply_event_and_update_pos
 	mysqld.exe!exec_relay_log_event
 	mysqld.exe!handle_slave_sql
 	mysqld.exe!pfs_spawn_thread
 	mysqld.exe!pthread_start
>	mysqld.exe!_callthreadstartex
 	mysqld.exe!_threadstartex

stop slave:	
	kernel32.dll!_SleepConditionVariableCS@12
 	mysqld.exe!pthread_cond_timedwait
 	mysqld.exe!terminate_slave_thread
 	mysqld.exe!terminate_slave_threads
 	mysqld.exe!stop_slave
 	mysqld.exe!mysql_execute_command
 	mysqld.exe!mysql_parse
 	mysqld.exe!dispatch_command
 	mysqld.exe!do_command
 	mysqld.exe!do_handle_one_connection
 	mysqld.exe!handle_one_connection
 	mysqld.exe!pfs_spawn_thread
 	mysqld.exe!pthread_start
>	mysqld.exe!_callthreadstartex
 	mysqld.exe!_threadstartex

show slave status:	
	ntdll.dll!_RtlEnterCriticalSection@4()  + 0x16a38 bytes	
 	mysqld.exe!inline_mysql_mutex_lock
 	mysqld.exe!mysql_execute_command
 	mysqld.exe!mysql_parse
>	mysqld.exe!dispatch_command
 	mysqld.exe!do_command
 	mysqld.exe!do_handle_one_connection
 	mysqld.exe!handle_one_connection
 	mysqld.exe!pfs_spawn_thread
 	mysqld.exe!pthread_start
 	mysqld.exe!_callthreadstartex
 	mysqld.exe!_threadstartex
[23 May 2013 18:15] Shane Bester
I requested this be fixed in GA versions.
Bug 16856735 - SLAVE DEADLOCK CAUSED BY STOP SLAVE, SHOW SLAVE STATUS AND GLOBAL
[20 Sep 2013 14:04] Jon Stephens
A fix for this is included in MySQL 5.7: SHOW SLAVE STATUS NONBLOCKING.

Status unchanged.
[10 Feb 2014 21:22] James Heggs
Also be advised that the Percona hotbackup tool will issue a 'FLUSH TABLES....' if you do not specify the "--no-lock" option.

Be very careful if you are using a tool such as Nagios to check your slave status while taking hot backups.
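
One way a monitoring script might protect itself is a client-side deadline around the status check. The following is only a sketch under assumptions: `run_with_deadline` and `hanging_check` are hypothetical helpers, and the sleep stands in for a SHOW SLAVE STATUS call that does not return. It keeps the monitoring process from hanging, but it does not free the stuck session on the server (as noted above, KILL did not work either).

```python
import threading
import time


def run_with_deadline(check, deadline_seconds):
    """Run `check` in a daemon thread; return (finished, result)."""
    outcome = {}

    def worker():
        outcome["result"] = check()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(deadline_seconds)
    if thread.is_alive():
        # The check is still blocked; abandon it and report a timeout.
        return False, None
    return True, outcome.get("result")


def hanging_check():
    time.sleep(5)  # stands in for a status query that never returns
    return "never seen"


print(run_with_deadline(lambda: "ok", 1.0))
print(run_with_deadline(hanging_check, 0.1))
```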

Are there any plans to backport the fix to 5.6.x?
[3 Nov 2014 18:02] monty solomon
We experience the slave hang frequently now -- already twice today. Please fix the problem in 5.6.
[19 Dec 2014 3:05] monty solomon
We experience multiple failures due to this bug every day in version 5.6.

The fix in Oracle bug 19843808 is only for version 5.7.

This bug should be reopened as severity S1 since there is no available workaround for 5.6.

Please backport the fix to version 5.6.

Thanks.
[19 Dec 2014 4:59] Sujatha Sivakumar
Oracle bug 19843808 is fixed in 5.6 as well. The fix will be available in an upcoming 5.6 release.