Bug #65692 Deadlock between START SLAVE and setting a system variable
Submitted: 20 Jun 2012 18:57 Modified: 16 Nov 2012 15:06
Reporter: Davi Arnaut (OCA)
Status: Closed
Category: MySQL Server: Replication
Severity: S2 (Serious)
Version: 5.5.23, 5.5.26
OS: Any
Assigned to:
CPU Architecture: Any
Tags: Contribution, deadlock, start slave, system variable
Triage: Needs Triage: D2 (Serious)

[20 Jun 2012 18:57] Davi Arnaut
Description:
Starting the SQL thread might deadlock with setting the sql_slave_skip_counter or slave_net_timeout variables.

The deadlock is caused by a lock order violation when these variables are set. For example, setting slave_net_timeout first acquires LOCK_global_system_variables in sys_var::update and then acquires LOCK_active_mi in fix_slave_net_timeout. This violates the order established when starting the SQL thread: there, LOCK_active_mi is acquired before calling start_slave, which then creates a thread (handle_slave_sql) that allocates a THD handle whose constructor acquires LOCK_global_system_variables in THD::init.
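
To illustrate the ordering rule, here is a minimal sketch of a setter that takes the two locks in the same order as the START SLAVE path. It is a simplified model under stated assumptions, not the attached patch and not the server's code: the two std::mutex objects and the *_model functions are hypothetical stand-ins, and the model takes both locks in one thread, whereas in the server the second lock is taken by the newly created SQL thread.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for LOCK_active_mi and LOCK_global_system_variables.
// These are plain std::mutex objects, not the real MySQL mutexes.
std::mutex lock_active_mi;
std::mutex lock_global_system_variables;

// Illustrative global; the initial value is arbitrary.
long slave_net_timeout = 3600;

// Models the START SLAVE path described in the report: LOCK_active_mi is
// taken first, then LOCK_global_system_variables (as THD::init does).
long start_sql_thread_model() {
    std::lock_guard<std::mutex> mi(lock_active_mi);
    std::lock_guard<std::mutex> vars(lock_global_system_variables);
    return slave_net_timeout;  // value the new thread would observe
}

// Models a setter that follows the same order, so the two paths can no
// longer wait on each other in a cycle.
long set_slave_net_timeout_model(long value) {
    assert(value > 0);  // assumption made only for this sketch
    std::lock_guard<std::mutex> mi(lock_active_mi);
    std::lock_guard<std::mutex> vars(lock_global_system_variables);
    slave_net_timeout = value;
    return slave_net_timeout;
}
```

The buggy path corresponds to swapping the two lock_guard lines in the setter: one thread would then hold the system variables lock while waiting for LOCK_active_mi, and the other would hold LOCK_active_mi while waiting for the system variables lock.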

How to repeat:
See attached test case.
[20 Jun 2012 18:59] Davi Arnaut
Fix lock order for system variables that access repl info

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: start-slave-sys_var-deadlock.patch (application/octet-stream, text), 5.11 KiB.

[21 Jun 2012 7:22] Valeriy Kravchuk
Thank you for the bug report and the contributed patch. Verified with 5.5.26 on Mac OS X:

macbook-pro:mysql-test openxs$ vi t/rpl_start_slave_deadlock_sys_vars.test
macbook-pro:mysql-test openxs$ touch r/rpl_start_slave_deadlock_sys_vars.result
macbook-pro:mysql-test openxs$ ./mtr rpl_start_slave_deadlock_sys_vars
Logging: ./mtr  rpl_start_slave_deadlock_sys_vars
120621 10:11:10 [Warning] Setting lower_case_table_names=2 because file system for /var/folders/dX/dXCzvuSlHX4Op1g-o1jIWk+++TI/-Tmp-/0YUkB8AB5r/ is case insensitive
120621 10:11:10 [Note] Plugin 'FEDERATED' is disabled.
MySQL Version 5.5.26
Checking supported features...
 - skipping ndbcluster
 - SSL connections supported
 - binaries are debug compiled
Collecting tests...
vardir: /Users/openxs/dbs/5.5/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/Users/openxs/dbs/5.5/mysql-test/var'...
Installing system database...
Using server port 53233

==============================================================================

TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
main.rpl_start_slave_deadlock_sys_vars   [ fail ]
        Test ended at 2012-06-21 10:17:09

CURRENT_TEST: main.rpl_start_slave_deadlock_sys_vars
--- /Users/openxs/dbs/5.5/mysql-test/r/rpl_start_slave_deadlock_sys_vars.result	2012-06-21 10:10:56.000000000 +0300
+++ /Users/openxs/dbs/5.5/mysql-test/r/rpl_start_slave_deadlock_sys_vars.reject	2012-06-21 10:17:09.000000000 +0300
@@ -0,0 +1,35 @@
+include/master-slave.inc
+[connection master]
+# connection: slave
+SET @save_slave_net_timeout = @@GLOBAL.slave_net_timeout;
+STOP SLAVE;
+include/wait_for_slave_to_stop.inc
+# open an extra connection to the slave
+# connection: slave2
+# set debug synchronization point
+SET DEBUG_SYNC='fix_slave_net_timeout SIGNAL parked WAIT_FOR go';
+# attempt to set slave_net_timeout, will wait on sync point
+SET @@GLOBAL.slave_net_timeout = 100;
+# connection: slave
+SET DEBUG_SYNC='now WAIT_FOR parked';
+Warnings:
+Warning	1639	debug sync point wait timed out
+# connection: slave1
+# attempt to start the SQL thread
+START SLAVE SQL_THREAD;
+# connection: slave
+# wait until SQL thread has been started
+Timeout in wait_condition.inc for select count(*) = 1 from information_schema.processlist
+where state = "Waiting for slave thread to start" and info = "START SLAVE SQL_THREAD"
...
[16 Nov 2012 15:06] Jon Stephens
Thank you for your bug report. The fix for this issue has been committed to our source repository for this product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html
[16 Nov 2012 15:07] Jon Stephens
Fixed in 5.6.10 and trunk (now tagged 5.7.1).

Thanks for the patch.