Bug #88693 Mutex deadlock causing mysqld to hang and cease to work
Submitted: 29 Nov 2017 9:53
Modified: 30 Nov 2017 14:24
Reporter: Wei Zhao (OCA)
Status: Closed
Category: MySQL Server
Severity: S1 (Critical)
Version: 5.7.17
OS: Any
Assigned to:
CPU Architecture: Any
Tags: deadlock

[29 Nov 2017 9:53] Wei Zhao
Description:
Recently we (the TDSQL team) started to use the official keyring_udf.so plugin. It is installed whenever mysqld starts or an agent program starts up, so the INSTALL PLUGIN statement is executed often. We occasionally find that mysqld hangs: no connection can be made and everything stops working.

After some analysis I found a mutex deadlock between LOCK_plugin and LOCK_system_variables_hash in the find_sys_var_ex() and mysql_install_plugin() functions.

Both functions acquire the two mutexes, at first in the same order. However, plugin_add(), which is called by mysql_install_plugin(), releases LOCK_plugin in order to call report_error() without holding the mutex (to improve parallelism) and then acquires it again. Because the other mutex is still held at that point, the acquisition order is reversed relative to find_sys_var_ex(), and a cyclic wait sometimes forms, causing the mutex deadlock.
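The reversal described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not MySQL source: the two traces are simplified guesses at the lock and unlock sequence of each function, both locks are treated as plain mutexes, and the helper names lock_order_edges and has_cycle are hypothetical. It records which mutex is held when another one is acquired and checks the resulting order graph for a cycle.

```python
from collections import defaultdict

PLUGIN = "LOCK_plugin"
SYSVAR_HASH = "LOCK_system_variables_hash"

# Assumed, simplified traces: each step is ("lock" | "unlock", mutex name).
FIND_SYS_VAR_EX = [
    ("lock", PLUGIN), ("lock", SYSVAR_HASH),
    ("unlock", SYSVAR_HASH), ("unlock", PLUGIN),
]
MYSQL_INSTALL_PLUGIN = [
    ("lock", PLUGIN), ("lock", SYSVAR_HASH),
    ("unlock", PLUGIN),  # plugin_add() drops the mutex to call report_error()
    ("lock", PLUGIN),    # ...and takes it back while the other mutex is held
    ("unlock", SYSVAR_HASH), ("unlock", PLUGIN),
]


def lock_order_edges(trace):
    """Return (held, acquired) pairs: 'acquired' was taken while 'held' was held."""
    held, edges = [], set()
    for op, name in trace:
        if op == "lock":
            edges.update((h, name) for h in held)
            held.append(name)
        else:
            held.remove(name)
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the lock-order graph."""
    graph = defaultdict(set)
    for first, second in edges:
        graph[first].add(second)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: two threads can wait on each other
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))
```

In this model the trace for find_sys_var_ex() alone yields no cycle, while combining it with the trace for mysql_install_plugin() yields both (LOCK_plugin, LOCK_system_variables_hash) and the reverse pair, which is the condition for a cyclic wait.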

I have attached my gdb stack trace to prove my findings, as well as my patch to fix the mutex deadlock.

How to repeat:
as above

Suggested fix:
In plugin_add(), do not release and re-acquire LOCK_plugin; hold this mutex for the whole function. See my patch for more information.

It is not a problem to call report_error() while holding LOCK_plugin, because this function is not heavyweight (although a file write may take place) and error conditions are rare. In fact, plugin_add() already calls plugin_dl_add() while holding LOCK_plugin, and plugin_dl_add() calls report_error() on error.
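As a rough illustration of the suggested fix, here is a Python sketch in which plugin_add() keeps LOCK_plugin held while reporting an error, so both code paths always take the two mutexes in the same order. It is a model under assumptions, not the attached patch: the function bodies, the fail parameter, the reported list, and the run helper are all hypothetical, and both locks are modeled as plain mutexes.

```python
import threading

LOCK_plugin = threading.Lock()
LOCK_system_variables_hash = threading.Lock()
reported = []  # stands in for the error log


def report_error(message):
    # Cheap operation, so calling it under LOCK_plugin is acceptable here.
    reported.append(message)


def plugin_add(name, fail):
    # The caller holds both mutexes; this function never releases LOCK_plugin.
    assert LOCK_plugin.locked()
    if fail:
        report_error("cannot add plugin " + name)
        return False
    return True


def mysql_install_plugin(name, fail=False):
    with LOCK_plugin:
        with LOCK_system_variables_hash:
            return plugin_add(name, fail)


def find_sys_var_ex(name):
    with LOCK_plugin:
        with LOCK_system_variables_hash:
            return name


def run(iterations):
    """Run both paths concurrently; return True if both threads finish."""
    reported.clear()

    def installer():
        for i in range(iterations):
            mysql_install_plugin("keyring_udf", fail=(i % 2 == 0))

    def finder():
        for _ in range(iterations):
            find_sys_var_ex("some_variable")

    workers = [threading.Thread(target=installer, daemon=True),
               threading.Thread(target=finder, daemon=True)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join(timeout=30)
    return not any(worker.is_alive() for worker in workers)
```

Calling run(200) exercises the error path on every other install. Since the lock order is the same in both functions, neither thread can end up waiting on the other in a cycle.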
[29 Nov 2017 9:53] Wei Zhao
This patch fixes the bug.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: fix-plugin-mtx-deadlock.diff (application/octet-stream, text), 1.21 KiB.

[29 Nov 2017 9:54] Wei Zhao
A gdb debugging trace to prove the deadlock.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: mutex-deadlock-trace.txt (text/plain), 8.18 KiB.

[29 Nov 2017 13:50] MySQL Verification Team
Hi!

Thank you for your bug report. 

It turns out that this bug is a duplicate of an internal bug that has been fixed and pushed into the mysql-8.0.4 release, which has not yet been published.

We are going to enquire whether it is possible to backport the fix to 5.7.
[30 Nov 2017 1:45] Wei Zhao
I found and fixed this bug on mysql-5.7.17.
[30 Nov 2017 14:24] MySQL Verification Team
Hi!

This bug has been fixed in 5.7.21.

It is not known when it will be released, so keep following the news on dev.mysql.com.
[21 May 2018 10:52] Roel Van de Paar
Not fixed yet? See bug 90949.
[21 May 2018 11:40] MySQL Verification Team
*** PDUBOIS  paul.dubois 01/05/18 10:17 am *** 
Fixed in 5.7.22, 8.0.4

Installing and uninstalling a plugin many times from multiple 
sessions could cause the server to become unresponsive.