Bug #106222 install/uninstall plugin concurrent with new connections may deadlock
Submitted: 20 Jan 2022 8:28 Modified: 21 Jan 2022 14:18
Reporter: zkong kong
Status: Can't repeat
Category:MySQL Server: Connection Handling Severity:S3 (Non-critical)
Version:5.7.37, 8.0.28 OS:Linux
Assigned to: CPU Architecture:ARM

[20 Jan 2022 8:28] zkong kong
Description:
Installing or uninstalling plugins may prevent new connections from being created, and this is more likely to happen on the ARM platform. After analyzing the stacks, I found the following deadlock cycle:

thd 1: in THD::init, holds LOCK_global_system_variables and waits to acquire LOCK_plugin:

void plugin_thdvar_init(THD *thd, bool enable_plugins) {
  // ...
  mysql_mutex_lock(&LOCK_global_system_variables);
  // ...
  if (enable_plugins) {
    mysql_mutex_lock(&LOCK_plugin);

thd 2: in UNINSTALL PLUGIN (INSTALL PLUGIN behaves the same way), holds LOCK_plugin and waits to acquire LOCK_system_variables_hash:

static void reap_plugins(void) {
  // ...
  mysql_mutex_lock(&LOCK_plugin);

  while ((plugin = *(--reap))) plugin_del(plugin);  // plugin_del() write-locks LOCK_system_variables_hash

static void plugin_del(st_plugin_int *plugin) {
  // ...
  mysql_rwlock_wrlock(&LOCK_system_variables_hash);

thd 3: holds LOCK_system_variables_hash (read lock) and waits to acquire LOCK_global_system_variables, via
       thd_prepare_connection
       -> prepare_new_connection_state
          -> alloc_and_copy_thd_dynamic_variables

void alloc_and_copy_thd_dynamic_variables(THD *thd, bool global_lock) {
  mysql_rwlock_rdlock(&LOCK_system_variables_hash);

  if (global_lock) mysql_mutex_lock(&LOCK_global_system_variables);

  

How to repeat:
Read the source code.
[20 Jan 2022 13:51] MySQL Verification Team
Hi Mr. kong,

Thank you for your bug report.

However, your report is not complete.

First of all, you claim that loading a plugin prevents new connections from being established. This is expected behaviour: while a plugin is loading or unloading, new connections cannot be established.

However, further on, you claim that a deadlock occurs. That would mean that many threads, or the entire server, are blocked in the deadlock.

In order to verify that possibility, we need a fully repeatable test case so that we can witness the deadlock in vivo. Next, please prove why it can only happen on ARM. Also, can it happen on macOS on ARM, or only on Linux on ARM?

We are expecting answers to all of our questions.
[21 Jan 2022 13:55] zkong kong
First of all, you claim that loading a plugin prevents new connections from being established. This is expected behaviour: while a plugin is loading or unloading, new connections cannot be established.

However, further on, you claim that a deadlock occurs. That would mean that many threads, or the entire server, are blocked in the deadlock.

----> Yes. Before long, clients can no longer log in, so I suspected a deadlock, reviewed the stacks, and found the lock cycle above. Would every new connection then wait in THD::init?

In order to verify that possibility, we need a fully repeatable test case so that we can witness the deadlock in vivo. Next, please prove why it can only happen on ARM. Also, can it happen on macOS on ARM, or only on Linux on ARM?

----> The lock cycle comes from the stacks we captured in our test environment, which runs Linux on ARM. For now I can only identify the lock cycle; reproducing it reliably would require adding some sync points.
[21 Jan 2022 14:18] MySQL Verification Team
Hi,

Thank you for your answer.

Please separate your answers from the quoted text, as your comments are quite unreadable in this format.

We shall wait for your fully reproducible test case before we proceed with processing this report.

Do note that we asked you some other questions to which we did not receive any answers.

For the time being, we can't repeat your report.
[30 Sep 0:56] Jinyou Ma
Hi,

The issue can be reproduced by running the following two sessions concurrently:

- session 1
mysqlslap --delimiter=";" \
  --create="CREATE TABLE t (id int auto_increment primary key, b int);" \
  --query="INSERT INTO t (b) VALUES (1)" --concurrency=50 --iterations=200
- session 2
for i in {1..10000}; do
  mysql -BNe "INSTALL PLUGIN rewriter SONAME 'rewriter.so'; UNINSTALL PLUGIN rewriter;" &> /dev/null
done

---------

In my test case, there is a deadlock among LOCK_plugin, LOCK_global_system_variables, and LOCK_system_variables_hash.

(gdb) thread 92
[Switching to thread 92 (Thread 0x7fec9f4e9640 (LWP 1561376))]
#0  0x00007fee02a86a90 in __lll_lock_wait () from /lib64/libc.so.6
(gdb) f 6
#6  0x00000000033d2e32 in plugin_thdvar_init (thd=0x7fed44001060, enable_plugins=true)
    at /bigdisk/jinyou.ma/mysql-server/sql/sql_plugin.cc:2992
2992      mysql_mutex_lock(&LOCK_global_system_variables);
(gdb) f 2
#2  0x00000000049bf6d0 in native_mutex_lock (mutex=0xc804ef8)
    at /bigdisk/jinyou.ma/mysql-server/include/thr_mutex.h:94
94        return pthread_mutex_lock(mutex);
(gdb) p mutex->__data.__owner
$1 = 1549502
(gdb) thread find 1549502
Thread 63 has target id 'Thread 0x7fed69df7640 (LWP 1549502)'
Thread 92 is waiting for LOCK_global_system_variables, held by thread 63.

(gdb) thread 63
[Switching to thread 63 (Thread 0x7fed69df7640 (LWP 1549502))]
(gdb) f 6
#6  0x00000000033d2ec6 in plugin_thdvar_init (thd=0x7fed10093210, enable_plugins=true)
    at /bigdisk/jinyou.ma/mysql-server/sql/sql_plugin.cc:3002
3002        mysql_mutex_lock(&LOCK_plugin);
(gdb) f 2
#2  0x00000000049bf6d0 in native_mutex_lock (mutex=0xc84f978)
    at /bigdisk/jinyou.ma/mysql-server/include/thr_mutex.h:94
94        return pthread_mutex_lock(mutex);
(gdb) p mutex->__data.__owner
$2 = 1549165
(gdb) thread find 1549165
Thread 42 has target id 'Thread 0x7fed684c5640 (LWP 1549165)'
Thread 63 is waiting for LOCK_plugin, held by thread 42.

(gdb) thread 42
[Switching to thread 42 (Thread 0x7fed684c5640 (LWP 1549165))]
#0  0x00007fee02a86849 in __futex_abstimed_wait_common () from /lib64/libc.so.6
(gdb) f 4
#4  0x00000000033d0653 in mysql_install_plugin (thd=0x7fec6c007bc0, name=..., dl=0x7fec6c0014f8)
    at /bigdisk/jinyou.ma/mysql-server/sql/sql_plugin.cc:2291
2291      mysql_rwlock_wrlock(&LOCK_system_variables_hash);
Thread 42 is waiting for LOCK_system_variables_hash.

 
The bug has been fixed in 8.0.29.

The commit is https://github.com/mysql/mysql-server/commit/97c962d414d7f9a52930e402a86815e19664030d