Bug #106953 LOCK_grant and LOCK_open can deadlock on a gtid slave
Submitted: 8 Apr 2022 6:42 Modified: 14 Jun 2022 8:27
Reporter: ting du Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Locking Severity:S3 (Non-critical)
Version:5.7 OS:Linux
Assigned to: CPU Architecture:Any

[8 Apr 2022 6:42] ting du
Description:
Two threads deadlock on slave.
One is executing `show open tables;`. It call list_open_tables() which acquires LOCK_open mutex first and then call check_table_access() which acquires LOCK_grant read lock.
The second is slave thread executing `ALTER USER 'root'@'%' PASSWORD EXPIRE NEVER;`. It call mysql_alter_user() which acquires LOCK_grant write lock and then acquires LOCK_OPEN on mysql.gtid_executed table.

How to repeat:
Execute on master in a loop:
  ALTER USER 'root'@'%' PASSWORD EXPIRE NEVER;

Execute on slave in a loop:
  show open tables;
[11 Apr 2022 2:41] ting du
this bug is same with 91548, but different scenario. You can use the same my.cnf with it, but different SQL statement.
[7 Jun 2022 9:20] MySQL Verification Team
Hello ting du,

Thank you for the report and feedback.
I tried to reproduce the issue on 5.7.38 using the conf from the referenced bug# and with the provided test case but I'm not seeing any locking issues despite trying multiple times. Please share exact version that you are seeing this issue along with any details which are missing here and in the referenced bug. Thank you.

## 5.7.38

-- setup gtid bases repl
-- 5.7

rm -rf master/
bin/mysqld --initialize-insecure --basedir=$PWD --datadir=$PWD/master --log-error-verbosity=3
bin/mysqld --no-defaults --basedir=$PWD --datadir=$PWD/master --core-file --socket=/tmp/mysql_master.sock --port=3306 --log-error=$PWD/master/log.err --gtid_mode=ON --log-slave-updates --enforce-gtid-consistency --log-bin=master-bin --server_id=1 2>&1 &

rm -rf slave/
bin/mysqld --initialize-insecure --basedir=$PWD --datadir=$PWD/slave --log-error-verbosity=3
bin/mysqld --no-defaults --basedir=$PWD --datadir=$PWD/slave --core-file --socket=/tmp/mysql_slave.sock --port=3307 --log-error=$PWD/slave/log.err --gtid_mode=ON --log-slave-updates --enforce-gtid-consistency --log-bin=slave-bin --server_id=2 --sync_master_info=1 --sync_relay_log=1 --sync_relay_log_info=1 --master_info_repository=TABLE --relay_log_info_repository=TABLE 2>&1 &

- master

 bin/mysql -uroot -S /tmp/mysql_master.sock --prompt='Master>'
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.38-log MySQL Community Server (GPL)

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

Master>CREATE USER 'repl'@'localhost' IDENTIFIED BY 'slavepass';
Query OK, 0 rows affected (0.00 sec)

Master>GRANT REPLICATION SLAVE ON *.* TO 'repl'@'localhost';
Query OK, 0 rows affected (0.00 sec)

Master>
Master>CREATE USER 'repl'@'127.0.0.1' IDENTIFIED BY 'slavepass';
Query OK, 0 rows affected (0.00 sec)

Master>GRANT REPLICATION SLAVE ON *.* TO 'repl'@'127.0.0.1';
Query OK, 0 rows affected (0.01 sec)

Master>FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

Master>create database test;
Query OK, 1 row affected (0.01 sec)

Master>create user 'root'@'%' identified by '123';
Query OK, 0 rows affected (0.00 sec)

Master>grant all on *.* to 'root'@'%' with grant option;
Query OK, 0 rows affected (0.00 sec)

- slave

bin/mysql -uroot -S /tmp/mysql_slave.sock --prompt='Slave>'
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.38-log MySQL Community Server (GPL)

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

Slave>CHANGE MASTER TO
    ->      MASTER_HOST='localhost',
    ->      MASTER_PORT=3306,
    ->      MASTER_USER='repl',
    ->      MASTER_PASSWORD='slavepass',
    ->      MASTER_AUTO_POSITION = 1;
Query OK, 0 rows affected, 2 warnings (0.02 sec)

Slave>start slave; show slave status\G
Query OK, 0 rows affected (0.00 sec)

*************************** 1. row ***************************
               Slave_IO_State: Checking master version
                  Master_Host: localhost
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File:
          Read_Master_Log_Pos: 4
               Relay_Log_File: support-cluster03-relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File:
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 0
              Relay_Log_Space: 154
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 0
                  Master_UUID:
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: System lock
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

-- Follow steps from report

- master

yes "ALTER USER 'root'@'localhost' PASSWORD EXPIRE NEVER;"|bin/mysql -uroot -S /tmp/mysql_master.sock
/
while true; do bin/mysql -uroot -S /tmp/mysql_master.sock -e "ALTER USER 'root'@'%' PASSWORD EXPIRE NEVER;"; done

- slave

yes "show open tables;"|bin/mysql -uroot -S /tmp/mysql_slave.sock
/
while true; do bin/mysql -uroot -S /tmp/mysql_slave.sock -e "show open tables;"; done

select * from information_schema.columns order by table_schema,table_name;

regards,
Umesh
[9 Jun 2022 2:20] ting du
From code analyze:
CREATE/DROP/ALTER USER sql will call function:
|-mysql_create_user()
  |-Partitioned_rwlock_write_guard lock(&LOCK_grant); // require LOCK_grant
  |-commit_owned_gtid_by_partial_command()
    |-commit_owned_gtids()
      |-Gtid_state::save()          
        |-Gtid_table_persistor::save 
          ...
          |-open_table()             
            |-check_if_table_exists()
              |-mysql_mutex_lock(&LOCK_open);    // require LOCK_open mutex
              |-get_cached_table_share()
              |-mysql_mutex_unlock(&LOCK_open);

`show open tables` sql call function:
|-fill_open_tables()
  |-list_open_tables
	|-table_cache_manager.lock_all_and_tdc();   // require LOCK_open mutex
	  ...
	  |-mysql_mutex_lock(&LOCK_open);
	|-check_table_access()
	  |-check_grant()
		|-LOCK_grant_read_guard lock(thd);      // require LOCK_grant read lock
		  ...
	|-table_cache_manager.unlock_all_and_tdc();
	  |-mysql_mutex_unlock(&LOCK_open);
		...

From the code, the lock order is opposite in these two queries, a deadlock occurs.
[14 Jun 2022 8:27] MySQL Verification Team
Thank you for the feedback.
I could not reproduce using the earlier test case even after multiple attempts but your code analysis makes sense hence verifying so that dev's can conclude on this.