Bug #119011 Execute some acl statements to a cluster without quorum in group replication makes login stuck
Submitted: 15 Sep 16:04 Modified: 23 Sep 16:08
Reporter: toki ye (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version:9.4.0 OS:Any
Assigned to: CPU Architecture:Any

[15 Sep 16:04] toki ye
Description:
  Execute some acl statements to a no quorum cluster in group replication make login process stuck for all new connections.

  1. Group Replication member without the acknowledges of other quorums, will  block and wait transaction to commit when execute acl statements like "CREATE USER ...".

  2. During this waiting time, an acl-cache write lock is acquiring for "CREATE USER ..." statement and could not release for a long time unless commit finishes.

  3. When new connection comes later, the server try to acquire acl-cahce lock to make identity authentication.

  4. Because Group Communication Service do not release acl-cache write lock for a long time, loggin process stucks.

---
  I alse try to find some familiar situations, one of the most similar is bug #330252231: SHELL HANGS FOR 2H IN ADDINSTANCE() (GR+CLONE). add instance process hangs for acl-cache lock because of executing acl statements when a new member adds.

  However, it has been solved since MySQL8.0.27.

  If there is any thing about this bug, i would be glad listen and hear about.

How to repeat:
  the description above can be repeated in the newest MySQL8.0 version by  the fllowing steps:

  1. Deploy a group replication with 3 members.

  2. Kill any two of them(in case of network disconnection), the only member will log like:'This server is not able to reach a majority of members in the group. This server will now block all updates. The server will remain blocked until contact with the majority is restored. It is possible to use group_replication_force_members to force a new group membership.'

  3. Execute "CREATE USER 'test'@'localhost' identified by 'uikK12*123...';" statement to the final member.

  4. Start to login in the final member with a new connection, then you will find login process stucks.
[22 Sep 14:45] MySQL Verification Team
Hi,

I tested this with 9.4.0 and could not reproduce. What versions did you have this problem with?

Thanks
[23 Sep 14:05] toki ye
Hi, thanks for your reply. I can reproduce in MySQL 9.4.0(Debain) which is downloaded from https://dev.mysql.com/.Here are steps that I do:

1. Deploy group replication 3 members in MySQL9.4.0:
    mysql 127.0.0.1:3310> select * from performance_schema.replication_group_members\G;
    *************************** 1. row ***************************
              CHANNEL_NAME: group_replication_applier
                 MEMBER_ID: 3d212dd1-9836-11f0-ad7f-525400a7d3bf
               MEMBER_HOST: 127.0.0.1
               MEMBER_PORT: 3309
              MEMBER_STATE: ONLINE
               MEMBER_ROLE: SECONDARY
            MEMBER_VERSION: 9.4.0
            MEMBER_COMMUNICATION_STACK: MySQL
    *************************** 2. row ***************************
              CHANNEL_NAME: group_replication_applier
                 MEMBER_ID: 6b3a4b0b-986a-11f0-a48e-525400a7d3bf
               MEMBER_HOST: 127.0.0.1
               MEMBER_PORT: 3310
              MEMBER_STATE: ONLINE
               MEMBER_ROLE: PRIMARY
            MEMBER_VERSION: 9.4.0
            MEMBER_COMMUNICATION_STACK: MySQL
    *************************** 3. row ***************************
              CHANNEL_NAME: group_replication_applier
                 MEMBER_ID: b4870b16-986e-11f0-a456-525400a7d3bf
               MEMBER_HOST: 127.0.0.1
               MEMBER_PORT: 3311
              MEMBER_STATE: ONLINE
               MEMBER_ROLE: SECONDARY
            MEMBER_VERSION: 9.4.0
            MEMBER_COMMUNICATION_STACK: MySQL
    3 rows in set (0.000 sec)

    ERROR:
    No query specified

2. Kill 2 secondary members, simulating some unexpected cases like network disconnection, secondary member crashes and so on.
    root@VM-20-15-debian:/home/mysql/server/3309# ps aux | grep mysqld
    root     2605896  0.4  8.6 2050988 174996 pts/11 Sl   19:04   0:38 mysqld-debug --defaults-file=/home/mysql/server/3310/my.cnf
    root     2646966  1.1 32.4 2061884 653472 pts/1  Sl   21:14   0:02 mysqld-debug --defaults-file=/home/mysql/server/3311/my.cnf
    root     2647306  1.4 30.3 2061884 611364 pts/10 Sl   21:15   0:02 mysqld-debug --defaults-file=/home/mysql/server/3309/my.cnf
    root     2648151  0.0  0.1   6332  2156 pts/10   S+   21:17   0:00 grep mysqld

    root@VM-20-15-debian:/home/mysql/server/3309# kill -9 2646966
    root@VM-20-15-debian:/home/mysql/server/3311# kill -9 2647306

3. After step 2, the replication_group_members becomes only the primary member online.
    mysql 127.0.0.1:3310> select * from performance_schema.replication_group_members\G;
    *************************** 1. row ***************************
              CHANNEL_NAME: group_replication_applier
                 MEMBER_ID: 3d212dd1-9836-11f0-ad7f-525400a7d3bf
               MEMBER_HOST: 127.0.0.1
               MEMBER_PORT: 3309
              MEMBER_STATE: UNREACHABLE
               MEMBER_ROLE: SECONDARY
            MEMBER_VERSION: 9.4.0
            MEMBER_COMMUNICATION_STACK: MySQL
    *************************** 2. row ***************************
              CHANNEL_NAME: group_replication_applier
                 MEMBER_ID: 6b3a4b0b-986a-11f0-a48e-525400a7d3bf
               MEMBER_HOST: 127.0.0.1
               MEMBER_PORT: 3310
              MEMBER_STATE: ONLINE
               MEMBER_ROLE: PRIMARY
            MEMBER_VERSION: 9.4.0
            MEMBER_COMMUNICATION_STACK: MySQL
    2 rows in set (0.002 sec)

    ERROR:
    No query specified
    
4. Execute CREATE USER statement to the only online primary member. And the statment block for waiting acknowledges of other missing quorums to commit.
    mysql 127.0.0.1:3310> CREATE USER `group_replication_test`@localhost IDENTIFIED BY 'uikK12*123...';

5. Try to login the only online primary member, then find login process stuck
    root@VM-20-15-debian:/home/mysql/server/3309# mysql -u root -h 127.0.0.1 -P 3310 -p
    Enter password:

6. If there is another connection already logins in, you can execute SHOW PROCESSLIST statement, to find more details about the login stucks:
    *************************** 3. row ***************************
         Id: 18
       User: system user
       Host:
         db: NULL
    Command: Connect
       Time: 9018
      State: waiting for handler commit
       Info: Group replication applier module
    *************************** 13. row ***************************
         Id: 114
       User: root
       Host: localhost:59002
         db: NULL
    Command: Query
       Time: 647
      State: waiting for handler commit
       Info: CREATE USER 'group_replication_test'@'localhost' IDENTIFIED BY <secret>
    *************************** 14. row ***************************
         Id: 115
       User: unauthenticated user
       Host: localhost:36616
         db: NULL
    Command: Connect
       Time: 248
      State: Waiting for acl cache lock
       Info: NULL
    *************************** 15. row ***************************
         Id: 116
       User: unauthenticated user
       Host: localhost:59464
         db: NULL
    Command: Connect
       Time: 33
      State: Waiting for acl cache lock
       Info: NULL
    15 rows in set, 1 warning (0.000 sec)
[23 Sep 16:08] MySQL Verification Team
Thanks for the clear instructions. I did manage to reproduce the problem.