Bug #104117 MGR all second node abnormal stop
Submitted: 26 Jun 2021 2:36 Modified: 26 Jun 2021 2:39
Reporter: yongliang lee Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S2 (Serious)
Version:mysql8.0.23 mysql8.0.24 OS:Any
Assigned to: CPU Architecture:Any

[26 Jun 2021 2:36] yongliang lee
Description:
Hi all:
   our MGR encounter an case,two second node is stop only primary ok.then we restart two second node's group_replication,MGR all node status is recovery and online,second node error log:
   
2021-06-25T22:23:27.978680-04:00 12 [ERROR] [MY-010584] [Repl] Slave SQL for channel 'group_replication_applier': Error 'You are not allowed to create a user with GRANT' on query. Default database: 'performance_schema'. Query: 'GRANT SELECT, INSERT, UPDATE, DELETE ON `gr_opt`.* TO 'testuser'@'192.168.1.100'', Error_code: MY-001410
2021-06-25T22:23:27.978759-04:00 12 [Warning] [MY-010584] [Repl] Slave: You are not allowed to create a user with GRANT Error_code: MY-001410
2021-06-25T22:23:27.978786-04:00 12 [ERROR] [MY-011451] [Repl] Plugin group_replication reported: 'The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group.'
2021-06-25T22:23:27.978891-04:00 12 [ERROR] [MY-010586] [Repl] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 0
2021-06-25T22:23:27.978987-04:00 9 [ERROR] [MY-011452] [Repl] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.'
2021-06-25T22:23:27.979120-04:00 9 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
2021-06-25T22:23:31.773679-04:00 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'
   

This case is update mysql.user table (see repeat detail).

I know not recommend update mysql.user table use DML statement use drop/create user to replace.
But in Innodb cluster high load stop/start group replication is heavy operation , update mysql.user direct may cause all second node stop.

Grant privileges,find_acl_user funcation search user from cache Acl_user_ptr_list, update mysql.user table not update this cache,when error encounter,then load cache again,so frist grant is fail,then grant is success.But second node applier channel sql thread apply this statment will report an error and stop group replication. 

Can we add DML verifiy in group_replication_trans_before_dml callback function to verify like this system table DML operation,primary node not allow update mysql.user table(or other privilegs tables)? i think this is better way to solve this issue.

thanks.

How to repeat:
MGR single primary mode

1.inital

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 5435c165-728d-11eb-ae01-000c2956fa4f | mgr5        |        3329 | ONLINE       | SECONDARY   | 8.0.23         |
| group_replication_applier | c32955c5-7d5d-11eb-865d-000c29948de2 | mgr4        |        3329 | ONLINE       | PRIMARY     | 8.0.23         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
2 rows in set (0.00 sec)

2.primary node:

create user testuser identified with mysql_native_password by 'TT123t$';
update mysql.user set host='192.168.1.100' where user='testuser' ;
grant select,update,insert,delete on gr_opt.* to testuser@'192.168.1.100';(error,use cache verify user then reload cache from mysql.user)
grant select,update,insert,delete on gr_opt.* to testuser@'192.168.1.100';(normal,cache reload from mysql.user)

3.all second node is stop.

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | c32955c5-7d5d-11eb-865d-000c29948de2 | mgr4        |        3329 | ONLINE       | PRIMARY     | 8.0.23         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
1 row in set (0.00 sec)

Suggested fix:
Can we add DML verifiy in group_replication_trans_before_dml callback function to verify like this system table DML operation,primary node not allow update mysql.user table(or other privilegs tables)? i think this is better way to solve this issue.
[26 Jun 2021 2:39] yongliang lee
Sorry ignore this issue .i will use other account to submit !!