Description:
A 4-node MGR cluster running in single-primary mode.
node1: secondary node
node2: secondary node
node3: primary node
node4: secondary node
During a benchmark test, execute kill -9 on the mysqld process of the primary node and of one secondary node.
At this point the MGR cluster can no longer accept writes. The two killed nodes need to be started again, and InnoDB crash recovery has to complete first.
Check the transaction GTID progress of each MGR node with the SQL command:
```
select RECEIVED_TRANSACTION_SET from performance_schema.replication_connection_status union all select variable_value from performance_schema.global_variables where variable_name = 'gtid_executed';
```
node1:
```
| gtid_executed | f195537d-19ac-11eb-b29f-5254002eb6d6:1-301092789 |
| RECEIVED_TRANSACTION_SET | f195537d-19ac-11eb-b29f-5254002eb6d6:1-301092789 |
```
node2: (had been killed with kill -9 of the mysqld pid)
```
| gtid_executed | f195537d-19ac-11eb-b29f-5254002eb6d6:1-301092789 |
| RECEIVED_TRANSACTION_SET | f195537d-19ac-11eb-b29f-5254002eb6d6:1-300917674:300917679-301092789 |
```
node3: (original primary node, had been killed with kill -9 of the mysqld pid)
```
| gtid_executed | f195537d-19ac-11eb-b29f-5254002eb6d6:1-301092699:301092701-301092723:301092725-301092743:301092746:301092748-301092749 |
| RECEIVED_TRANSACTION_SET | f195537d-19ac-11eb-b29f-5254002eb6d6:1-300917672:300917678-300917680 |
```
node4:
```
| gtid_executed | f195537d-19ac-11eb-b29f-5254002eb6d6:1-301092789 |
| RECEIVED_TRANSACTION_SET | f195537d-19ac-11eb-b29f-5254002eb6d6:1-301092789 |
```
We choose node1 or node4 as the new primary node and start the MGR cluster again.
After node3 (the original primary node) completes MGR recovery, its data turns out to be inconsistent with the other nodes.
The GTID range that node3 needs to recover is 301092749-301092789. Checking the binlogs shows that some of these GTIDs correspond to transactions that update data on the other nodes, while on node3 the same GTIDs were assigned to view change events executed during its recovery, which leads to inconsistent data. For example:
One of these GTID events on node4:
```
SET @@SESSION.GTID_NEXT= 'f195537d-19ac-11eb-b29f-5254002eb6d6:304451797'/*!*/;
# at 511622795
#210430 16:22:25 server id 3410 end_log_pos 511622867 CRC32 0x4c32e87d Query thread_id=3123 exec_time=0 error_code=0
SET TIMESTAMP=1619770945/*!*/;
BEGIN
/*!*/;
# at 511622867
#210430 16:22:25 server id 3410 end_log_pos 511622935 CRC32 0x9c37af89 Table_map: `sbtest`.`sbtest10` mapped to number 98
# at 511622935
#210430 16:22:25 server id 3410 end_log_pos 511623351 CRC32 0xfa505337 Update_rows: table id 98 flags: STMT_END_F
### UPDATE `sbtest`.`sbtest10`
### WHERE
### @1=9136
```
The same GTID is a view change event on node3 (found in the binlog generated during recovery):
```
SET @@SESSION.GTID_NEXT= 'f195537d-19ac-11eb-b29f-5254002eb6d6:304451797'/*!*/;
# at 1097
#210430 16:26:22 server id 3410 end_log_pos 1163 CRC32 0x85b57863 Query thread_id=14 exec_time=19 error_code=0
SET TIMESTAMP=1619771182/*!*/;
BEGIN
/*!*/;
# at 1163
#210430 16:26:22 server id 3410 end_log_pos 1514 CRC32 0x48a5a849 View_change_log_event: view_id=16197711839237518:2
# at 1514
#210430 16:26:22 server id 3410 end_log_pos 1586 CRC32 0xf7d30016 Query thread_id=14 exec_time=19 error_code=0
SET TIMESTAMP=1619771182/*!*/;
COMMIT
```
This leads to inconsistent data among several nodes after MGR recovery.
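For reference, the range node3 is missing can be computed with GTID_SUBTRACT. A minimal sketch, using the gtid_executed values shown above (run on any surviving member):
```
-- Sketch: everything in node4's gtid_executed that node3 has not executed,
-- using the values from the output above.
SELECT GTID_SUBTRACT(
    'f195537d-19ac-11eb-b29f-5254002eb6d6:1-301092789',
    'f195537d-19ac-11eb-b29f-5254002eb6d6:1-301092699:301092701-301092723:301092725-301092743:301092746:301092748-301092749'
) AS missing_on_node3;
```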
How to repeat:
1. Start a 4-node MGR cluster; assume node3 is the primary node and the rest are secondary nodes.
2. Start a sysbench stress test.
3. After about 5 minutes, execute kill -9 on the mysqld processes of node3 and node2 (kill the primary node first, then the secondary node, in quick succession).
4. At this point the MGR cluster cannot accept writes; execute ```stop group_replication``` on node1 and node4.
5. Start node2 and node3, then execute the following on all 4 nodes:
```
select RECEIVED_TRANSACTION_SET from performance_schema.replication_connection_status union all select variable_value from performance_schema.global_variables where variable_name = 'gtid_executed';
```
6. Confirm which node has the largest GTID set and choose it as the new primary node. Generally node1 and node4 are identical, so either can be chosen.
7. Restart the MGR cluster: start node2 and node3 and execute ```start group_replication``` on each node (see the sketch after this list); at this point node2 and node3 perform MGR recovery.
8. After node3 completes MGR recovery, its data is found to be inconsistent with node4.
9. Check the binlog generated on node3 during MGR recovery: the event with the same GTID is a DML operation on node4 but a view change event on node3, which leads to inconsistent data.
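A minimal sketch of the restart in step 7, assuming node1 is chosen as the new primary (standard Group Replication bootstrap statements; node names are from this report):
```
-- On the chosen new primary (e.g. node1): bootstrap a new group.
SET GLOBAL group_replication_bootstrap_group = ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group = OFF;

-- On node4, and on the restarted node2 and node3: join the group.
START GROUP_REPLICATION;
```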
Suggested fix:
During MGR recovery it should be verified that a given GTID corresponds to the same transaction on every node; a view change event must not occupy a GTID that is already used by a normal transaction on other nodes.