MySQL Bugs: #116512: Force quit in innodb cluster

Bug #116512	Force quit in innodb cluster
Submitted:	30 Oct 2024 20:15	Modified:	28 Nov 2024 13:29
Reporter:	CunDi Fang
Status:	Can't repeat
Category:	Shell AdminAPI InnoDB Cluster / ReplicaSet	Severity:	S2 (Serious)
Version:	8.0.35-innodb cluster	OS:	Any
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
In a MySQL Group Replication cluster, a node attempting to join the cluster fails to connect to a peer node and subsequently encounters a data type conversion error while executing a transaction, causing the node to forcefully exit the cluster.

How to repeat:
Here is the log:
```
2024-10-30T07:21:26.634540Z 15 [ERROR] [MY-013146] [Repl] Replica SQL for channel 'group_replication_applier': Worker 1 failed executing transaction 'e6387c80-95c4-11ef-ac8b-0242c0050a08:7851'; Column 2 of table 'mytest103.test1' cannot be converted from type 'float' to type 'tinyint(1)', Error_code: MY-013146
2024-10-30T07:21:26.634920Z 14 [Warning] [MY-010584] [Repl] Replica SQL for channel 'group_replication_applier': ... The replica coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: MY-001756
2024-10-30T07:21:26.634975Z 14 [ERROR] [MY-011451] [Repl] Plugin group_replication reported: 'The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group.'
2024-10-30T07:21:26.635126Z 12 [ERROR] [MY-011452] [Repl] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.'
```

After the connection issue, the applier reported a data type incompatibility error (Error_code: MY-013146) while executing transaction e6387c80-95c4-11ef-ac8b-0242c0050a08:7851: column 2 of table mytest103.test1 could not be converted from type float to type tinyint(1). Because of this conversion error, the transaction could not be applied on this node. The error then stopped the replica coordinator and worker threads ("Replica SQL for channel 'group_replication_applier': ... The replica coordinator and worker threads are stopped"), which carries a risk of data inconsistency.
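To illustrate why this particular pair of types fails, here is a simplified Python model of the compatibility rule as we read it in the MySQL reference manual's section on replicating columns with different data types: row-based conversions are only supported between types of the same family (for example, among the integer types, or among DECIMAL, FLOAT, DOUBLE and NUMERIC). The family table and the names `base_type`, `family_of` and `can_convert` are our own illustration, not MySQL source code. The sketch assumes `replica_type_conversions` permits conversions at all and ignores its lossy/non-lossy distinctions.

```python
import re
from typing import Optional

# Illustrative grouping of MySQL column types into the families within which
# the reference manual lists supported row-based conversions. This is NOT
# MySQL source code. Assumes replica_type_conversions allows conversions;
# lossy/non-lossy rules and special types such as ENUM/SET are ignored.
TYPE_FAMILIES = {
    "integer": {"tinyint", "smallint", "mediumint", "int", "bigint"},
    "decimal": {"decimal", "float", "double", "numeric"},
    "string": {"char", "varchar", "tinytext", "text", "mediumtext", "longtext"},
    "binary": {"binary", "varbinary", "tinyblob", "blob", "mediumblob", "longblob"},
    "bit": {"bit"},
}


def base_type(column_type: str) -> str:
    """Drop length and attributes, e.g. 'tinyint(1)' -> 'tinyint'."""
    match = re.match(r"\s*([a-z]+)", column_type.lower())
    if match is None:
        raise ValueError(f"unrecognised column type: {column_type!r}")
    return match.group(1)


def family_of(column_type: str) -> Optional[str]:
    """Return the family name, or None if the type is not in the table."""
    base = base_type(column_type)
    for family, members in TYPE_FAMILIES.items():
        if base in members:
            return family
    return None


def can_convert(source_type: str, target_type: str) -> bool:
    """Same base type, or two types from the same family (simplified model)."""
    if base_type(source_type) == base_type(target_type):
        return True
    source_family = family_of(source_type)
    return source_family is not None and source_family == family_of(target_type)


# The pair from the log: 'float' in the row event, 'tinyint(1)' locally.
print(can_convert("float", "tinyint(1)"))  # False: decimal vs. integer family
print(can_convert("float", "double"))      # True: both in the decimal family
```

In this model the float/tinyint(1) pair crosses families, which is consistent with the applier refusing the row event and points to the table definitions having diverged between members rather than to a conversion setting.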

The applier thread then aborted ("The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group."). Finally, the log reports a "Fatal error during execution on the Applier process of Group Replication", and the member is forced to leave the group.

Suggested fix:
This issue causes the node to leave the cluster and enter a read-only state, which affects the high availability and consistency of Group Replication. It is recommended to check the Group Replication network configuration and to make sure that the column types of the tables match on all members, to avoid replication failures caused by similar conversion errors.
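One way to act on the "make sure the column types match" suggestion is to compare column definitions across members. The sketch below is a hypothetical helper, not part of MySQL Shell or the AdminAPI: it assumes you have already fetched rows of (TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION, COLUMN_NAME, COLUMN_TYPE) from INFORMATION_SCHEMA.COLUMNS on each member, and it reports every position whose name or type differs. The column names `c1`/`c2` in the usage example are made up for illustration.

```python
from typing import Dict, List, Tuple

# Rows are assumed to come from a query like this, run on each member:
#   SELECT TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION, COLUMN_NAME, COLUMN_TYPE
#   FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'mytest103';
Row = Tuple[str, str, int, str, str]
ColumnKey = Tuple[str, str, int]  # (schema, table, ordinal position)


def index_columns(rows: List[Row]) -> Dict[ColumnKey, Tuple[str, str]]:
    """Map (schema, table, position) -> (column name, column type)."""
    return {(s, t, pos): (name, ctype) for s, t, pos, name, ctype in rows}


def find_type_drift(member_a: List[Row], member_b: List[Row]) -> List[str]:
    """Describe every column whose name or type differs between two members."""
    a, b = index_columns(member_a), index_columns(member_b)
    problems = []
    for key in sorted(a.keys() | b.keys()):
        schema, table, pos = key
        left, right = a.get(key), b.get(key)  # None if missing on one side
        if left != right:
            problems.append(f"{schema}.{table} column {pos}: {left} vs {right}")
    return problems


# Hypothetical data: column names c1/c2 are invented for this example.
member_1 = [("mytest103", "test1", 1, "c1", "int"),
            ("mytest103", "test1", 2, "c2", "float")]
member_2 = [("mytest103", "test1", 1, "c1", "int"),
            ("mytest103", "test1", 2, "c2", "tinyint(1)")]
for line in find_type_drift(member_1, member_2):
    print(line)  # mytest103.test1 column 2: ('c2', 'float') vs ('c2', 'tinyint(1)')
```

Keying on ordinal position rather than column name also catches columns that were renamed or reordered by concurrent DDL.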

Hi,

> which means that the type of float in column 2 of table mytest103.test1 could not be converted to tinyint(1).

How did this happen? MySQL did not alter the mytest103.test1 table on its own. What exactly did you do for this failure to occur?

Thanks

Bug #116511 marked as duplicate of this one

Sorry, I lost the records of this test, but I can describe the test scenario in the hope that it helps:

I was running a 5-node multi-master InnoDB Cluster. On two of the nodes I was continuously executing SQL statements, including DDL statements. Meanwhile, on the master node, I was performing cluster state changes, such as restarting nodes.

I ran into this problem while continuously performing these operations. I will keep testing and will report back if I encounter a similar situation in the future.

Hi,

I am not able to reproduce this with 8.0.40, and without full logs I can't do much with it. Please reopen the report if you manage to reproduce this with the latest MySQL version.