Bug #117284 Multi trp setup hangs leading to GCP stop
Submitted: 23 Jan 17:18
Modified: 24 Jan 10:03
Reporter: Mikael Ronström
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 8.4.4
OS: Any
Assigned to:
CPU Architecture: Any

[23 Jan 17:18] Mikael Ronström
Description:
A node is started and begins setting up a multi trp (multi
transporter), but crashes in the middle of the setup process.
In this case a retry operation will send SWITCH_MULTI_TRP_REQ and
set the variable m_current_switch_multi_trp_node to the node id of
the failed node. As a result, the node will not start a new
multi trp setup when it comes up again. It will start preparing
but will stop in the actual switch process, so there is no
communication to the started node even though it has already been
included in the GCP protocol.

How to repeat:
Run autotest; the hang occurs intermittently.

Suggested fix:
Add one more check in send_switch_multi_transporter
similar to the check in select_node_id_for_switch to
ensure that we don't start a switch on a node that is
currently not running.
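
A minimal sketch of the proposed check, written as simplified standalone C++ rather than the actual NDB kernel code. Only the names send_switch_multi_transporter, select_node_id_for_switch, m_current_switch_multi_trp_node and SWITCH_MULTI_TRP_REQ come from this report. The NodeState enum, the SwitchCoordinator struct, the NO_NODE sentinel, the is_running helper and the function signatures are hypothetical, and what exactly counts as "running" in the real code is an assumption here.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified model of the switch logic (not the real NDB code).
enum class NodeState { Failed, Running };

// Assumed sentinel meaning "no multi trp switch in progress".
constexpr std::uint32_t NO_NODE = 0;

struct SwitchCoordinator {
  // Assumed per-node status table, keyed by node id.
  std::map<std::uint32_t, NodeState> node_state;

  // Node currently being switched (name taken from the report).
  std::uint32_t m_current_switch_multi_trp_node = NO_NODE;

  // Stands in for the SWITCH_MULTI_TRP_REQ signals that would be sent.
  std::vector<std::uint32_t> sent_requests;

  // Hypothetical helper: unknown nodes are treated as not running.
  bool is_running(std::uint32_t node_id) const {
    auto it = node_state.find(node_id);
    return it != node_state.end() && it->second == NodeState::Running;
  }

  // Picks the first running node, mirroring the kind of check the report
  // says select_node_id_for_switch already performs.
  std::uint32_t select_node_id_for_switch() const {
    for (const auto& entry : node_state) {
      if (entry.second == NodeState::Running) return entry.first;
    }
    return NO_NODE;
  }

  // Returns true if a SWITCH_MULTI_TRP_REQ was sent for node_id.
  bool send_switch_multi_transporter(std::uint32_t node_id) {
    // Proposed extra check: never start (or retry) a switch towards a node
    // that is not running, so m_current_switch_multi_trp_node is never
    // left pointing at a failed node.
    if (!is_running(node_id)) {
      return false;
    }
    m_current_switch_multi_trp_node = node_id;
    sent_requests.push_back(node_id);
    return true;
  }
};
```

In this model, a retry towards a node that crashed mid-setup returns false and leaves m_current_switch_multi_trp_node untouched, so a later restart of that node can begin a fresh multi trp setup.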
[24 Jan 10:03] MySQL Verification Team
Thanks for the report Mikael,

all best