MySQL Bugs: #117284: Multi trp setup hangs leading to GCP stop

Bug #117284	Multi trp setup hangs leading to GCP stop
Submitted:	23 Jan 17:18	Modified:	24 Jan 10:03
Reporter:	Mikael Ronström
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	8.4.4	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
A node is started and begins setting up a multi trp (multi
transporter), then crashes in the middle of the setup process.
In this case a retry operation still sends SWITCH_MULTI_TRP_REQ
and sets the variable m_current_switch_multi_trp_node to the node
id of the failed node. As a result, no new multi trp setup is
completed when the failed node comes up again: the setup starts
preparing but stops in the actual switch step. There is then no
communication to the started node, which has already been included
in the GCP protocol, and this leads to a GCP stop.

How to repeat:
Run autotest. The hang occurs intermittently.

Suggested fix:
Add one more check in send_switch_multi_transporter,
similar to the check in select_node_id_for_switch, to
ensure that a switch is never started towards a node that
is not currently running.
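
The suggested check can be sketched as a small standalone model.
This is not the NDB source: only the names
send_switch_multi_transporter and m_current_switch_multi_trp_node
come from this report, while the struct, the running_nodes set,
is_running and requests_sent are hypothetical and exist only to
illustrate the idea.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Toy model of the proposed guard. Everything except the two names
// taken from the bug report is an assumption made for illustration.
using NodeId = std::uint32_t;

struct MultiTrpSwitcher {
    std::set<NodeId> running_nodes;              // assumed view of live nodes
    NodeId m_current_switch_multi_trp_node = 0;  // 0 = no switch in progress
    int requests_sent = 0;                       // simulated SWITCH_MULTI_TRP_REQ count

    bool is_running(NodeId node) const {
        return running_nodes.count(node) != 0;
    }

    // Returns true if a SWITCH_MULTI_TRP_REQ was (notionally) sent.
    bool send_switch_multi_transporter(NodeId node) {
        assert(node != 0);  // 0 is reserved as "none" in this sketch
        if (!is_running(node)) {
            // Proposed guard: never claim the switch slot for a node
            // that is not running, so a later restart is not blocked.
            return false;
        }
        m_current_switch_multi_trp_node = node;
        ++requests_sent;
        return true;
    }
};
```

In this model, a retry that targets a crashed node returns early
and leaves m_current_switch_multi_trp_node untouched, so a fresh
setup can begin once the node is running again.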

Thanks for the report Mikael,

all best