Bug #61461 ndbd not rejoining cluster after boot - Id already allocated by another node
Submitted: 9 Jun 2011 13:35 Modified: 30 Jun 2011 20:43
Reporter: Tim Heath Email Updates:
Status: Analyzing Impact on me:
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:7.1.10 OS:Linux (redhat 5.6)
Assigned to: Assigned Account CPU Architecture:Any

[9 Jun 2011 13:35] Tim Heath
I am upgrading my mysql-cluster installation to version 7.1.10 from a previous release. During testing, I've noticed that ndbd nodes often fail to rejoin the cluster after a server reboot. The cluster logs indicate that the management server has not freed the nodeid associated with the rebooted server, so it rejects the joining ndbd node with an "Id already allocated by another node" error. About 15-20 seconds after rejecting the ndbd startup, the management server frees the nodeid, and I can then restart ndbd manually with no problems.

I've tried adding an ndb_mgm -e "purge stale sessions" command to my ndbd startup script, with mixed results: sometimes it persuades the management server to release the nodeid; sometimes it doesn't.
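For concreteness, the workaround above could be scripted roughly as follows. This is only a sketch: the management server address, retry count, and delay are illustrative assumptions, not values from this report.

```shell
#!/bin/sh
# Sketch of an ndbd startup wrapper based on the workaround described
# above. MGMD, RETRIES, and DELAY are hypothetical values.

MGMD="mgmhost1:1186"   # hypothetical management server connect string
RETRIES=5              # hypothetical retry budget
DELAY=5                # seconds between attempts

# retry N D CMD...: run CMD up to N times, sleeping D seconds between
# failed attempts; return 0 on the first success, 1 if all attempts fail.
retry() {
    n=$1; d=$2; shift 2
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep "$d"
    done
    return 1
}

# Only attempt the cluster commands if the binaries are installed.
if command -v ndb_mgm >/dev/null 2>&1; then
    # Ask the mgm server to drop stale sessions, then retry the data
    # node start until the nodeid has been freed (~15-20 s observed).
    ndb_mgm -c "$MGMD" -e "purge stale sessions"
    retry "$RETRIES" "$DELAY" ndbd -c "$MGMD"
fi
```

The retry loop covers the window in which the management server still holds the stale nodeid, instead of relying on "purge stale sessions" alone.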

I've found I can work around the problem by specifying "--no-nodeid-checks" at management server startup. I am running with dual management servers and use explicit NodeId values in my cluster config. Is there any downside to running with "--no-nodeid-checks"?
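For comparison, that workaround amounts to starting the management server with the flag appended (the config file path here is an illustrative assumption):

```
ndb_mgmd -f /etc/mysql-cluster/config.ini --no-nodeid-checks
```

The flag skips the nodeid reservation checks entirely, so a misconfigured host could presumably claim another node's id; that trade-off is likely the downside the question is asking about.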

How to repeat:
Reboot a server with a running ndbd node.