Description:
I am upgrading my mysql-cluster installation to version 7.1.10 from a previous release. During testing, I've noticed that ndbd nodes often fail to rejoin the cluster after a server reboot. The cluster logs indicate that the management server has not freed the node id associated with the rebooted server, so the mgmt server rejects the joining ndbd node with an "ID already allocated by another node" error. After rejecting the ndbd startup, the mgmt server frees the node id roughly 15-20 seconds later, and I can then restart ndbd manually with no problems.
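To illustrate the manual recovery, here is roughly what I do after the reboot (node id values and the mgmhost1,mgmhost2 connectstring below are placeholders, not my real config):

  # check node status from the management client; the rebooted node's
  # id still shows as allocated until the mgmt server times it out
  shell> ndb_mgm -c mgmhost1,mgmhost2 -e show

  # once the id is freed (15-20 seconds after the rejection),
  # a manual start succeeds
  shell> ndbd -c mgmhost1,mgmhost2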
I've tried adding an ndb_mgm -e "purge stale sessions" command to my ndbd startup script, with mixed results. Sometimes I can get the mgmt server to release the node id; sometimes I can't.
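For reference, the relevant fragment of the startup script looks roughly like this (the connectstring is a placeholder, and the retry loop is my own sketch, since a single purge is often not enough):

  # ask the mgmt servers to drop the dead session before
  # starting the data node; retry a few times
  for i in 1 2 3; do
      ndb_mgm -c mgmhost1,mgmhost2 -e "purge stale sessions"
      sleep 5
  done
  ndbd -c mgmhost1,mgmhost2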
I've found I can work around the problem by specifying --no-nodeid-checks at management server startup. I am running with dual management servers and use explicit node-id values in my cluster config. Is there any downside to running with --no-nodeid-checks?
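For context, a trimmed sketch of the setup (host names and the config path are placeholders; as in my real config, every section pins an explicit NodeId):

  # config.ini (excerpt)
  [ndb_mgmd]
  NodeId=1
  HostName=mgmhost1

  [ndb_mgmd]
  NodeId=2
  HostName=mgmhost2

  [ndbd]
  NodeId=3
  HostName=datahost1

  [ndbd]
  NodeId=4
  HostName=datahost2

  # management server started with node id checks disabled
  shell> ndb_mgmd -f /path/to/config.ini --no-nodeid-checks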
Regards
How to repeat:
Reboot a server with a running ndbd node; on startup, ndbd fails to rejoin the cluster until the mgmt server frees its node id.
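In more detail, assuming ndbd is started automatically at boot by an init script:

  # on the data node host, with ndbd running:
  shell> reboot
  # after the host comes back up, the automatic ndbd start is rejected;
  # the cluster log on the mgmt server shows
  # "ID already allocated by another node", and a manual ndbd start
  # only succeeds 15-20 seconds later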