Bug #61461 ndbd not rejoining cluster after boot - Id already allocated by another node
Submitted: 9 Jun 2011 13:35 Modified: 30 Jun 2011 20:43
Reporter: Tim Heath Email Updates:
Status: Analyzing Impact on me:
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S2 (Serious)
Version:7.1.10 OS:Linux (redhat 5.6)
Assigned to: Assigned Account CPU Architecture:Any

[9 Jun 2011 13:35] Tim Heath
I am upgrading my mysql-cluster installation to version 7.1.10 from a previous release. During testing, I've noticed that ndbd nodes often fail to rejoin the cluster after a server reboot. The cluster logs indicate that the management server has not freed the nodeid associated with the rebooted server, so it rejects the joining ndbd node with an "Id already allocated by another node" error. About 15-20 seconds after rejecting the ndbd startup, the management server frees the nodeid, and I can then restart ndbd manually with no problems.

I've tried adding an ndb_mgm -e "purge stale sessions" command to my ndbd startup script, with mixed results: sometimes it persuades the management server to release the nodeid; sometimes it doesn't.
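For concreteness, the workaround above could be scripted roughly as follows. This is only a sketch: the management server address, retry count, and delay are illustrative assumptions, not values from this report.

```shell
#!/bin/sh
# Sketch of an ndbd startup wrapper based on the workaround described
# above. MGMD, RETRIES, and DELAY are hypothetical values.

MGMD="mgmhost1:1186"   # hypothetical management server connect string
RETRIES=5              # hypothetical retry budget
DELAY=5                # seconds between attempts

# retry N D CMD...: run CMD up to N times, sleeping D seconds between
# failed attempts; return 0 on the first success, 1 if all attempts fail.
retry() {
    n=$1; d=$2; shift 2
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep "$d"
    done
    return 1
}

# Only attempt the cluster commands if the binaries are installed.
if command -v ndb_mgm >/dev/null 2>&1; then
    # Ask the mgm server to drop stale sessions, then retry the data
    # node start until the nodeid has been freed (~15-20 s observed).
    ndb_mgm -c "$MGMD" -e "purge stale sessions"
    retry "$RETRIES" "$DELAY" ndbd -c "$MGMD"
fi
```

The retry loop covers the window in which the management server still holds the stale nodeid, instead of relying on "purge stale sessions" alone.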

I've found I can work around the problem by specifying "--no-nodeid-checks" at management server startup. I am running with dual management servers and use explicit NodeId values in my cluster config. Is there any downside to running with "--no-nodeid-checks"?
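For comparison, that workaround amounts to starting the management server with the flag appended (the config file path here is an illustrative assumption):

```
ndb_mgmd -f /etc/mysql-cluster/config.ini --no-nodeid-checks
```

The flag skips the nodeid reservation checks entirely, so a misconfigured host could presumably claim another node's id; that trade-off is likely the downside the question is asking about.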

How to repeat:
Reboot a server with a running ndbd node.