Bug #8293 ndb_mgmd does not recognize available node ids after node crash
Submitted: 3 Feb 2005 16:11 Modified: 3 Feb 2005 19:10
Reporter: Jörg Nowak
Status: To be fixed later
Category:Server: Cluster Severity:S3 (Non-critical)
Version:4.1.9 OS:Linux (Suse 9.1 64 bit Version)
Assigned to: Jonas Oreland Target Version:

[3 Feb 2005 16:11] Jörg Nowak
Description:
Bug in ndb_mgmd:

Scenario: a cluster with 2 computers (2 database nodes on each computer, number of
replicas: 2).
computer 1: Node 2 (master), 4
computer 2: Node 3, 5 + Management node

I rebooted computer 1 (nodes 2 and 4 crashed). After the reboot I tried to restart nodes 2
and 4 (outside of ndb_mgm).
An error appears in the log file: "Nodeid 2 is allocated by another node". I am not able
to restart the crashed nodes.
When I kill ndb_mgmd and restart it, I am able to restart the crashed nodes.

It seems that ndb_mgmd keeps all node ids occupied (including the ids of the crashed
nodes). This looks like a bug to me.

 

How to repeat:
Reboot one of the 2 database computers in the cluster and try to restart the crashed nodes.

Suggested fix:
ndb_mgmd should not keep the ids of crashed (inactive) nodes occupied.
[3 Feb 2005 19:10] Jonas Oreland
Hi,

This can occur in some situations.
There are, however, some workarounds.
1) Instead of restarting ndb_mgmd, you can issue the command "purge stale sessions" in
the ndb_mgm client.

2) You can bypass the problem entirely by specifying the node id in each node's
connectstring and starting "ndb_mgmd --no-nodeid-checks".
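
For illustration, the two workarounds might look roughly like the sketch below. It only prints the commands instead of running them. The host name mgmhost and port 1186 are assumptions, not taken from this report, and so is the helper function connectstring. The "nodeid=N,host:port" connectstring form and the -e / -c options should be checked against the manual for the installed version; where ndb_mgm has no -e option, the purge command can be typed at its interactive prompt instead.

```shell
#!/bin/sh
# Assumption: the management server is reachable at mgmhost:1186
# (hypothetical name, replace with the real host and port).
MGM_HOST="mgmhost:1186"

# Hypothetical helper: build a connectstring that pins a node to a fixed node id.
connectstring() {
    printf 'nodeid=%s,%s' "$1" "$2"
}

# Workaround 1: release node ids held by stale sessions without
# restarting ndb_mgmd (run on the management host).
echo "ndb_mgm -e \"PURGE STALE SESSIONS\""

# Workaround 2: start ndb_mgmd without node id checks (plus the usual
# options), then start each data node with an explicit node id.
echo "ndb_mgmd --no-nodeid-checks"
for id in 2 4; do
    echo "ndbd -c \"$(connectstring "$id" "$MGM_HOST")\""
done
```

Node ids 2 and 4 are the crashed nodes from the report; the same pattern would apply to nodes 3 and 5 on the other computer.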

/Jonas

ps. We're also working on a "real" fix, but I dare not guess when it will be ready. ds