MySQL Bugs: #15915: "Allocate nodeid failed" when rejoining two ndb nodes

Bug #15915: "Allocate nodeid failed" when rejoining two ndb nodes
Submitted: 21 Dec 2005 19:01
Modified: 19 Jun 2006 9:37
Reporter: Andreas Granig
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 5.0.15
OS: Linux (Debian Sarge)
Assigned to:
CPU Architecture: Any

Description:
I have four ndb nodes, {11,12} and {13,14}, running. When I type "reboot" on nodes 11 and 13, the cluster stays up and running. But when they come up again and I start ndbd on the two nodes, only the first one can join the cluster; the second segfaults.

How to repeat:
Reboot nodes 11 and 13 during normal operation without shutting down ndbd first. Once they are up again, start ndbd on nodes 11 and 13 without any command-line parameters. One of the nodes then segfaults, and the mgmd says:

Allocate nodeid (13) failed. Connection from ip 172.19.200.13. Returned error string "Id 13 already allocated by another node."
2005-12-21 17:57:37 [MgmSrvr] INFO     -- Mgmt server state: node id's  11 12 13 14 32 connected but not reserved

Killing the ndb_mgmd doesn't help, and executing "purge stale sessions" on the mgm client doesn't either ("No sessions purged").

Shutting down the cluster, then restarting everything works fine btw.
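
For anyone scanning a long mgmd log for this failure, the message quoted above can be picked apart with a small script. This is only a sketch: the pattern is modeled on the single line shown in this report, the function name parse_alloc_failure is made up for illustration, and other ndb_mgmd versions may word the message differently.

```python
import re

# Pattern modeled on the one mgmd message quoted in this report (5.0.15);
# it is an assumption that other versions use the same wording.
ALLOC_FAILED = re.compile(
    r'Allocate nodeid \((?P<node_id>\d+)\) failed\. '
    r'Connection from ip (?P<ip>[\d.]+)\. '
    r'Returned error string "(?P<reason>[^"]*)"'
)


def parse_alloc_failure(line):
    """Return node id, source ip and reason from a failure line, or None."""
    match = ALLOC_FAILED.search(line)
    if match is None:
        return None
    return {
        "node_id": int(match.group("node_id")),
        "ip": match.group("ip"),
        "reason": match.group("reason"),
    }


# The message exactly as it appears in the report.
sample_line = (
    'Allocate nodeid (13) failed. Connection from ip 172.19.200.13. '
    'Returned error string "Id 13 already allocated by another node."'
)

failure = parse_alloc_failure(sample_line)
print(failure)
```

Run over every line of the mgmd log, this would show which node ids are being refused and from which address the refused connection came.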

The log entries of the mgmd

Attachment: mgmd-log.txt (plain/text, text), 12.76 KiB.

The configs of the mgmd, ndbd and sql node

Attachment: configs.txt (plain/text, text), 2.35 KiB.

Oh, and the log of node 13:

Time: Wednesday 21 December 2005 - 19:34:42
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 17 received; Child exited
Error object: main.cpp
Program: ndbd
Pid: 4184
Trace: /var/lib/mysql-cluster/storage/ndb_13_trace.log.5
Version: Version 5.0.15
***EOM***

The trace file ndb_13_trace.log.5 is attached separately.
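
When comparing error logs from several nodes, an entry like the one above can be read into a dictionary. The following is a minimal sketch that assumes each entry consists of "Key: value" lines terminated by ***EOM***, as in the entry quoted here; the function name parse_ndbd_error_entry is hypothetical and not part of any MySQL tool.

```python
def parse_ndbd_error_entry(text):
    """Parse the 'Key: value' lines of one error-log entry into a dict.

    Parsing stops at the ***EOM*** marker; lines without ': ' are skipped.
    """
    entry = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "***EOM***":
            break
        # Split on the first ': ' only, so values may contain colons.
        key, sep, value = line.partition(": ")
        if sep:
            entry[key] = value
    return entry


# The entry for node 13, copied from the report.
sample_entry = """\
Time: Wednesday 21 December 2005 - 19:34:42
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 17 received; Child exited
Error object: main.cpp
Program: ndbd
Pid: 4184
Trace: /var/lib/mysql-cluster/storage/ndb_13_trace.log.5
Version: Version 5.0.15
***EOM***
"""

entry = parse_ndbd_error_entry(sample_entry)
print(entry["Error"], "-", entry["Error data"])
```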

The trace of the ndbd log of node 13 (gzip'ed)

Attachment: ndb_13_trace.log.5.gz (application/x-tar, text), 67.68 KiB.

Can you attach the trace files for the node?
Thanks

Already attached, sorry. Thanks

The core is reported in http://bugs.mysql.com/bug.php?id=17677

But I don't understand why "purge stale sessions" and a restart of ndb_mgmd do not work.
When you say "reboot", what exactly do you mean?

Regardless, a number of improvements have been made in this area in 5.0.21
(http://bugs.mysql.com/bug.php?id=19930).
Unfortunately, these improvements contained a bug that will be fixed in the next 5.0 release.
It is hard to say whether the bug in the improvements affects you.

So you could try 5.0.21 (or wait for 5.0.22)

/Jonas

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".