Bug #15915 "Allocate nodeid failed" when rejoining two ndb nodes
Submitted: 21 Dec 2005 19:01
Modified: 19 Jun 2006 9:37
Reporter: Andreas Granig
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 5.0.15
OS: Linux (Debian Sarge)
Assigned to:
CPU Architecture: Any

[21 Dec 2005 19:01] Andreas Granig
Description:
I have four ndb nodes, {11,12} and {13,14}, running. When I type "reboot" on nodes 11 and 13, the cluster stays up and running. But when they come back up and I start ndbd on the two nodes, only the first one can join the cluster; the second segfaults.

How to repeat:
Reboot nodes 11 and 13 during normal operation without shutting down ndbd. Once they are up again, start ndbd on nodes 11 and 13 without any command-line parameters. One of the nodes then segfaults, and the mgmd says:

Allocate nodeid (13) failed. Connection from ip 172.19.200.13. Returned error string "Id 13 already allocated by another node."
2005-12-21 17:57:37 [MgmSrvr] INFO     -- Mgmt server state: node id's  11 12 13 14 32 connected but not reserved
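
For anyone scanning mgmd logs for this condition, here is a minimal Python sketch that pulls the node ids out of a "Mgmt server state" line like the one above. The function name and the regular expression are illustrative assumptions inferred only from this one log line, not from a documented ndb_mgmd log format.

```python
import re

# Hypothetical pattern: inferred from the single log line quoted in this
# report, not from a documented ndb_mgmd log format.
STATE_RE = re.compile(
    r"Mgmt server state: node id's\s+((?:\d+\s+)+)connected but not reserved"
)

def unreserved_node_ids(line):
    """Return the node ids listed as connected but not reserved, or []."""
    match = STATE_RE.search(line)
    if match is None:
        return []
    return [int(tok) for tok in match.group(1).split()]

line = ("2005-12-21 17:57:37 [MgmSrvr] INFO     -- Mgmt server state: "
        "node id's  11 12 13 14 32 connected but not reserved")
print(unreserved_node_ids(line))
```

A line that does not match the pattern yields an empty list, so the helper can be applied to every line of a log without extra filtering.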

Killing the ndb_mgmd doesn't help, and executing "purge stale sessions" on the mgm client doesn't either ("No sessions purged").

Shutting down the cluster, then restarting everything works fine btw.
[21 Dec 2005 19:03] Andreas Granig
The log entries of the mgmd

Attachment: mgmd-log.txt (plain/text, text), 12.76 KiB.

[21 Dec 2005 19:04] Andreas Granig
The configs of the mgmd, ndbd and sql node

Attachment: configs.txt (plain/text, text), 2.35 KiB.

[21 Dec 2005 19:07] Andreas Granig
Oh, and the log of node 13:

Time: Wednesday 21 December 2005 - 19:34:42
Status: Temporary error, restart node
Message: Error OS signal received (Internal error, programming error or missing error message, please report a bug)
Error: 6000
Error data: Signal 17 received; Child exited
Error object: main.cpp
Program: ndbd
Pid: 4184
Trace: /var/lib/mysql-cluster/storage/ndb_13_trace.log.5
Version: Version 5.0.15
***EOM***

The trace file ndb_13_trace.log.5 is attached separately.
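
The error log entry above is a run of "Key: value" lines terminated by ***EOM***. Below is a minimal Python sketch for turning such an entry into a dictionary, assuming that layout holds for other entries too; the function name is hypothetical, and the sample text is a shortened copy of the entry from node 13.

```python
def parse_error_entry(text):
    """Parse 'Key: value' lines up to the ***EOM*** marker into a dict."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "***EOM***":
            break  # end of this entry
        if ": " not in line:
            continue  # skip blank or unexpected lines
        # Split on the first ": " only, so values may contain colons.
        key, value = line.split(": ", 1)
        fields[key] = value
    return fields

entry = """Time: Wednesday 21 December 2005 - 19:34:42
Status: Temporary error, restart node
Error: 6000
Error data: Signal 17 received; Child exited
Program: ndbd
***EOM***"""
info = parse_error_entry(entry)
print(info["Error"], "-", info["Error data"])
```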
[21 Dec 2005 19:10] Andreas Granig
The trace of the ndbd log of node 13 (gzip'ed)

Attachment: ndb_13_trace.log.5.gz (application/x-tar, text), 67.68 KiB.

[21 Dec 2005 19:20] Jonathan Miller
Can you attach the trace files for the node?
Thanks
[21 Dec 2005 19:21] Jonathan Miller
Already attached, sorry. Thanks
[19 May 2006 9:37] Jonas Oreland
The core is reported in http://bugs.mysql.com/bug.php?id=17677

But I don't understand why "purge stale sessions" and a restart of ndb_mgmd do not work.
When you say "reboot", what exactly do you mean?

Regardless, a number of improvements have been made in this area (in 5.0.21) 
(http://bugs.mysql.com/bug.php?id=19930)
But unfortunately these improvements contained a bug, which will be fixed in the next 5.0 release.
Maybe the bug in the improvement does not affect you, that's hard to say.

So you could try 5.0.21 (or wait for 5.0.22)

/Jonas
[19 Jun 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".