MySQL Bugs: #33012: Cluster refused allocation

Bug #33012	Cluster refused allocation - Node type mismatch
Submitted:	5 Dec 2007 19:21	Modified:	5 Jan 2008 10:09
Reporter:	Matthew Boehm
Status:	Not a Bug
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	5.1.22-rc	OS:	Linux (FC - 2.6.23.1-49.fc8 x86_64)
Assigned to:		CPU Architecture:	Any
Tags:	allocation, application error, cluster, mismatch, node

Description:
Five-server install of MySQL Cluster: two servers running mysqld (referred to as 'first' and 'second' below), two running data nodes, and one running both the manager and a mysqld (referred to as 'third').

The data nodes connect fine to the manager.
The first and second mysqld connect fine to the cluster.
The mysqld on the same machine as the manager (third) refuses to connect.

The ndb_* apps on any of the five machines also refuse to connect, all giving this error in the cluster log:

2007-12-05 11:53:46 [MgmSrvr] WARNING  -- Cluster refused allocation of id 8. Connection from ip 192.168.1.14. Returned error string "Cluster refused allocation of id 8. Error: 1704 (Node type mismatch: Permanent error: Application error)."

Will attach all configs.

How to repeat:
Use the attached configs. Start the manager. Start both data nodes. Start the first (10) mysqld. Start the second (11) mysqld. All is fine up to here. Attempt to start the third (8); it fails to connect.

Suggested fix:
No fix known.

cluster manager config

Attachment: ndbcluster.ini (application/octet-stream, text), 880 bytes.

'first' mysqld config

Attachment: first_mysqld_my_cnf.txt (text/plain), 1.98 KiB.

'second' mysqld config

Attachment: second_mysqld_my_cnf.txt (text/plain), 1.98 KiB.

'third' mysqld config - the failure

Attachment: third_mysqld_my_cnf.txt (text/plain), 1.95 KiB.

From manager machine:

[root@shattrath /]# ndb_show_tables -c nodeid=8,localhost

<connect timeout>

In ndbcluster.log:
  2007-12-05 13:24:41 [MgmSrvr] WARNING  -- Cluster refused allocation of id 8. Connection from ip 192.168.1.14. Returned error string "Cluster refused allocation of id 8. Error: 1704 (Node type mismatch: Permanent error: Application error)."

First, in the my.cnf I would change

ndb-connectstring	= nodeid=11,192.168.1.14

to

ndb-connectstring	= 192.168.1.14:<port number of mgm node>

Second, you should check the version of mysqld to make sure you are starting the executable you think you are starting. It sounds like you may have an older version on the host.

Best wishes,
/Jeb

All versions are exactly the same. I downloaded the tar.gz source once and transferred it to all five machines with a thumb drive. I had to, because none of these machines have public internet access.

With regard to specifying the port, I'll give it a try.

What does the "ndb_mgm -e show" output look like?

You'll probably see in there that the two connected
mysqld nodes are on ids 8 and 9. In that case, either:

- move the dedicated [MYSQLD] slots to the top of the
  list so that they are searched first

- use "nodeid=...;host=management_server" as connect
  string to specify the [MYSQLD] slot to use for each
  server

See also http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-connectstring.html

I am already specifying the slots to use for 10 & 11, as evidenced below:

[mboehm@shattrath ~]$ ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=12   @192.168.1.12  (Version: 5.1.22, Nodegroup: 0, Master)
id=13   @192.168.1.13  (Version: 5.1.22, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=14   @192.168.1.14  (Version: 5.1.22)

[mysqld(API)]   4 node(s)
id=8 (not connected, accepting connect from any host)
id=9 (not connected, accepting connect from any host)
id=10   @192.168.1.10  (Version: 5.1.22)
id=11   @192.168.1.11  (Version: 5.1.22)

Attempting to use your connection string, from any of the 5 servers, results in the same behavior:

--------------
[root@shattrath mysql]# ndb_show_tables -c nodeid=8;host=shattrath:1186

2007-12-06 08:26:09 [MgmSrvr] WARNING  -- Cluster refused allocation of id 8. Connection from ip 127.0.0.1. Returned error string "Cluster refused allocation of id 8. Error: 1704 (Node type mismatch: Permanent error: Application error)."
--------------
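
A note on quoting, in case it applies to the command as it was actually typed: in most shells an unquoted ';' ends the command, so a connect string of the form nodeid=...;host=... has to be quoted to reach the program in one piece. The sketch below does not call any real ndb_* tool. It uses a hypothetical stand-in function, show_connect_arg, that simply prints the value it receives after -c, to show what each form passes through.

```shell
#!/bin/sh
# Hypothetical stand-in for an ndb_* tool: prints the value given after -c.
show_connect_arg() { printf '%s\n' "$2"; }

# Quoted: the whole connect string arrives as a single argument.
quoted=$(show_connect_arg -c 'nodeid=8;host=shattrath:1186')

# Unquoted: the shell stops the command at ';'. eval is used here only to
# reproduce the unquoted form; 'host=shattrath:1186' then runs as a separate
# command (a plain variable assignment) and never reaches the program.
unquoted=$(eval 'show_connect_arg -c nodeid=8;host=shattrath:1186')

echo "quoted:   $quoted"     # quoted:   nodeid=8;host=shattrath:1186
echo "unquoted: $unquoted"   # unquoted: nodeid=8
```

With a real tool the quoted form would look like: ndb_show_tables -c 'nodeid=8;host=shattrath:1186'. Quoting alone would not be expected to change the error 1704 reported here, only which management host the tool tries to reach.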

Even worse. I changed my cluster config to remove all reserved mysqlds:

[mysqld]
[mysqld]
[mysqld]
[mysqld]

[ndbd]
Id              = 12
HostName        = exodar

[ndbd]
Id              = 13
HostName        = darnassus

[ndb_mgmd]
Id              = 14
HostName        = shattrath

And now, NOTHING will connect! I get the same node type mismatch error when I try to connect either an ndb_* app or a mysqld process.

Ok. Follow this carefully. After the settings in my most recent post failed, I went back to the cluster config and made it look like this:

[mysqld]
[mysqld]

[mysqld]
Id              = 10
HostName        = stormwind

[mysqld]
Id              = 11
HostName        = ironforge

[ndbd]
Id              = 12
HostName        = exodar

[ndbd]
Id              = 13
HostName        = darnassus

[ndb_mgmd]
Id              = 14
HostName        = shattrath

Then I went to mysqld server id 10 and attempted to start it. I specified NO node id in that server's my.cnf. The manager crashed with the following error:

ndb_mgmd: MgmtSrvr.cpp:2245: bool MgmtSrvr::alloc_node_id(NodeId*, ndb_mgm_node_type, sockaddr*, socklen_t*, int&, BaseString&, int):
 Assertion `id_found == 0' failed.

Crazy things going on here.

Ok. So here is what I have found out, and I apologize if this is a "duh". My original ndbcluster.ini only had 2 [mysqld] sections when I started the cluster. I then added a 3rd section. I only restarted the manager. I did not do a rolling restart of the entire cluster. I continued to mangle/mess with all the config files (my.cnf, ndbcluster.ini), all along only restarting the manager. I got to the point, above, where absolutely nothing would connect to the cluster. So I restarted the whole thing.

Lo and behold, everything connects now, including the ndb_* apps and my 3rd mysqld.

I'm not sure I remember reading anything about having to restart the entire cluster when the manager's config changes, but that seems to make sense.

If anything comes from this, it's probably 'RTFM' on my part, but hopefully a better error message can be put in the code.
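
For anyone landing here with the same symptom, one possible restart sequence after editing the cluster config is sketched below. It is a dry run that only prints the commands. The node ids (12, 13, 14) are the ones from this report, the config path is a placeholder, and the run and restart_plan helpers are hypothetical names for this sketch. On a live cluster each data node should be back in the "started" state (check with ndb_mgm -e "12 STATUS") before the next one is restarted, and ndb_mgmd has to be started on the manager host itself.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of executing them.
DRY_RUN=1
CONFIG=/path/to/ndbcluster.ini   # placeholder path, adjust to your install

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_plan() {
    # 1. Restart the manager so it rereads the edited config file.
    run ndb_mgm -e "14 STOP"
    run ndb_mgmd -f "$CONFIG"
    # 2. Restart the data nodes one at a time so each picks up the new
    #    configuration (wait for "started" between nodes on a live cluster).
    for id in 12 13; do
        run ndb_mgm -e "$id RESTART"
    done
    # 3. Confirm which slots are now connected.
    run ndb_mgm -e "SHOW"
}

plan=$(restart_plan)
echo "$plan"
```

The mysqld processes and ndb_* apps can then be started or restarted as usual. Restarting everything at once, as I ended up doing, also works, but takes the cluster offline while it happens.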