Bug #64158 | No free node id found for mysqld(API). | |
---|---|---|---
Submitted: | 29 Jan 2012 19:28 | Modified: | 30 Aug 2016 23:08 |
Reporter: | Michael Prokopiv | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | MySQL-Cluster-gpl-management-7.1.18-1.el | OS: | Linux (Oracle Linux Release 6 Update 2 Boot iso image for x86_64 (64 bit)) |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
Tags: | already allocated, free node id |
[29 Jan 2012 19:28]
Michael Prokopiv
[30 Jan 2012 18:03]
Jonas Oreland
You need to start your data nodes; the node id allocation is performed there. /Jonas
[31 Jan 2012 8:40]
Michael Prokopiv
The real problem is in the difference between starting server from command line ("mysqld_safe&") and as system service ("/etc/init.d/mysql start"): http://forums.mysql.com/read.php?25,295364,295404#msg-295404
[31 Jan 2012 8:40]
Michael Prokopiv
SQL node init file
Attachment: mysql (text/plain), 11.95 KiB.
[1 Feb 2012 20:48]
Sveta Smirnova
Thank you for the feedback. Where is the cnf file located that the mysqld server does not read when started using the mysql script?
[2 Feb 2012 18:00]
Michael Prokopiv
/etc/my.cnf
[2 Feb 2012 18:04]
Michael Prokopiv
Compare the /etc/init.d/mysqld files:

--- mysql.orig	2012-02-02 01:22:15.405952008 +0400
+++ mysql	2012-02-02 01:24:34.856156547 +0400
@@ -106,7 +106,7 @@
 mode=$1 # start or stop
 shift
-other_args="$*" # uncommon, but needed when called from an RPM upgrade action
+other_args="--ndb-connectstring=XXX $*" # uncommon, but needed when called from an RPM upgrade action
 # Expected: "--skip-networking --skip-grant-tables"
 # They are not checked here, intentionally, as it is the resposibility
 # of the "spec" file author to give correct arguments only.

Taken from http://sqlinfo.ru/forum/viewtopic.php?pid=30030#p30030
[3 Feb 2012 14:51]
Michael Prokopiv
Applying the changes in /etc/init.d/mysqld doesn't fix the situation. :(
[14 Feb 2012 16:50]
Sveta Smirnova
Thank you for the feedback. I cannot repeat the described behavior: the start script successfully reads the ndb-related options in my environment. Have you tried to start mysqld without this script? Did that solve the problem? Please also send us the full mysqld error log file.
[20 Feb 2012 8:09]
Michael Prokopiv
mysqld start
Attachment: AN01.mysqldomain.err (text/plain), 1.03 KiB.
[20 Feb 2012 8:39]
Michael Prokopiv
MgmtSrvr log
Attachment: ndb_1_cluster.log (application/octet-stream, text), 30.09 KiB.
[20 Feb 2012 17:56]
Sveta Smirnova
Thank you for the feedback. The error log contains "Tables not available after 30 seconds. Consider increasing --ndb-wait-setup value". This means mysqld is connecting to the Cluster, so this is not a startup-script or option-handling issue. Try increasing the --ndb-wait-setup value and check the network between mysqld and the Cluster.
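[Editor's note] For reference, --ndb-wait-setup can also be raised persistently in the option file. A minimal sketch of the [mysqld] group, assuming the connect string from this report (the 120-second value is an illustrative choice, not from this thread):

```ini
[mysqld]
ndbcluster
ndb-connectstring=192.168.92.11
# Wait up to 120 seconds for NDB tables to become available at startup
# (illustrative value; the error log shows the 30-second wait expiring)
ndb-wait-setup=120
```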
[21 Feb 2012 7:21]
Michael Prokopiv
Svetlana, thank you for your advice, but it is not helpful in the current situation. The situation is very simple:

- calling "/var/init.d/mysql start" gives us nothing except an open connection
- calling "mysqld_safe --ndb-connectstring=192.168.92.11 &" gives us an immediate connection and registration

So why do you conclude that the problem is the 30-second timeout? There could be some problem with security, configuration, or something else, but really not with timeouts. In my opinion the real mistake happens on the SQL node side, during initialization inside the scripts. So it would be great to see which parameters are sent to the MGM node in both cases. Do you know any way to debug this?
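[Editor's note] One hedged way to compare the two startup paths is to check which defaults mysqld itself would read: `mysqld --print-defaults` prints the options picked up from the option files, and the snippet below shows how to inspect a cnf file's [mysqld] group directly (the sample file contents are an assumption for illustration, not taken from this report):

```shell
# Print the options mysqld would read from its option files
# (run as the same user the init script uses):
#   mysqld --print-defaults

# Self-contained check that an option file carries the NDB settings
# in its [mysqld] group (sample contents are illustrative):
cnf=$(mktemp)
cat > "$cnf" <<'EOF'
[mysqld]
ndbcluster
ndb-connectstring=192.168.92.11
[client]
EOF
# Print only the [mysqld] group, then keep the NDB options:
awk '/^\[mysqld\]/{g=1;next} /^\[/{g=0} g' "$cnf" | grep -E '^ndb'
rm -f "$cnf"
```

If the service path prints different defaults than the mysqld_safe path, the init script is reading a different option file (or option group) than expected.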
[22 Feb 2012 8:13]
Michael Prokopiv
The situation is exactly reproduced with MySQL Cluster 7.2.4. :(
[29 Feb 2012 14:13]
Yoshiaki Tajika
I'm also facing the same problem. I tried two versions, with the same result:

mysql-cluster-gpl-7.0.29-linux-x86_64-glibc23.tar.gz
mysql-cluster-gpl-7.2.4-linux2.6-x86_64.tar.gz

I checked ndb_mgmd's log in verbose mode:

2012-02-29 23:05:44 [MgmtSrvr] DEBUG -- 192.168.39.1:49375: Connected!
2012-02-29 23:05:44 [MgmtSrvr] DEBUG -- Trying to allocate nodeid for 192.168.39.1(nodeid: 0, type: API)
2012-02-29 23:05:44 [MgmtSrvr] DEBUG -- [0]: 31, '192.168.39.1', 1
2012-02-29 23:05:44 [MgmtSrvr] DEBUG -- [1]: 32, '192.168.39.1', 1
2012-02-29 23:05:44 [MgmtSrvr] DEBUG -- [2]: 33, '192.168.39.1', 1
2012-02-29 23:05:44 [MgmtSrvr] DEBUG -- Retrying allocation of nodeid 31...
2012-02-29 23:05:45 [MgmtSrvr] DEBUG -- Retrying allocation of nodeid 31... - Repeated 8 times
2012-02-29 23:05:45 [MgmtSrvr] INFO -- Node 26: Initial start, waiting for 25 to connect, nodes [ all: 25 and 26 connected: 26 no-wait: ]
2012-02-29 23:05:45 [MgmtSrvr] DEBUG -- Retrying allocation of nodeid 31...
2012-02-29 23:05:45 [MgmtSrvr] DEBUG -- Retrying allocation of nodeid 31... - Repeated 3 times
2012-02-29 23:05:46 [MgmtSrvr] INFO -- Node 25: Initial start, waiting for 26 to connect, nodes [ all: 25 and 26 connected: 25 no-wait: ]
2012-02-29 23:05:46 [MgmtSrvr] DEBUG -- Retrying allocation of nodeid 31...
2012-02-29 23:05:47 [MgmtSrvr] DEBUG -- Retrying allocation of nodeid 31... - Repeated 17 times
2012-02-29 23:05:47 [MgmtSrvr] DEBUG -- Failed to allocate nodeid 31 for API node in cluster (retried during 3100 milliseconds)
2012-02-29 23:05:47 [MgmtSrvr] DEBUG -- Releasing local reservation for nodeid 31
2012-02-29 23:05:47 [MgmtSrvr] WARNING -- Failed to allocate nodeid for API at 192.168.39.1. Returned eror: 'No free node id found for mysqld(API).'

The numbers [0] to [2] are slots, as I put three [api] entries in config.ini as a try. Nodes 25 and 26 are data nodes, so we may ignore those lines. I searched the source code for these debug messages.

This log shows that MgmtSrvr::try_alloc() called alloc_node_id_req() about 30 times at 100 ms intervals, and finally a timeout occurred. So the next step is to find out why alloc_node_id_req() kept failing. But alloc_node_id_req() is too complicated for me to understand; developers' help is needed.
[1 Mar 2012 0:35]
Yoshiaki Tajika
Additional info about Yoshiaki Tajika's environment:

192.168.39.1 MGM and SQL1
192.168.39.5 DATA1
192.168.39.6 DATA2

# cat config.ini
[ndb_mgmd]
NodeId: 1
hostname: 192.168.39.1
datadir: /usr/local/mysql/data

[ndbd default]
NoOfReplicas: 2
DataDir: /usr/local/mysql/data
DataMemory: 300M
IndexMemory: 100M

[ndbd]
NodeId: 25
hostname: 192.168.39.5

[ndbd]
NodeId: 26
hostname: 192.168.39.6

[api]
NodeId: 31
hostname: 192.168.39.1

[api]
NodeId: 32
hostname: 192.168.39.1

[api]
NodeId: 33
hostname: 192.168.39.1

# cat /etc/my.cnf
[mysqld]
ndbcluster
ndb_connectstring = 192.168.39.1
datadir=/usr/local/mysql/data
[1 Mar 2012 5:02]
Yoshiaki Tajika
I figured out my problem: the data nodes had not yet started completely.

ndb_mgm> all status
Node 25: starting (Last completed phase 0) (mysql-5.5.19 ndb-7.2.4)
Node 26: starting (Last completed phase 0) (mysql-5.5.19 ndb-7.2.4)
Node 27: starting (Last completed phase 0) (mysql-5.5.19 ndb-7.2.4)
Node 28: starting (Last completed phase 0) (mysql-5.5.19 ndb-7.2.4)

I checked a data node's log and found that phase 0 had completed, but phase 1 had not started.

# cat ndb_25_out.log
: (snipped)
2012-03-01 12:59:52 [ndbd] INFO -- Start phase 0 completed

The data nodes could certainly communicate with MGM, and MGM recognized them. But the data nodes could not communicate with each other. The cause was the firewall. When I temporarily disabled every data node's firewall, everything started to work.

ndb_mgm> all status
Node 25: started (mysql-5.5.19 ndb-7.2.4)
Node 26: started (mysql-5.5.19 ndb-7.2.4)
Node 27: started (mysql-5.5.19 ndb-7.2.4)
Node 28: started (mysql-5.5.19 ndb-7.2.4)

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=25 @192.168.39.5 (mysql-5.5.19 ndb-7.2.4, Nodegroup: 0, Master)
id=26 @192.168.39.6 (mysql-5.5.19 ndb-7.2.4, Nodegroup: 0)
id=27 @192.168.39.7 (mysql-5.5.19 ndb-7.2.4, Nodegroup: 1)
id=28 @192.168.39.8 (mysql-5.5.19 ndb-7.2.4, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @192.168.39.1 (mysql-5.5.19 ndb-7.2.4)

[mysqld(API)] 3 node(s)
id=31 @192.168.39.1 (mysql-5.5.19 ndb-7.2.4)
id=32 (not connected, accepting connect from 192.168.39.1)
id=33 (not connected, accepting connect from 192.168.39.1)

Michael Prokopiv, what about your case?
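[Editor's note] The fix above points at a general pitfall: by default, data nodes negotiate dynamic ports for their transporter connections to each other, which is hard to allow through a firewall. A hedged alternative to disabling the firewall is to pin the port in config.ini and open it, along with the management server's default port 1186, on every cluster host (the port number 2202 is an illustrative choice, not from this report):

```ini
[ndbd default]
# Pin the data nodes' transporter port so a fixed firewall rule can allow it
# (2202 is an illustrative value; by default a dynamic port is negotiated)
ServerPort=2202
```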
[1 Mar 2012 10:54]
Michael Prokopiv
Hello! In my environment the firewall accepts all connections inside the cluster network. And in my situation, a mysqld started as a service cannot register with the MGM server (the connection is established), but the same mysqld started manually (mysqld_safe) connects successfully. Michael
[22 Aug 2012 9:11]
Joffrey MICHAIE
On 7.2.7, storage/ndb/src/mgmsrv/MgmtSrvr.cpp line 3752 has "Returned eror: '%s'", which seems incorrect (one "r" is missing in "error"). Not sure if I need to open a new bug for this. BR, Joffrey
[23 Aug 2012 17:49]
Sveta Smirnova
Joffrey, yes, please open a separate bug report. This typo is not related to the current one.
[30 Aug 2016 23:08]
MySQL Verification Team
7.2.25, 7.4.12: can't reproduce the problem; mysqld starts properly.