Bug #64158 | No free node id found for mysqld(API). | |
---|---|---|---
Submitted: | 29 Jan 2012 19:28 | Modified: | 30 Aug 2016 23:08 |
Reporter: | Michael Prokopiv | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | MySQL-Cluster-gpl-management-7.1.18-1.el | OS: | Linux (Oracle Linux Release 6 Update 2 Boot iso image for x86_64 (64 bit)) |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
Tags: | already allocated, free node id |
[29 Jan 2012 19:28]
Michael Prokopiv
[30 Jan 2012 18:03]
Jonas Oreland
You need to start your data nodes; the node id allocation is performed there. /Jonas
[31 Jan 2012 8:40]
Michael Prokopiv
The real problem is in the difference between starting server from command line ("mysqld_safe&") and as system service ("/etc/init.d/mysql start"): http://forums.mysql.com/read.php?25,295364,295404#msg-295404
[31 Jan 2012 8:40]
Michael Prokopiv
SQL node init file
Attachment: mysql (text/plain), 11.95 KiB.
[1 Feb 2012 20:48]
Sveta Smirnova
Thank you for the feedback. Where is the cnf file located that the mysqld server does not read when started using the mysql script?
[2 Feb 2012 18:00]
Michael Prokopiv
/etc/my.cnf
[2 Feb 2012 18:04]
Michael Prokopiv
Compare the /etc/init.d/mysqld files:

--- mysql.orig	2012-02-02 01:22:15.405952008 +0400
+++ mysql	2012-02-02 01:24:34.856156547 +0400
@@ -106,7 +106,7 @@
 mode=$1 # start or stop
 shift
-other_args="$*" # uncommon, but needed when called from an RPM upgrade action
+other_args="--ndb-connectstring=XXX $*" # uncommon, but needed when called from an RPM upgrade action
 # Expected: "--skip-networking --skip-grant-tables"
 # They are not checked here, intentionally, as it is the resposibility
 # of the "spec" file author to give correct arguments only.

Taken from http://sqlinfo.ru/forum/viewtopic.php?pid=30030#p30030
[3 Feb 2012 14:51]
Michael Prokopiv
Applying the changes in /etc/init.d/mysqld doesn't fix the situation. :(
[14 Feb 2012 16:50]
Sveta Smirnova
Thank you for the feedback. I cannot repeat the described behavior: the start script successfully reads the ndb-related options in my environment. Have you tried to start mysqld without this script? Did that solve the problem? Please also send us the full mysqld error log file.
[20 Feb 2012 8:09]
Michael Prokopiv
mysqld start
Attachment: AN01.mysqldomain.err (text/plain), 1.03 KiB.
[20 Feb 2012 8:39]
Michael Prokopiv
MgmtSrvr log
Attachment: ndb_1_cluster.log (application/octet-stream, text), 30.09 KiB.
[20 Feb 2012 17:56]
Sveta Smirnova
Thank you for the feedback. The error log contains "Tables not available after 30 seconds. Consider increasing --ndb-wait-setup value". This means mysqld is connecting to the Cluster, so this is not a startup-script or option-handling issue. Try increasing the --ndb-wait-setup value and check the network between mysqld and the Cluster.
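[Editor's note] For reference, --ndb-wait-setup can also be raised persistently in the option file. A minimal sketch of the [mysqld] group, assuming the connect string from this report (the 120-second value is an illustrative choice, not from this thread):

```ini
[mysqld]
ndbcluster
ndb-connectstring=192.168.92.11
# Wait up to 120 seconds for NDB tables to become available at startup
# (illustrative value; the error log shows the 30-second wait expiring)
ndb-wait-setup=120
```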
[21 Feb 2012 7:21]
Michael Prokopiv
Svetlana, thank you for your advice, but it is not helpful in the current situation. The situation is very simple:

- calling "/var/init.d/mysql start" gives us nothing except an open connection
- calling "mysqld_safe --ndb-connectstring=192.168.92.11 &" gives us an immediate connection and registration

So why do you conclude that the problem is the 30-second timeout? There could be some problem with security, configuration, or something else, but really not with timeouts. In my opinion the real mistake happens on the SQL node side, during initialization inside the scripts. So it would be great to see which parameters are sent to the MGM node in both cases. Do you know any way to debug this?
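[Editor's note] One hedged way to compare the two startup paths is to check which defaults mysqld itself would read: `mysqld --print-defaults` prints the options picked up from the option files, and the snippet below shows how to inspect a cnf file's [mysqld] group directly (the sample file contents are an assumption for illustration, not taken from this report):

```shell
# Print the options mysqld would read from its option files
# (run as the same user the init script uses):
#   mysqld --print-defaults

# Self-contained check that an option file carries the NDB settings
# in its [mysqld] group (sample contents are illustrative):
cnf=$(mktemp)
cat > "$cnf" <<'EOF'
[mysqld]
ndbcluster
ndb-connectstring=192.168.92.11
[client]
EOF
# Print only the [mysqld] group, then keep the NDB options:
awk '/^\[mysqld\]/{g=1;next} /^\[/{g=0} g' "$cnf" | grep -E '^ndb'
rm -f "$cnf"
```

If the service path prints different defaults than the mysqld_safe path, the init script is reading a different option file (or option group) than expected.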
[22 Feb 2012 8:13]
Michael Prokopiv
The situation is exactly reproduced with MySQL Cluster 7.2.4. :(
[29 Feb 2012 14:13]
Yoshiaki Tajika
I'm also facing the same problem. I tried two versions, with the same result:

mysql-cluster-gpl-7.0.29-linux-x86_64-glibc23.tar.gz
mysql-cluster-gpl-7.2.4-linux2.6-x86_64.tar.gz

I checked ndb_mgmd's log in verbose mode:

2012-02-29 23:05:44 [MgmtSrvr] DEBUG -- 192.168.39.1:49375: Connected!
2012-02-29 23:05:44 [MgmtSrvr] DEBUG -- Trying to allocate nodeid for 192.168.39.1(nodeid: 0, type: API)
2012-02-29 23:05:44 [MgmtSrvr] DEBUG -- [0]: 31, '192.168.39.1', 1
2012-02-29 23:05:44 [MgmtSrvr] DEBUG -- [1]: 32, '192.168.39.1', 1
2012-02-29 23:05:44 [MgmtSrvr] DEBUG -- [2]: 33, '192.168.39.1', 1
2012-02-29 23:05:44 [MgmtSrvr] DEBUG -- Retrying allocation of nodeid 31...
2012-02-29 23:05:45 [MgmtSrvr] DEBUG -- Retrying allocation of nodeid 31... - Repeated 8 times
2012-02-29 23:05:45 [MgmtSrvr] INFO -- Node 26: Initial start, waiting for 25 to connect, nodes [ all: 25 and 26 connected: 26 no-wait: ]
2012-02-29 23:05:45 [MgmtSrvr] DEBUG -- Retrying allocation of nodeid 31...
2012-02-29 23:05:45 [MgmtSrvr] DEBUG -- Retrying allocation of nodeid 31... - Repeated 3 times
2012-02-29 23:05:46 [MgmtSrvr] INFO -- Node 25: Initial start, waiting for 26 to connect, nodes [ all: 25 and 26 connected: 25 no-wait: ]
2012-02-29 23:05:46 [MgmtSrvr] DEBUG -- Retrying allocation of nodeid 31...
2012-02-29 23:05:47 [MgmtSrvr] DEBUG -- Retrying allocation of nodeid 31... - Repeated 17 times
2012-02-29 23:05:47 [MgmtSrvr] DEBUG -- Failed to allocate nodeid 31 for API node in cluster (retried during 3100 milliseconds)
2012-02-29 23:05:47 [MgmtSrvr] DEBUG -- Releasing local reservation for nodeid 31
2012-02-29 23:05:47 [MgmtSrvr] WARNING -- Failed to allocate nodeid for API at 192.168.39.1. Returned eror: 'No free node id found for mysqld(API).'

The numbers [0] to [2] are slots, as I put three [api] entries in config.ini as a try. Nodes 25 and 26 are data nodes, so we may ignore those lines. I searched the source code for these debug messages.

This log shows that MgmtSrvr::try_alloc() called alloc_node_id_req() about 30 times at 100 ms intervals, and finally a timeout occurred. So the next step is to find out why alloc_node_id_req() kept failing. But alloc_node_id_req() is too complicated for me to understand; developers' help is needed.
[1 Mar 2012 0:35]
Yoshiaki Tajika
Additional info about Yoshiaki Tajika's environment:

192.168.39.1 MGM and SQL1
192.168.39.5 DATA1
192.168.39.6 DATA2

# cat config.ini
[ndb_mgmd]
NodeId: 1
hostname: 192.168.39.1
datadir: /usr/local/mysql/data

[ndbd default]
NoOfReplicas: 2
DataDir: /usr/local/mysql/data
DataMemory: 300M
IndexMemory: 100M

[ndbd]
NodeId: 25
hostname: 192.168.39.5

[ndbd]
NodeId: 26
hostname: 192.168.39.6

[api]
NodeId: 31
hostname: 192.168.39.1

[api]
NodeId: 32
hostname: 192.168.39.1

[api]
NodeId: 33
hostname: 192.168.39.1

# cat /etc/my.cnf
[mysqld]
ndbcluster
ndb_connectstring = 192.168.39.1
datadir=/usr/local/mysql/data
[1 Mar 2012 5:02]
Yoshiaki Tajika
I figured out my problem: the data nodes had not yet started completely.

ndb_mgm> all status
Node 25: starting (Last completed phase 0) (mysql-5.5.19 ndb-7.2.4)
Node 26: starting (Last completed phase 0) (mysql-5.5.19 ndb-7.2.4)
Node 27: starting (Last completed phase 0) (mysql-5.5.19 ndb-7.2.4)
Node 28: starting (Last completed phase 0) (mysql-5.5.19 ndb-7.2.4)

I checked a data node's log and found that phase 0 had completed, but phase 1 had not started.

# cat ndb_25_out.log
: (snipped)
2012-03-01 12:59:52 [ndbd] INFO -- Start phase 0 completed

The data nodes could certainly communicate with MGM, and MGM recognized them. But the data nodes could not communicate with each other. The cause was the firewall. When I temporarily disabled every data node's firewall, everything started to work.

ndb_mgm> all status
Node 25: started (mysql-5.5.19 ndb-7.2.4)
Node 26: started (mysql-5.5.19 ndb-7.2.4)
Node 27: started (mysql-5.5.19 ndb-7.2.4)
Node 28: started (mysql-5.5.19 ndb-7.2.4)

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=25 @192.168.39.5 (mysql-5.5.19 ndb-7.2.4, Nodegroup: 0, Master)
id=26 @192.168.39.6 (mysql-5.5.19 ndb-7.2.4, Nodegroup: 0)
id=27 @192.168.39.7 (mysql-5.5.19 ndb-7.2.4, Nodegroup: 1)
id=28 @192.168.39.8 (mysql-5.5.19 ndb-7.2.4, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @192.168.39.1 (mysql-5.5.19 ndb-7.2.4)

[mysqld(API)] 3 node(s)
id=31 @192.168.39.1 (mysql-5.5.19 ndb-7.2.4)
id=32 (not connected, accepting connect from 192.168.39.1)
id=33 (not connected, accepting connect from 192.168.39.1)

Michael Prokopiv, what about your case?
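[Editor's note] The fix above points at a general pitfall: by default, data nodes negotiate dynamic ports for their transporter connections to each other, which is hard to allow through a firewall. A hedged alternative to disabling the firewall is to pin the port in config.ini and open it, along with the management server's default port 1186, on every cluster host (the port number 2202 is an illustrative choice, not from this report):

```ini
[ndbd default]
# Pin the data nodes' transporter port so a fixed firewall rule can allow it
# (2202 is an illustrative value; by default a dynamic port is negotiated)
ServerPort=2202
```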
[1 Mar 2012 10:54]
Michael Prokopiv
Hello! In my environment the firewall accepts all connections inside the cluster network. And in my situation, a mysqld started as a service cannot register with the MGM server (the connection is established), but the same mysqld started manually (mysqld_safe) connects successfully. Michael
[22 Aug 2012 9:11]
Joffrey MICHAIE
On 7.2.7, storage/ndb/src/mgmsrv/MgmtSrvr.cpp line 3752 has "Returned eror: '%s'", which seems incorrect (one "r" is missing in "error"). Not sure if I need to open a new bug for this. BR, Joffrey
[23 Aug 2012 17:49]
Sveta Smirnova
Joffrey, yes, please open a separate bug report. This typo is not related to the current one.
[30 Aug 2016 23:08]
MySQL Verification Team
7.2.25, 7.4.12: can't reproduce the problem; mysqld starts properly.