Bug #42920 | mgm server always connects to port 1186 | ||
---|---|---|---|
Submitted: | 17 Feb 2009 9:23 | Modified: | 4 Nov 2009 12:22 |
Reporter: | Lars Torstensson | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
Version: | mysql-5.1-telco-6.3 | OS: | Linux |
Assigned to: | Magnus Blåudd | CPU Architecture: | Any |
Tags: | mysql-5.1-telco-6.3, mysql-5.1.31-ndb-6.3.22 |
[17 Feb 2009 9:23]
Lars Torstensson
[17 Feb 2009 9:34]
Gustaf Thorslund
Can you please provide config.ini for both your clusters and the command line options you use when starting the ndb_mgmd?
[17 Feb 2009 9:45]
Lars Torstensson
$ ndb_mgmd -f ./config1.ini --ndb-nodeid=3
$ ndb_mgmd -f ./config2.ini --ndb-nodeid=3
Error : Could not alloc node id at localhost port 1186: Id 3 already allocated by another node.
[17 Feb 2009 15:29]
Gustaf Thorslund
Minimal configuration for testing:

-bash-3.00$ cat config1.ini
[NDBD DEFAULT]
NoOfReplicas: 2

[NDBD]
Id: 1
hostname=localhost

[NDBD]
Id: 2
hostname=localhost

[NDB_MGMD]
Id: 3
hostname: localhost
PortNumber: 1186

[NDB_MGMD]
Id: 4
hostname: localhost
PortNumber: 10186

[MYSQLD]
Id: 7
hostname: localhost

-bash-3.00$ cat config2.ini
[NDBD DEFAULT]
NoOfReplicas: 2

[NDBD]
Id: 1
hostname=localhost

[NDBD]
Id: 2
hostname=localhost

[NDB_MGMD]
Id: 3
hostname: localhost
PortNumber: 12000

[NDB_MGMD]
Id: 4
hostname: localhost
PortNumber: 13000

[MYSQLD]
Id: 7
hostname: localhost
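To make it easy to see which port each management node is supposed to listen on, the following is a minimal sketch that reads a config.ini of the shape shown above and prints the PortNumber of the [NDB_MGMD] section with a given Id. The helper name `mgmd_port` is hypothetical and not part of MySQL Cluster, and the parsing only assumes the simple `key: value` / `key=value` layout used in these test files.

```shell
# Hypothetical helper (not part of MySQL Cluster): print the PortNumber
# configured for the [NDB_MGMD] section with the given Id.
# Usage: mgmd_port CONFIG_FILE NODE_ID
mgmd_port() {
  awk -v want="$2" '
    # Print the port of the section just finished, if it is the one we want.
    function emit() { if (in_mgmd && id == want && port != "") print port }
    /^[ \t]*\[/ {
      emit()
      in_mgmd = (toupper($0) ~ /\[NDB_MGMD\]/)
      id = ""; port = ""
      next
    }
    in_mgmd {
      n = match($0, /[:=]/)          # accept both "key: value" and "key=value"
      if (n == 0) next
      key = tolower(substr($0, 1, n - 1)); val = substr($0, n + 1)
      gsub(/[ \t]/, "", key); gsub(/[ \t]/, "", val)
      if (key == "id") id = val
      if (key == "portnumber") port = val
    }
    END { emit() }
  ' "$1"
}

# Demo on a throwaway file containing one section from config2.ini above.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[NDB_MGMD]
Id: 3
hostname: localhost
PortNumber: 12000
EOF
mgmd_port "$cfg" 3    # prints 12000
rm -f "$cfg"
```

With the full config2.ini above, `mgmd_port config2.ini 4` would be expected to print 13000, which is the port the tests in this report look for.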
[24 Mar 2009 14:41]
Gustaf Thorslund
Testing the above config on 6.3 from bzr:

hillbilly% (cd 1; ndb_mgmd -f ../config1.ini --ndb-nodeid=3)
hillbilly% ndb_mgm -c localhost:1186 -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=1 (not connected, accepting connect from localhost)
id=2 (not connected, accepting connect from localhost)

[ndb_mgmd(MGM)] 2 node(s)
id=3 @localhost (mysql-5.1.32 ndb-6.3.24)
id=4 (not connected, accepting connect from localhost)

[mysqld(API)] 1 node(s)
id=7 (not connected, accepting connect from localhost)

hillbilly% (cd 2; ndb_mgmd -f ../config2.ini --ndb-nodeid=3)
Error : Could not alloc node id at localhost port 1186: Id 3 already allocated by another node.

So the bug is verified as described. Let's do some more testing.

hillbilly% (cd 2; ndb_mgmd -f ../config2.ini --ndb-nodeid=4)
hillbilly% ndb_mgm -c localhost:13000 -e show
Unable to connect with connect string: nodeid=0,localhost:13000
Retrying every 5 seconds. Attempts left: 2 1, failed.

So there is no mgmd on port 13000...

hillbilly% ndb_mgm -c localhost:1186 -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=1 (not connected, accepting connect from localhost)
id=2 (not connected, accepting connect from localhost)

[ndb_mgmd(MGM)] 2 node(s)
id=3 @localhost (mysql-5.1.32 ndb-6.3.24)
id=4 (not connected, accepting connect from localhost)

[mysqld(API)] 1 node(s)
id=7 (not connected, accepting connect from localhost)

And it hasn't ended up on cluster 1 either.

hillbilly% cat 2/ndb_4_out.log
NDB Cluster Management Server. mysql-5.1.32 ndb-6.3.24-GA
Id: 4, Command port: *:10186

But it has got the port number that is supposed to be used for the mgmd with id 4 on cluster 1.

hillbilly% ndb_mgm -c localhost:10186 -e show
Connected to Management Server at: localhost:10186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=1 (not connected, accepting connect from localhost)
id=2 (not connected, accepting connect from localhost)

[ndb_mgmd(MGM)] 2 node(s)
id=3 (not connected, accepting connect from localhost)
id=4 @localhost (mysql-5.1.32 ndb-6.3.24)

[mysqld(API)] 1 node(s)
id=7 (not connected, accepting connect from localhost)

But it is part of cluster 2. This is confusing...

hillbilly% bzr version-info
revision-id: jonas@mysql.com-20090316151554-pm72q8jn4awmwpse
date: 2009-03-16 16:15:54 +0100
build-date: 2009-03-24 15:37:27 +0100
revno: 2913
branch-nick: cluster-6.3
[24 Mar 2009 15:09]
Gustaf Thorslund
More testing, with different node ids for the two mgmds of the second cluster.

hillbilly% (cd 1; ndb_mgmd -f ../config1.ini --ndb-nodeid=3)
hillbilly% cat config3.ini
[NDBD DEFAULT]
NoOfReplicas: 2

[NDBD]
Id: 1
hostname=localhost

[NDBD]
Id: 2
hostname=localhost

[NDB_MGMD]
Id: 5
hostname: localhost
PortNumber: 12000

[NDB_MGMD]
Id: 6
hostname: localhost
PortNumber: 13000

[MYSQLD]
Id: 7
hostname: localhost

hillbilly% (cd 3; ndb_mgmd -f ../config3.ini --ndb-nodeid=5)
Error : Could not alloc node id at localhost port 1186: No node defined with id=5 in config file.

So it doesn't use its own configuration, but appears to prefer whatever mgmd it can find on port 1186.
[28 Apr 2009 10:33]
jack andrews
from storage/src/mgmapi/mgmapi.cpp: [see line marked ==>]

extern "C"
int
ndb_mgm_alloc_nodeid(NdbMgmHandle handle, unsigned int version, int nodetype,
                     int log_event)
{
  [snip]
  const ParserRow<ParserDummy> reply[]= {
    MGM_CMD("get nodeid reply", NULL, ""),
    MGM_ARG("error_code", Int, Optional, "Error code"),
    MGM_ARG("nodeid", Int, Optional, "Error message"),
    MGM_ARG("result", String, Mandatory, "Error message"),
    MGM_END()
  };

  const Properties *prop;
  prop= ndb_mgm_call(handle, reply, "get nodeid", &args);
  CHECK_REPLY(handle, prop, -1);

  nodeid= -1;
  do {
    const char * buf;
    if (!prop->get("result", &buf) || strcmp(buf, "Ok") != 0)
    {
      const char *hostname= ndb_mgm_get_connected_host(handle);
==>   unsigned port= ndb_mgm_get_connected_port(handle);
      BaseString err;
      Uint32 error_code= NDB_MGM_ALLOCID_ERROR;
      err.assfmt("Could not alloc node id at %s port %d: %s",
                 hostname, port, buf);
      prop->get("error_code", &error_code);
      setError(handle, error_code, __LINE__, err.c_str());
      break;
    }
    Uint32 _nodeid;
    if(!prop->get("nodeid", &_nodeid) != 0){
      fprintf(handle->errstream, "ERROR Message: <nodeid Unspecified>\n");
      break;
    }
    nodeid= _nodeid;
  }while(0);

  delete prop;
  return nodeid;
}
[28 Apr 2009 16:45]
Magnus Blåudd
The call to 'ndb_mgm_get_connected_port' simply returns the port number to which the handle is connected (if any). It doesn't connect to a remote host there.
[2 Jun 2009 21:50]
Henrik Ingo
Gustaf, Lars: Could you both check whether you have a my.cnf installed in some default location?

"Default options are read from the following files in the given order:
/etc/my.cnf /etc/mysql/my.cnf /usr/local/mysqlcluster-63/build/install/etc/my.cnf ~/.my.cnf"

For instance, on my kubuntu I have:

$ grep "connect-string" /etc/mysql/my.cnf
/etc/mysql/my.cnf:connect-string=localhost
/etc/mysql/my.cnf:connect-string=localhost

Notice that an ndb_mgmd will read this file on startup, since you didn't specify --defaults-file= on the command line. See ndb_mgmd --help for details.

Since all your ndb_mgmd processes will read the same my.cnf file, they will now always connect to localhost:1186 for configuration, whether they belong to cluster X or cluster Y.

When playing around with multiple MySQL installations, it is usually a good idea to delete the standard rpm|deb package that came with the distribution, since this is a really common error that I hit myself quite often.

Please confirm whether this is indeed the case.
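The check described above can be repeated over several option files at once. This is a minimal sketch; the helper name `find_connect_strings` is hypothetical, and the file list in the example call is only an assumption taken from the default locations quoted above (the build-specific path is omitted).

```shell
# Hypothetical helper: list connect-string settings found in the given
# option files, silently skipping files that do not exist or are unreadable.
find_connect_strings() {
  for f in "$@"; do
    if [ -r "$f" ]; then
      # Passing /dev/null as a second file makes grep prefix every match
      # with the file name, even when only one real file is searched.
      grep 'connect-string' "$f" /dev/null || true
    fi
  done
}

find_connect_strings /etc/my.cnf /etc/mysql/my.cnf "$HOME/.my.cnf"
```

No output means that none of the listed files sets a connect string; any line printed names a file that every ndb_mgmd started without --defaults-file= would pick up.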
[5 Jun 2009 11:00]
Henrik Ingo
Further info: On the machines where the behaviour was detected, could you try

$ ./my_print_defaults mysqld --debug
| | | my: path: '/usr/local/mysql/etc/my.cnf' stat_area: 0x7fffc92c6040 MyFlags: 0
| | | error: Got errno: 2 from stat

...and look for the stat results of all the places where my.cnf is tried. (The output may become quite large; use Ctrl+F or similar in an editor to find them.)
[18 Jun 2009 13:23]
jack andrews
this behaviour is 'fixed' in -7.0 (but i haven't found out what code fixes it -- maybe the new bin config code?) do we need a patch for 6.3?
[18 Jun 2009 14:46]
Henrik Ingo
@jack: Did you verify that the bug exists on 6.3 and then that it doesn't exist when upgrading to 7.0? If you don't have a my.cnf installed in some locations, then the problem simply doesn't happen and this is regardless of version.
[18 Jun 2009 15:25]
jack andrews
> @jack: Did you verify that the bug exists on 6.3 and then that
> it doesn't exist when upgrading to 7.0? If you don't have a
> my.cnf installed in some locations, then the problem simply
> doesn't happen and this is regardless of version.

i built 7.0 and 6.3. i ran in 7.0 and the bug didn't occur. i ran in 6.3 and the problem occurred. i don't think i have a my.cnf file -- at least, not one that i created.
[9 Oct 2009 12:53]
Magnus Blåudd
Checked the code, and unfortunately the ndb_mgmd in 6.3 falls back to the built-in default port if neither NDB_CONNECTSTRING nor --ndb-connectstring is set. That means the best workaround for this bug is to set NDB_CONNECTSTRING=ownhost:ownport and then start the ndb_mgmd; it should then only try to connect to its own port. But of course, the ndb_mgmd should not need to connect to any other ndb_mgmd at all this early in the startup phase. The problem has been fixed in 7.0 by quite a large rewrite of the ndb_mgmd startup; I can't really think of a small fix for the 6.3 ndb_mgmd.
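The workaround above could be wrapped as follows. This is a sketch only: the function name `start_mgmd` and the `NDB_MGMD_BIN` override variable are hypothetical conveniences, and the host and port in the usage comment are assumptions taken from the config2.ini posted earlier in this report.

```shell
# Hypothetical wrapper for the workaround: start a 6.3 ndb_mgmd with
# NDB_CONNECTSTRING pointing at its own host and port, so that it does not
# fall back to the built-in default port.
# Usage: start_mgmd CONFIG_FILE NODE_ID OWN_HOST OWN_PORT
start_mgmd() {
  # NDB_MGMD_BIN is an assumed override for the binary path; it defaults
  # to whatever ndb_mgmd is found on PATH.
  NDB_CONNECTSTRING="$3:$4" "${NDB_MGMD_BIN:-ndb_mgmd}" -f "$1" --ndb-nodeid="$2"
}

# Example for node 3 of the second cluster (not executed here):
#   start_mgmd ./config2.ini 3 localhost 12000
```

The environment variable is set for that single command only, so management servers of different clusters started from the same shell do not inherit each other's connect string.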
[4 Nov 2009 12:22]
Bernd Ocklin
Fixed in 7.0 by refactoring. Will not fix in 6.3.