Bug #49400 duplicate [TCP] section makes ndb_mgmd abort silently
Submitted: 3 Dec 2009 14:09 Modified: 7 Jun 2010 16:14
Reporter: Hartmut Holzgraefe
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: mysql-5.1-telco-7.0 OS: Linux
Assigned to: John David Duncan CPU Architecture:Any
Tags: mysql-cluster-7.09b

[3 Dec 2009 14:09] Hartmut Holzgraefe
Description:
ndb_mgmd fails with an abort signal if two [TCP] entries with the same value combination for HostName1/HostName2 exist.

How to repeat:
Start ndb_mgmd with the following minimal configuration:

  [ndbd default]
  NoOfReplicas=1
  DataDir=...

  [ndb_mgmd]
  Id=1
  Hostname=localhost
  DataDir=...

  [ndbd]
  Id=2
  Hostname=localhost

  [mysqld]

  [tcp]
  NodeId1=1
  NodeId2=2 
  Hostname1=localhost
  Hostname2=localhost

  # again
  [tcp]
  NodeId1=1
  NodeId2=2
  Hostname1=localhost
  Hostname2=localhost

$ ndb_mgmd -f config.ini --nodaemon --configdir=/data2/csc/42253/config
  2009-12-03 14:49:27 [MgmSrvr] INFO     -- NDB Cluster Management Server. mysql-5.1.39 ndb-7.0.9b
  2009-12-03 14:49:27 [MgmSrvr] INFO     -- Reading cluster configuration from 'config.ini'
  2009-12-03 14:49:27 [MgmSrvr] INFO     -- Got initial configuration from 'config.ini', will try to set it when all ndb_mgmd(s) started
  2009-12-03 14:49:27 [MgmSrvr] INFO     -- Mgmt server state: nodeid 1 reserved for ip 127.0.0.1, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000002.
  2009-12-03 14:49:27 [MgmSrvr] INFO     -- Node 1: Node 1 Connected
  2009-12-03 14:49:27 [MgmSrvr] INFO     -- Id: 1, Command port: *:1186
==INITIAL==
  2009-12-03 14:49:27 [MgmSrvr] INFO     -- Starting initial configuration change
  Aborted (core dumped)

The core backtrace is:

  #0  0x000000324322e37d in raise () from /lib64/tls/libc.so.6
  #1  0x000000324322faae in abort () from /lib64/tls/libc.so.6
  #2  0x000000000049f003 in require (b=false) at Config.cpp:30
  #3  0x00000000004a030e in diff_connections (a=0x90c580, b=0x40ac5f60, diff=@0x40ac5de0) at Config.cpp:506
  #4  0x00000000004a0795 in Config::diff (this=0x90c580, other=0x40ac5f60, diff=@0x40ac5de0, exclude=0x0) at Config.cpp:595
  #5  0x00000000004a0e83 in Config::illegal_change (this=0x90c580, other=0x40ac5f60) at Config.cpp:770
  #6  0x00000000004a3d4d in ConfigManager::execCONFIG_CHANGE_IMPL_REQ (this=0x8bc430, ss=@0x40ac6050, sig=0x8c01e0) at ConfigManager.cpp:752
  #7  0x00000000004a6abe in ConfigManager::run (this=0x8bc430) at ConfigManager.cpp:1744
  #8  0x0000000000485138 in MgmtThread::run_C (t=0x8bc430) at MgmtThread.hpp:31
  #9  0x000000000053d41c in ndb_thread_wrapper (_ss=0x905af0) at NdbThread.c:147
  #10 0x0000003243b060aa in start_thread () from /lib64/tls/libpthread.so.0
  #11 0x00000032432c5b43 in clone () from /lib64/tls/libc.so.6
  #12 0x0000000000000000 in ?? ()

Suggested fix:
Detect duplicate entries and terminate with an understandable error message.
[3 Dec 2009 14:36] Hartmut Holzgraefe
However, two [TCP] entries with the same node ID pair but with NodeId1 and NodeId2 swapped are accepted, even when the Hostname1/Hostname2 settings are completely different. The management node starts without complaint when given the following config.ini:

[ndb_mgmd]
Id=1
Hostname=ndbsup-1
DataDir=/data2/csc/42253/cluster

[ndbd default]
NoOfReplicas=2
DataDir=/data2/csc/42253/cluster

[ndbd] 
Id=2
Hostname=ndbsup-2

[ndbd] 
Id=3
Hostname=ndbsup-3

[mysqld]

[tcp]
NodeId1=2
NodeId2=3
Hostname1=ndbsup-priv-1
Hostname2=ndbsup-priv-2

# again, but with swapped Ids and different HostNames:
[tcp]
NodeId1=3
NodeId2=2
Hostname1=ndbsup-priv-3
Hostname2=ndbsup-priv-4
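
A duplicate check would also catch the swapped case above if it compared node pairs without regard to order. A minimal Python sketch of that idea follows; the function name `connection_key` is hypothetical and is not taken from the ndb_mgmd source.

```python
def connection_key(node_id1, node_id2):
    """Return an order-independent key for a connection between two nodes.

    Hypothetical helper for illustration only; not part of ndb_mgmd.
    """
    return (min(node_id1, node_id2), max(node_id1, node_id2))


# The two [tcp] sections above (2/3 and 3/2) map to the same key,
# so a check keyed this way would treat the second one as a duplicate.
first = connection_key(2, 3)
second = connection_key(3, 2)
print(first == second)
```

Whether the Hostname1/Hostname2 values should also take part in the comparison is a separate design question that this sketch does not address.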
[14 Dec 2009 21:05] Magnus Blåudd
1. The config.ini parser should detect the duplicate TCP section and throw an error such as "At line X, you have already defined a TCP section for this node pair".

2. The diff algorithm should probably be fixed to cope with the error and not crash mgmd.
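
Point 1 can be sketched roughly as follows. This is a standalone Python illustration, not the ndb_mgmd parser and not the committed patch. It assumes that a connection is identified by its section type plus the unordered NodeId1/NodeId2 pair, and that `#` starts a comment. The names `parse_sections`, `find_duplicate_connections`, and `CONNECTION_SECTIONS` are hypothetical.

```python
# Section types treated as connections (an assumption for this sketch).
CONNECTION_SECTIONS = {"tcp", "shm", "sci"}


def parse_sections(text):
    """Split config.ini-style text into (name, line_number, params) tuples."""
    sections = []
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip().lower(), line_number, {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][2][key.strip().lower()] = value.strip()
    return sections


def find_duplicate_connections(text):
    """Return one error message per connection section that repeats a node pair."""
    seen = {}    # (section type, low node id, high node id) -> first line number
    errors = []
    for name, line_number, params in parse_sections(text):
        if name not in CONNECTION_SECTIONS:
            continue
        if "nodeid1" not in params or "nodeid2" not in params:
            continue  # incomplete section; validating that is out of scope here
        low, high = sorted((int(params["nodeid1"]), int(params["nodeid2"])))
        key = (name, low, high)
        if key in seen:
            errors.append(
                f"At line {line_number}, you have already defined a "
                f"{name.upper()} section for this node pair "
                f"(first defined at line {seen[key]})"
            )
        else:
            seen[key] = line_number
    return errors


sample = """\
[tcp]
NodeId1=1
NodeId2=2

# again
[tcp]
NodeId1=1
NodeId2=2
"""
for message in find_duplicate_connections(sample):
    print(message)
```

Collecting messages in a list rather than aborting on the first duplicate lets the caller report every offending section before exiting with an error.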
[12 May 2010 19:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/108195

3533 John David Duncan	2010-05-12
      bug#49400: when parsing a config file, reject any TCP,SHM, or SCI connection that is a duplicate of a previously defined connection.
[2 Jun 2010 16:51] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/110017

3534 John David Duncan	2010-06-02
      bug#49400: minor fixes.  This is a patch on top of the previous one.
[3 Jun 2010 15:14] Bugs System
Pushed into 5.1.44-ndb-7.0.16 (revid:jdd@mysql.com-20100603151238-2lxa7snekhvlo4sx) (version source revid:jdd@mysql.com-20100603151238-2lxa7snekhvlo4sx) (merge vers: 5.1.44-ndb-7.0.16) (pib:16)
[7 Jun 2010 15:40] Jon Stephens
Also pushed to 5.1.44-ndb-7.1.5 (verified by inspecting source).
[7 Jun 2010 16:14] Jon Stephens
Documented bugfix in the NDB-7.0.15 and 7.1.6 changelogs as follows:

        The presence of duplicate [tcp] sections in the config.ini file
        caused the management server to crash. Now in such cases,
        ndb_mgmd fails gracefully with an appropriate error message.

Closed.
[8 Jun 2010 12:47] Jon Stephens
Typo in previous comment, should have said 7.0.16, not 7.0.15.