Bug #42543 ndb_mgmd can't handle configuration larger then 32k
Submitted: 2 Feb 2009 13:49
Modified: 26 Feb 2009 14:04
Reporter: Johan Andersson
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version: 5.1-telco-6.4 -> 6.4.1
OS: Any
Assigned to: Magnus Blåudd
CPU Architecture: Any
Tags: 6.4, ndb_mgmd

[2 Feb 2009 13:49] Johan Andersson
Description:
I have two data nodes and two management servers configured, all on separate computers.

On ps-ndb01 I can start the management server:
[root@ps-ndb01 scripts]# ps -ef |grep ndb_mgmd
root     29633     1  3 14:43 ?        00:00:00 /usr/local/mysql//mysql/bin//ndb_mgmd -c ps-ndb01 -f /etc/mysql/config.ini --configdir=/etc/mysql/ --reload --initial

On ps-ndb02 I start the second:
[root@ps-ndb02 ~]#  /usr/local/mysql//mysql/bin//ndb_mgmd -c "ps-ndb02;ps-ndb01" -f /etc/mysql/config.ini --configdir=/etc/mysql/ --reload --initial  --nodaemon
2009-02-02 14:45:51 [MgmSrvr] INFO     -- NDB Cluster Management Server. mysql-5.1.31 ndb-6.4.1-beta
2009-02-02 14:45:51 [MgmSrvr] INFO     -- Reading cluster configuration from '/etc/mysql/config.ini'
2009-02-02 14:45:52 [MgmSrvr] INFO     -- Got initial configuration from '/etc/mysql/config.ini', will try to set it when all ndb_mgmd(s) started
2009-02-02 14:45:52 [MgmSrvr] INFO     -- Mgmt server state: nodeid 2 reserved for ip 10.128.22.128, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000004.
2009-02-02 14:45:52 [MgmSrvr] INFO     -- Node 2: Node 2 Connected
2009-02-02 14:45:52 [MgmSrvr] INFO     -- Id: 2, Command port: *:1186
==INITIAL==
2009-02-02 14:45:52 [MgmSrvr] INFO     -- Node 2: Node 1 Connected
2009-02-02 14:45:52 [MgmSrvr] INFO     -- Node 1 connected
2009-02-02 14:45:52 [MgmSrvr] ERROR    -- Got CONFIG_CHECK_REF from node 1, error: 1, message: 'Wrong state'
generation: 0, expected generation: 0
state: 0, expected state: 0
[root@ps-ndb02 ~]# 

(I started with --nodaemon just to get the CONFIG_CHECK_REF printed on the screen.)
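
The m_reserved_nodes value in the log above is a long hexadecimal bitmask. A minimal sketch for reading it, assuming that bit N stands for node id N (an assumption, though it matches the log, where reserving nodeid 2 gives a mask ending in 4); the helper name reserved_node_ids is made up for this illustration and is not part of ndb_mgmd:

```python
def reserved_node_ids(mask_hex):
    """Return the bit positions that are set in a hexadecimal bitmask string.

    Assumption: bit N of the mask corresponds to node id N.
    """
    value = int(mask_hex, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Mask copied from the "Mgmt server state" log line above.
mask = "0000000000000000000000000000000000000000000000000000000000000004"
print(reserved_node_ids(mask))  # prints [2]
```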

How to repeat:
Use two ndb_mgmd processes and x data nodes.

On host A:
ndb_mgmd -c "A;B" -f /etc/mysql/config.ini --configdir=/etc/mysql/ --reload --initial

On host B:
ndb_mgmd -c "A;B" -f /etc/mysql/config.ini --configdir=/etc/mysql/ --reload --initial
[2 Feb 2009 14:06] Magnus Blåudd
error: 1 in CONFIG_CHECK_REF means "WrongState", i.e. the other ndb_mgmd is in the wrong state at the moment.
[2 Feb 2009 14:18] Johan Andersson
But why?

There is no reason for that, as the config.ini is the same on both hosts and this is an initial start.

BR
johan
[2 Feb 2009 14:18] Johan Andersson
And "state: 0"  is the expected state :)

generation: 0, expected generation: 0
state: 0, expected state: 0
[19 Feb 2009 9:13] Magnus Blåudd
This occurs when the binary config that we send via the transporter is larger than what fits into one signal. The config needs to be sent between nodes as a fragmented signal.
[23 Feb 2009 9:37] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67146

2870 Magnus Svensson	2009-02-23
      Bug#42543 ndb_mgmd can't handle configuration larger then 32k
       - Patch 1, error handling.
       - Don't continue configuration change protocol unless
         signal for "prepare" has been sent to all ndb_mgmd's
       - Print error message and exit if ndb_mgmd is starting up.
       - Send ConfigChangeRef to requestor if ndb_mgmd is running
      modified:
        storage/ndb/include/kernel/signaldata/ConfigChange.hpp
        storage/ndb/src/mgmsrv/ConfigManager.cpp
        storage/ndb/src/mgmsrv/ConfigManager.hpp
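
The error handling listed for Patch 1 can be sketched roughly as below. This is only an illustration of the three bullet points, not the ConfigManager code: the function send_prepare_to_all, the Outcome enum and the send callable are hypothetical names, and stopping at the first failed send is an assumption.

```python
from enum import Enum


class Outcome(Enum):
    CONTINUE = "continue protocol"          # "prepare" reached every ndb_mgmd
    EXIT_WITH_ERROR = "print error and exit"  # this ndb_mgmd is starting up
    SEND_REF = "send ConfigChangeRef"       # this ndb_mgmd is already running


def send_prepare_to_all(peers, send, starting_up):
    """Send "prepare" to each peer and decide how to proceed.

    `send(peer)` is a hypothetical callable that returns True on success.
    Returns (outcome, peers_that_failed).
    """
    for peer in peers:
        if not send(peer):
            # Do not continue the configuration change protocol.
            outcome = Outcome.EXIT_WITH_ERROR if starting_up else Outcome.SEND_REF
            return outcome, [peer]
    return Outcome.CONTINUE, []
```

For example, send_prepare_to_all([1, 2], lambda peer: peer != 2, starting_up=True) returns (Outcome.EXIT_WITH_ERROR, [2]).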
[23 Feb 2009 9:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67149

2871 Magnus Svensson	2009-02-23
      Bug#42543 ndb_mgmd can't handle configuration larger then 32k
       - Patch 2, send fragmented.
       - To overcome the 32k limit on configuration data, the signal
         needs to be sent fragmented, i.e. it is split up into chunks
         smaller than 32k and then reassembled on the other side.
       - Change both ConfigManager->ConfigManager and MgmtSrvr->ConfigManager
         to use fragmented signals
      added:
        storage/ndb/src/mgmsrv/Defragger.hpp
      modified:
        storage/ndb/src/mgmsrv/ConfigManager.cpp
        storage/ndb/src/mgmsrv/ConfigManager.hpp
        storage/ndb/src/mgmsrv/MgmtSrvr.cpp
        storage/ndb/src/ndbapi/SignalSender.cpp
        storage/ndb/src/ndbapi/SignalSender.hpp
        storage/ndb/src/ndbapi/TransporterFacade.hpp
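
The split-and-reassemble idea described for Patch 2 can be sketched as follows. This is not the code from Defragger.hpp or SignalSender.cpp: the names fragment and Reassembler, the tuple layout and the exact chunk size are assumptions chosen for the illustration. The only detail taken from the commit message is that each chunk stays smaller than 32k and the receiver puts the chunks back together.

```python
MAX_FRAGMENT_BYTES = 32 * 1024 - 1  # assumed limit: each chunk stays below 32k


def fragment(payload, max_bytes=MAX_FRAGMENT_BYTES):
    """Split payload into a list of (fragment_no, is_last, chunk) tuples."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    chunks = [payload[i:i + max_bytes]
              for i in range(0, len(payload), max_bytes)] or [b""]
    last = len(chunks) - 1
    return [(no, no == last, chunk) for no, chunk in enumerate(chunks)]


class Reassembler:
    """Collects fragments per sender and returns the payload once complete."""

    def __init__(self):
        self._pending = {}  # sender id -> chunks received so far

    def add(self, sender, fragment_no, is_last, chunk):
        parts = self._pending.setdefault(sender, [])
        if fragment_no != len(parts):
            # This sketch assumes in-order delivery; drop the partial data.
            del self._pending[sender]
            raise ValueError("fragment out of order")
        parts.append(chunk)
        if not is_last:
            return None  # more fragments expected
        return b"".join(self._pending.pop(sender))


# A payload larger than 32k travels as several fragments.
config = bytes(range(256)) * 200  # 51,200 bytes of dummy data
receiver = Reassembler()
result = None
for no, is_last, chunk in fragment(config):
    result = receiver.add(sender=1, fragment_no=no, is_last=is_last, chunk=chunk)
print(result == config)  # prints True
```

Keeping the pending chunks keyed by sender lets one receiver reassemble configurations arriving from several nodes at the same time.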
[23 Feb 2009 10:27] Magnus Blåudd
Approved on IRC
[23 Feb 2009 13:16] Bugs System
Pushed into 5.1.32-ndb-6.4.3 (revid:msvensson@mysql.com-20090223120724-6jb5hzpz29bp0ru2) (version source revid:msvensson@mysql.com-20090223120724-6jb5hzpz29bp0ru2) (merge vers: 5.1.32-ndb-6.4.3) (pib:6)
[26 Feb 2009 14:04] Jon Stephens
Documented in the NDB-6.4.3 changelog as follows:

        If the cluster configuration cache file was larger than 32K, the
        management server would not start.