MySQL Bugs: #42543: ndb_mgmd can't handle configuration larger than 32k

Submitted:	2 Feb 2009 13:49	Modified:	26 Feb 2009 14:04
Reporter:	Johan Andersson
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	5.1-telco-6.4 -> 6.4.1	OS:	Any
Assigned to:	Magnus Blåudd	CPU Architecture:	Any
Tags:	6.4, ndb_mgmd

Description:
I have two data nodes and two management servers configured, all on separate computers.

On ps-ndb01 I can start the management server:
[root@ps-ndb01 scripts]# ps -ef |grep ndb_mgmd
root     29633     1  3 14:43 ?        00:00:00 /usr/local/mysql//mysql/bin//ndb_mgmd -c ps-ndb01 -f /etc/mysql/config.ini --configdir=/etc/mysql/ --reload --initial

On ps-ndb02 I start the second one:
[root@ps-ndb02 ~]#  /usr/local/mysql//mysql/bin//ndb_mgmd -c "ps-ndb02;ps-ndb01" -f /etc/mysql/config.ini --configdir=/etc/mysql/ --reload --initial  --nodaemon
2009-02-02 14:45:51 [MgmSrvr] INFO     -- NDB Cluster Management Server. mysql-5.1.31 ndb-6.4.1-beta
2009-02-02 14:45:51 [MgmSrvr] INFO     -- Reading cluster configuration from '/etc/mysql/config.ini'
2009-02-02 14:45:52 [MgmSrvr] INFO     -- Got initial configuration from '/etc/mysql/config.ini', will try to set it when all ndb_mgmd(s) started
2009-02-02 14:45:52 [MgmSrvr] INFO     -- Mgmt server state: nodeid 2 reserved for ip 10.128.22.128, m_reserved_nodes 0000000000000000000000000000000000000000000000000000000000000004.
2009-02-02 14:45:52 [MgmSrvr] INFO     -- Node 2: Node 2 Connected
2009-02-02 14:45:52 [MgmSrvr] INFO     -- Id: 2, Command port: *:1186
==INITIAL==
2009-02-02 14:45:52 [MgmSrvr] INFO     -- Node 2: Node 1 Connected
2009-02-02 14:45:52 [MgmSrvr] INFO     -- Node 1 connected
2009-02-02 14:45:52 [MgmSrvr] ERROR    -- Got CONFIG_CHECK_REF from node 1, error: 1, message: 'Wrong state'
generation: 0, expected generation: 0
state: 0, expected state: 0
[root@ps-ndb02 ~]# 

(I started it with --nodaemon only so that the CONFIG_CHECK_REF error would be printed on screen.)

How to repeat:
Set up two ndb_mgmd processes and any number of data nodes.

On host A:
ndb_mgmd -c "A;B" -f /etc/mysql/config.ini --configdir=/etc/mysql/ --reload --initial

On host B:
ndb_mgmd -c "A;B" -f /etc/mysql/config.ini --configdir=/etc/mysql/ --reload --initial

Error 1 in CONFIG_CHECK_REF means "WrongState", i.e. the other ndb_mgmd is in the wrong state at the moment.

But why?

There should be no reason for this, since config.ini is identical on both hosts and this is an initial start.

BR
johan

And the "state: 0" it reports is the expected state :)

generation: 0, expected generation: 0
state: 0, expected state: 0

This occurs when the binary configuration sent via the transporter is larger than what fits into a single signal. The configuration needs to be sent as a fragmented signal between nodes.
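The following is a minimal C++ sketch of what sending a fragmented signal involves. It assumes a simple in-order transport. The sender splits the serialized configuration into chunks no larger than a given limit and flags the final chunk. The receiver concatenates chunks until it sees that flag. The `Fragment` struct, the `fragmentPayload`/`reassemble` functions and the chunk limit are hypothetical. They do not reflect NDB's actual signal format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical fragment layout for illustration; NOT the real NDB signal format.
struct Fragment {
  std::uint32_t fragId;            // identifies the logical message being sent
  bool last;                       // true only on the final fragment
  std::vector<std::uint8_t> data;  // at most maxChunk bytes of payload
};

// Split `payload` into fragments of at most `maxChunk` bytes each.
// An empty payload still yields one (empty, last) fragment.
std::vector<Fragment> fragmentPayload(const std::vector<std::uint8_t>& payload,
                                      std::size_t maxChunk, std::uint32_t fragId) {
  assert(maxChunk > 0);
  std::vector<Fragment> out;
  std::size_t pos = 0;
  do {
    const std::size_t n = std::min(maxChunk, payload.size() - pos);
    auto first = payload.begin() + static_cast<std::ptrdiff_t>(pos);
    Fragment f{fragId, false,
               std::vector<std::uint8_t>(first, first + static_cast<std::ptrdiff_t>(n))};
    pos += n;
    f.last = (pos == payload.size());
    out.push_back(std::move(f));
  } while (pos < payload.size());
  return out;
}

// Concatenate fragments that arrived in order. Returns the full payload once
// the fragment flagged `last` is seen, or std::nullopt if it never arrives.
std::optional<std::vector<std::uint8_t>> reassemble(const std::vector<Fragment>& frags) {
  std::vector<std::uint8_t> buf;
  for (const Fragment& f : frags) {
    buf.insert(buf.end(), f.data.begin(), f.data.end());
    if (f.last) return buf;
  }
  return std::nullopt;
}
```

For example, a 70,000-byte configuration split with a 32,768-byte limit yields three fragments of 32,768, 32,768 and 4,464 bytes.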

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67146

2870 Magnus Svensson	2009-02-23
      Bug#42543 ndb_mgmd can't handle configuration larger than 32k
       - Patch 1, error handling.
       - Don't continue the configuration change protocol unless
         the "prepare" signal has been sent to all ndb_mgmd's.
       - Print an error message and exit if ndb_mgmd is starting up.
       - Send ConfigChangeRef to the requestor if ndb_mgmd is running.
      modified:
        storage/ndb/include/kernel/signaldata/ConfigChange.hpp
        storage/ndb/src/mgmsrv/ConfigManager.cpp
        storage/ndb/src/mgmsrv/ConfigManager.hpp
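Patch 1's error handling can be read as a small decision made after the "prepare" round. The following is a hypothetical C++ rendering of the three bullet points above. `Action` and `decideAfterPrepare` are illustrative names, not functions from ConfigManager.cpp. The sketch assumes the sender records whether each per-node send succeeded.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical outcome of the "prepare" round (illustrative names only).
enum class Action {
  ContinueProtocol,     // "prepare" reached every ndb_mgmd: keep going
  ExitWithError,        // still starting up: print an error and exit
  SendConfigChangeRef   // already running: reject the requestor's change
};

// sendOk[i] records whether the "prepare" signal to the i-th ndb_mgmd was sent.
Action decideAfterPrepare(const std::vector<bool>& sendOk, bool startingUp) {
  const bool allSent = std::all_of(sendOk.begin(), sendOk.end(),
                                   [](bool ok) { return ok; });
  if (allSent) return Action::ContinueProtocol;
  return startingUp ? Action::ExitWithError : Action::SendConfigChangeRef;
}
```

Returning a value instead of calling exit() directly keeps the decision testable. The caller would then do the printing, exiting or sending.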

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/67149

2871 Magnus Svensson	2009-02-23
      Bug#42543 ndb_mgmd can't handle configuration larger than 32k
       - Patch 2, send fragmented.
       - To overcome the 32k limit on configuration data, the signal
         needs to be sent fragmented, i.e. split into chunks smaller
         than 32k and reassembled on the other side.
       - Change both ConfigManager->ConfigManager and MgmtSrvr->ConfigManager
         to use fragmented signals.
      added:
        storage/ndb/src/mgmsrv/Defragger.hpp
      modified:
        storage/ndb/src/mgmsrv/ConfigManager.cpp
        storage/ndb/src/mgmsrv/ConfigManager.hpp
        storage/ndb/src/mgmsrv/MgmtSrvr.cpp
        storage/ndb/src/ndbapi/SignalSender.cpp
        storage/ndb/src/ndbapi/SignalSender.hpp
        storage/ndb/src/ndbapi/TransporterFacade.hpp
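This commit adds storage/ndb/src/mgmsrv/Defragger.hpp, whose contents are not shown here. As an illustration only, a receiver-side defragmenter could buffer partial payloads keyed by (sender node id, fragment id), so that fragmented signals from different ndb_mgmd's are never mixed. `SimpleDefragger` and its methods are hypothetical and are not the API of that header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical receiver-side defragmenter (not the real Defragger.hpp API).
class SimpleDefragger {
 public:
  // Feed one fragment. Returns the complete payload when `last` is true;
  // otherwise buffers the data and returns std::nullopt.
  std::optional<std::vector<std::uint8_t>> receive(std::uint32_t nodeId,
                                                   std::uint32_t fragId,
                                                   const std::vector<std::uint8_t>& data,
                                                   bool last) {
    const auto key = std::make_pair(nodeId, fragId);
    std::vector<std::uint8_t>& buf = pending_[key];
    buf.insert(buf.end(), data.begin(), data.end());
    if (!last) return std::nullopt;
    std::optional<std::vector<std::uint8_t>> complete(std::move(buf));
    pending_.erase(key);  // message finished: forget its buffer
    return complete;
  }

  // Drop partial buffers from a node that disconnected mid-transfer.
  void nodeFailed(std::uint32_t nodeId) {
    for (auto it = pending_.begin(); it != pending_.end();) {
      if (it->first.first == nodeId) it = pending_.erase(it);
      else ++it;
    }
  }

  std::size_t pendingCount() const { return pending_.size(); }

 private:
  // (sender node id, fragment id) -> bytes received so far
  std::map<std::pair<std::uint32_t, std::uint32_t>, std::vector<std::uint8_t>> pending_;
};
```

Keying on the pair rather than on the fragment id alone is the important design choice, because two senders may pick the same id. Discarding a node's partial buffers when it disconnects (here `nodeFailed`) avoids holding on to half-received configurations.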

Approved on IRC

Pushed into 5.1.32-ndb-6.4.3 (revid:msvensson@mysql.com-20090223120724-6jb5hzpz29bp0ru2) (version source revid:msvensson@mysql.com-20090223120724-6jb5hzpz29bp0ru2) (merge vers: 5.1.32-ndb-6.4.3) (pib:6)

Documented in the NDB-6.4.3 changelog as follows:

        If the cluster configuration cache file was larger than 32K, the
        management server would not start.