Bug #40976 The master data node lost its connection to the cluster after creating a new nodegroup
Submitted: 24 Nov 2008 13:04 Modified: 1 Dec 2008 21:30
Reporter: Wen Xiong
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: mysql-5.1-telco-6.4
OS: Solaris
CPU Architecture: Any

[24 Nov 2008 13:04] Wen Xiong
Description:
The master data node is no longer connected to the cluster after a new nodegroup is created from the newly added data nodes.

A cluster with one ndb_mgmd and two data nodes is started first. Then config.ini is edited to add two more data nodes, and the cluster is restarted. At this point the management client shows:

Connected to Management Server at: nanna14:16000
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=2    @129.159.118.185  (mysql-5.1.29 ndb-6.4.0, Nodegroup: 0, Master)
id=3    @129.159.118.186  (mysql-5.1.29 ndb-6.4.0, Nodegroup: 0)
id=4    @129.159.118.185  (mysql-5.1.29 ndb-6.4.0, no nodegroup)
id=5    @129.159.118.186  (mysql-5.1.29 ndb-6.4.0, no nodegroup)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @129.159.118.184  (mysql-5.1.29 ndb-6.4.0)

[mysqld(API)]   1 node(s)
id=6 (not connected, accepting connect from nanna14)
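To compare listings like the one above from one step to the next, it can help to read them programmatically. The following is a minimal Python sketch, assuming the node-line layout shown here (`id=N @host (version, Nodegroup: G, Master)` or `id=N (not connected, ...)`); `parse_show` and `NodeStatus` are hypothetical names, not part of any MySQL Cluster tool, and the patterns may need adjusting for other ndb_mgm versions.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Node lines look like either of these (layout assumed from the listing above):
#   id=2    @129.159.118.185  (mysql-5.1.29 ndb-6.4.0, Nodegroup: 0, Master)
#   id=6 (not connected, accepting connect from nanna14)
NODE_LINE = re.compile(r"^id=(\d+)\s+(.*)$")


@dataclass
class NodeStatus:
    node_id: int
    connected: bool
    host: Optional[str] = None
    nodegroup: Optional[int] = None  # None for "no nodegroup", MGM/API nodes, or down nodes
    master: bool = False


def parse_show(text: str) -> Dict[int, NodeStatus]:
    """Parse the id=... lines of a cluster listing into {node_id: NodeStatus}."""
    nodes: Dict[int, NodeStatus] = {}
    for line in text.splitlines():
        m = NODE_LINE.match(line.strip())
        if not m:
            continue  # section headers such as "[ndbd(NDB)]     4 node(s)"
        node_id, rest = int(m.group(1)), m.group(2)
        if rest.startswith("(not connected"):
            nodes[node_id] = NodeStatus(node_id, connected=False)
            continue
        host = re.search(r"@(\S+)", rest)
        group = re.search(r"Nodegroup: (\d+)", rest)
        nodes[node_id] = NodeStatus(
            node_id,
            connected=True,
            host=host.group(1) if host else None,
            nodegroup=int(group.group(1)) if group else None,
            master="Master" in rest,
        )
    return nodes
```

Applied to this first listing, the sketch marks node 2 as master of nodegroup 0 and nodes 4 and 5 as connected with no nodegroup.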

I then tried to create a new nodegroup with this command:
ndb_mgm --ndb-connectstring="nodeid=1;host=nanna14:16000" -e "create nodegroup 4,5"

I got the error message:
Connected to Management Server at: nanna14:16000
*  1006: Illegal reply from server
*        error: -1

In fact, the nodegroup was created, but the master data node (node 2) lost its connection to the cluster, and node 3 became master:

Connected to Management Server at: nanna14:16000
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=2 (not connected, accepting connect from nanna15)
id=3    @129.159.118.186  (mysql-5.1.29 ndb-6.4.0, Nodegroup: 0, Master)
id=4    @129.159.118.185  (mysql-5.1.29 ndb-6.4.0, Nodegroup: 1)
id=5    @129.159.118.186  (mysql-5.1.29 ndb-6.4.0, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @129.159.118.184  (mysql-5.1.29 ndb-6.4.0)

[mysqld(API)]   1 node(s)
id=6 (not connected, accepting connect from nanna14)
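The two listings differ in exactly two ways: node 2 is no longer connected, and the Master flag has moved from node 2 to node 3. A small Python sketch that reports those differences from two pasted listings is shown here, assuming the same line layout as above; `node_states` and `diff_states` are hypothetical helpers, not part of ndb_mgm.

```python
import re
from typing import Dict, List


def node_states(text: str) -> Dict[int, str]:
    """Map node id to 'master', 'up' or 'down' from a cluster listing."""
    states: Dict[int, str] = {}
    for line in text.splitlines():
        m = re.match(r"\s*id=(\d+)\s+(.*)", line)
        if not m:
            continue  # headers and blank lines
        node_id, rest = int(m.group(1)), m.group(2)
        if rest.startswith("(not connected"):
            states[node_id] = "down"
        elif "Master" in rest:
            states[node_id] = "master"
        else:
            states[node_id] = "up"
    return states


def diff_states(before: str, after: str) -> List[str]:
    """List nodes that went from connected to not connected, plus any master change."""
    b, a = node_states(before), node_states(after)
    report = [
        f"node {n} disconnected"
        for n in sorted(b)
        if b[n] != "down" and a.get(n) == "down"
    ]
    old = sorted(n for n, s in b.items() if s == "master")
    new = sorted(n for n, s in a.items() if s == "master")
    if old != new:
        report.append(f"master changed: {old} -> {new}")
    return report
```

Nodes that were already down in the first listing (such as API node 6 here) are not reported as newly disconnected.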

The log files show the following.
(ndb_1_cluster.log)
2008-11-24 12:22:36 [MgmSrvr] ALERT    -- Node 2: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

(ndb_2_out.log)
2008-11-24 12:22:36 [ndbd] INFO     -- dbdih/DbdihMain.cpp
2008-11-24 12:22:36 [ndbd] INFO     -- DBDIH (Line: 6649) 0x0000000a
2008-11-24 12:22:36 [ndbd] INFO     -- Error handler shutting down system
2008-11-24 12:22:36 [ndbd] INFO     -- Error handler shutdown completed - exiting
2008-11-24 12:22:36 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

How to repeat:
This is the config.ini file with two data nodes:

# Default settings for all data nodes
[NDBD DEFAULT]
NoOfReplicas= 2
DataMemory= 1G
IndexMemory= 500M
BackupMemory= 200M
MaxNoOfConcurrentScans = 100
MaxNoOfSavedMessages = 1000
#SendBufferMemory = 2M
NoOfFragmentLogFiles = 32
FragmentLogFileSize = 64M
TimeBetweenLocalCheckpoints=20
CompressedLCP = 1
CompressedBackup = 1
ODirect =1

# Management node
[NDB_MGMD]
Id= 1
HostName= nanna14
PortNumber= 16000
DataDir= /export/home/tmp/wx228566/ndb_mgmd.1/

# Data Nodes
[NDBD]
Id= 2
HostName= nanna15
DataDir= /export/home/tmp/wx228566/ndbd.1/

[NDBD]
Id= 3
HostName= nanna16
DataDir= /export/home/tmp/wx228566/ndbd.2/

[mysqld]
HostName=nanna14
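Python's configparser rejects the repeated [NDBD] sections used in this file, so checking it programmatically needs a small custom parser. The sketch below assumes keys are spelled exactly as in this file and treats a data-node count that is not a multiple of NoOfReplicas as an error, which is a simplifying assumption. It returns the number of node groups the configured data nodes can form, since each node group holds NoOfReplicas data nodes: one for the file above, two once nodes 4 and 5 are added and grouped with `create nodegroup`. `parse_ndb_config` and `expected_nodegroups` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def parse_ndb_config(text: str) -> Dict[str, List[Dict[str, str]]]:
    """Parse a config.ini whose section names repeat, such as [NDBD].

    Returns {SECTION NAME (upper-cased): [one key/value dict per section]}.
    """
    sections: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment, e.g. "#SendBufferMemory = 2M"
        if line.startswith("[") and line.endswith("]"):
            current = {}
            sections[line[1:-1].strip().upper()].append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)  # handles "Id= 2" and "ODirect =1"
            current[key.strip()] = value.strip()
    return dict(sections)


def expected_nodegroups(sections: Dict[str, List[Dict[str, str]]]) -> int:
    """Number of node groups: configured data nodes divided by NoOfReplicas."""
    replicas = int(sections["NDBD DEFAULT"][0]["NoOfReplicas"])
    data_nodes = len(sections.get("NDBD", []))
    if data_nodes % replicas:
        raise ValueError(
            f"{data_nodes} data nodes is not a multiple of NoOfReplicas={replicas}"
        )
    return data_nodes // replicas
```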

To edit config.ini, add two more [NDBD] sections:
[NDBD]
Id= 4
HostName= nanna15
DataDir= /export/home/tmp/wx228566/ndbd.1/

[NDBD]
Id= 5
HostName= nanna16
DataDir= /export/home/tmp/wx228566/ndbd.2/

Then restart the cluster with the edited config.ini file.
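Before running `create nodegroup`, the proposed node list can be sanity-checked against the configuration and the current listing. The sketch below is hypothetical (`create_nodegroup_command` is not an existing tool). It assumes a new node group must contain exactly NoOfReplicas data nodes, none of which already belongs to a nodegroup, and it builds the same ndb_mgm invocation used in the description as an argument list that could be passed to `subprocess.run`. For nodes 4 and 5 with NoOfReplicas=2, it produces exactly that command.

```python
from typing import Dict, List, Optional, Sequence


def create_nodegroup_command(
    node_ids: Sequence[int],
    replicas: int,
    current_groups: Dict[int, Optional[int]],
    connectstring: str,
) -> List[str]:
    """Build an ndb_mgm argument list for "create nodegroup" after basic checks.

    node_ids: data nodes to place in the new group.
    replicas: NoOfReplicas from config.ini.
    current_groups: {data node id: nodegroup, or None for "no nodegroup"}.
    connectstring: value passed to --ndb-connectstring.
    """
    if len(node_ids) != replicas:
        raise ValueError(f"expected {replicas} nodes, got {len(node_ids)}")
    if len(set(node_ids)) != len(node_ids):
        raise ValueError("duplicate node id in request")
    for n in node_ids:
        if n not in current_groups:
            raise ValueError(f"node {n} is not a known data node")
        if current_groups[n] is not None:
            raise ValueError(f"node {n} is already in nodegroup {current_groups[n]}")
    ids = ",".join(str(n) for n in node_ids)
    # Argument-list form: no shell quoting is needed around the connectstring.
    return ["ndb_mgm", f"--ndb-connectstring={connectstring}", "-e", f"create nodegroup {ids}"]
```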
[26 Nov 2008 8:39] Jonas Oreland
Hi,

I retested with today's pull and did not get any crash.
Could you check this again?

/Jonas
[26 Nov 2008 15:22] Wen Xiong
I pulled the new changes again, and it works now. Thanks!