MySQL Bugs: #41462: Mysqld/ndbapi disconnects too aggressively during node restart

Bug #41462	Mysqld/ndbapi disconnects too aggressively during node restart
Submitted:	15 Dec 2008 8:57	Modified:	11 Feb 2009 16:52
Reporter:	Wen Xiong
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	cluster 6.2	OS:	Solaris
Assigned to:		CPU Architecture:	Any

Description:
I cannot insert data into tables on the master server, or run any other queries, while adding data nodes to the master cluster during replication. As a result, no data is replicated during this time.

To add nodes to the master cluster, we edit the config file to include two more data nodes and then restart the ndb_mgmd and ndbd nodes. After the restart, the "old" ndbd nodes know that the "new" ndbd nodes are coming (the "new" data nodes are not started at this point). As a result, data inserted into the table will be distributed across all the data nodes after we restart the cluster. During this time, new data cannot be inserted into the table through the mysql client until all the data nodes, including the "new" ones, are "started". Until then, any query fails with the following error message:
*
ERROR 1033 (HY000): Incorrect information in file: './DB1/t3.frm'
*
t3 is the table I created and inserted data into. So no data is replicated during the process, although replication itself is still kept alive.
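
As a possible client-side workaround while the data nodes are still starting, an application could treat error 1033 as transient and retry. The following is only a minimal sketch, not part of the fix for this bug: `QueryError`, `run_with_retry`, and the `run_query` callable are hypothetical names standing in for whatever driver the application uses, and the retry count and delay are arbitrary assumptions. Error 1033 can also indicate a genuinely damaged table definition, so the retry budget is kept small.

```python
import time

# Error number seen in this report ("Incorrect information in file").
TRANSIENT_ERRNO = 1033


class QueryError(Exception):
    """Hypothetical stand-in for a driver exception that carries a MySQL errno."""

    def __init__(self, errno, msg):
        super().__init__(f"ERROR {errno}: {msg}")
        self.errno = errno


def run_with_retry(run_query, sql, attempts=5, delay=2.0, sleep=time.sleep):
    """Call run_query(sql), retrying only while it fails with errno 1033.

    run_query is a hypothetical callable supplied by the application; it is
    expected to raise QueryError on failure and return a result otherwise.
    """
    for attempt in range(1, attempts + 1):
        try:
            return run_query(sql)
        except QueryError as exc:
            # Re-raise anything that is not the transient error, and give up
            # once the retry budget is used.
            if exc.errno != TRANSIENT_ERRNO or attempt == attempts:
                raise
            sleep(delay)
```

A caller would wrap its own execute function, for example `run_with_retry(execute, "INSERT INTO t3 VALUES (1)")`, where `execute` is assumed to translate driver errors into `QueryError`.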

How to repeat:
1. Configuration for master cluster:
[NDBD DEFAULT]
NoOfReplicas= 2
DataMemory= 200M
IndexMemory= 50M
BackupMemory= 100M
MaxNoOfConcurrentScans = 100
MaxNoOfSavedMessages = 1000
#SendBufferMemory = 2M
NoOfFragmentLogFiles = 32
FragmentLogFileSize = 64M
TimeBetweenLocalCheckpoints=20
CompressedLCP = 1
CompressedBackup = 1
ODirect =1

# Management node
[NDB_MGMD]
Id= 1
HostName= nanna13
PortNumber= 16000
DataDir= /export/home/tmp/wx228566/ndb_mgmd.1/

# Data Nodes
[NDBD]
Id= 2
HostName= nanna15
DataDir= /export/home/tmp/wx228566/ndbd.1/

[NDBD]
Id= 3
HostName= nanna16
DataDir= /export/home/tmp/wx228566/ndbd.2/

[mysqld]
HostName=nanna13

[mysqld]
HostName=nanna13

Options to start master mysqld:
[mysqld]
server-id=1
skip-innodb
ndbcluster
skip-grant-tables
log-bin=nanna13-bin
binlog-format=row
ndb-connectstring=nanna13:16000
log-error=/export/home/tmp/wx228566/ndb_mgmd.1/mysqld_err
socket= /export/home/tmp/wx228566/mysqld-soc

2. Configuration for slave cluster:
[NDBD DEFAULT]
NoOfReplicas= 2
DataMemory= 200M
IndexMemory= 50M
BackupMemory= 100M
MaxNoOfConcurrentScans = 100
MaxNoOfSavedMessages = 1000
#SendBufferMemory = 2M
NoOfFragmentLogFiles = 32
FragmentLogFileSize = 64M
TimeBetweenLocalCheckpoints=20
CompressedLCP = 1
CompressedBackup = 1
ODirect =1

# Management node
[NDB_MGMD]
Id= 1
HostName= nanna14
PortNumber= 16000
DataDir= /export/home2/tmp/wx228566/ndb_mgmd.1/

# Data Nodes
[NDBD]
Id= 2
HostName= nanna15
DataDir= /export/home2/tmp/wx228566/ndbd.1/

[NDBD]
Id= 3
HostName= nanna16
DataDir= /export/home2/tmp/wx228566/ndbd.2/

[mysqld]
HostName=nanna14

[mysqld]
HostName=nanna14

Options to start slave mysqld:
[mysqld]
server-id=2
skip-innodb
ndbcluster
skip-grant-tables
log-bin=nanna14-bin
binlog-format=row
ndb-connectstring=nanna14:16000
log-error=/export/home2/tmp/wx228566/ndb_mgmd.1/mysqld_err
socket= /export/home2/tmp/wx228566/mysqld-soc

3. Start replication

4. Add two data nodes to the master cluster (edit the master config file and restart the cluster)
add:
[NDBD]
Id= 4
HostName= nanna15
DataDir= /export/home/tmp/wx228566/ndbd.1/

[NDBD]
Id= 5
HostName= nanna16
DataDir= /export/home/tmp/wx228566/ndbd.2/
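
Before restarting with the edited file, it can help to confirm that the number of [NDBD] sections is still a multiple of NoOfReplicas, since the data nodes are split into node groups of that size. The sketch below is an illustration only: `data_nodes` and `node_groups` are hypothetical helper names, and the parser handles just the simple `key = value` layout used in this report (it requires NoOfReplicas to be set explicitly in [NDBD DEFAULT]).

```python
def data_nodes(config_text):
    """Return (NoOfReplicas, list of [NDBD] section dicts) from config.ini text."""
    replicas, nodes, section, current = None, [], None, None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().upper()
            # Each [NDBD] header starts a new data node; other sections
            # ([NDB_MGMD], [mysqld], ...) are ignored.
            current = {} if section == "NDBD" else None
            if current is not None:
                nodes.append(current)
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if section == "NDBD DEFAULT" and key.lower() == "noofreplicas":
            replicas = int(value)
        elif current is not None:
            current[key] = value
    return replicas, nodes


def node_groups(config_text):
    """Return the number of node groups, or raise ValueError if inconsistent."""
    replicas, nodes = data_nodes(config_text)
    if replicas is None:
        raise ValueError("NoOfReplicas not set in [NDBD DEFAULT]")
    if len(nodes) % replicas != 0:
        raise ValueError(
            f"{len(nodes)} data nodes is not a multiple of NoOfReplicas={replicas}"
        )
    return len(nodes) // replicas
```

With the master configuration from step 1 this yields one node group; after adding nodes 4 and 5 it yields two.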

Change category to Server: Replication

Change category to Cluster: Replication

Pushed into 5.1.30-ndb-6.2.17 (revid:tomas.ulin@sun.com-20081216205149-pjqk3d3gnejvoqty) (version source revid:tomas.ulin@sun.com-20081216205149-pjqk3d3gnejvoqty) (merge vers: 5.1.30-ndb-6.2.17) (pib:6)

Pushed into 5.1.30-ndb-6.3.21 (revid:tomas.ulin@sun.com-20081216205451-2ifgf26u5m0luj9i) (version source revid:tomas.ulin@sun.com-20081216205451-2ifgf26u5m0luj9i) (merge vers: 5.1.30-ndb-6.3.21) (pib:6)

Pushed into 5.1.30-ndb-6.4.0 (revid:tomas.ulin@sun.com-20081216212155-qx4obe638b8vv6fi) (version source revid:tomas.ulin@sun.com-20081216212155-qx4obe638b8vv6fi) (merge vers: 5.1.30-ndb-6.4.0) (pib:6)

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62087

2774 Tomas Ulin	2008-12-16
      Bug #41462  Mysqld/ndbapi disconnects too agressively during node restart

Pushed into 6.0.10-alpha (revid:alik@sun.com-20090210194937-s7xshv5l3m1v7wi9) (version source revid:tomas.ulin@sun.com-20090108115759-b4yhuwkm6w8tg7j3) (merge vers: 6.0.10-alpha) (pib:6)

Documented in the NDB-6.2.17, 6.3.21, and 6.4.0 changelogs as follows:

        API nodes disconnected too aggressively from cluster when data
        nodes were being restarted. This could sometimes lead to the API
        node being unable to access the cluster at all during a rolling
        restart.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66011

2814 He Zhenxing	2009-02-12 [merge]
      Auto merge 6.0 -> 6.0-rpl