Bug #41462 Mysqld/ndbapi disconnects too aggressively during node restart
Submitted: 15 Dec 2008 8:57 Modified: 11 Feb 2009 16:52
Reporter: Wen Xiong
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine  Severity: S2 (Serious)
Version: cluster 6.2  OS: Solaris
CPU Architecture: Any

[15 Dec 2008 8:57] Wen Xiong
Description:
While adding data nodes to the master cluster during replication, I cannot insert data into tables from the master MySQL server, nor run any other queries. As a result, no data can be replicated during this time.

To add nodes to the master cluster, after editing the config file to add two more data nodes, we need to restart ndb_mgmd and the ndbd nodes. After the "old" ndbd nodes restart, they know that the "new" ndbd nodes are coming (the "new" data nodes are not started yet at that point). As a result, data inserted into the table after the cluster restart will be distributed across all the data nodes. During this time, new data cannot be inserted into the table through the mysql client until all the data nodes, including the "new" ones, are started. Until then, any query fails with the following error message:
*
ERROR 1033 (HY000): Incorrect information in file: './DB1/t3.frm'
*
t3 is the table I created and populated with data. Consequently, no data is replicated during this process, even though the replication connection itself stays alive.

How to repeat:
1. Configuration for the master cluster:
[NDBD DEFAULT]
NoOfReplicas= 2
DataMemory= 200M
IndexMemory= 50M
BackupMemory= 100M
MaxNoOfConcurrentScans = 100
MaxNoOfSavedMessages = 1000
#SendBufferMemory = 2M
NoOfFragmentLogFiles = 32
FragmentLogFileSize = 64M
TimeBetweenLocalCheckpoints=20
CompressedLCP = 1
CompressedBackup = 1
ODirect =1

# Management node
[NDB_MGMD]
Id= 1
HostName= nanna13
PortNumber= 16000
DataDir= /export/home/tmp/wx228566/ndb_mgmd.1/

# Data Nodes
[NDBD]
Id= 2
HostName= nanna15
DataDir= /export/home/tmp/wx228566/ndbd.1/

[NDBD]
Id= 3
HostName= nanna16
DataDir= /export/home/tmp/wx228566/ndbd.2/

[mysqld]
HostName=nanna13

[mysqld]
HostName=nanna13

Options used to start the master mysqld:
[mysqld]
server-id=1
skip-innodb
ndbcluster
skip-grant-tables
log-bin=nanna13-bin
binlog-format=row
ndb-connectstring=nanna13:16000
log-error=/export/home/tmp/wx228566/ndb_mgmd.1/mysqld_err
socket= /export/home/tmp/wx228566/mysqld-soc

2. Configuration for the slave cluster:
[NDBD DEFAULT]
NoOfReplicas= 2
DataMemory= 200M
IndexMemory= 50M
BackupMemory= 100M
MaxNoOfConcurrentScans = 100
MaxNoOfSavedMessages = 1000
#SendBufferMemory = 2M
NoOfFragmentLogFiles = 32
FragmentLogFileSize = 64M
TimeBetweenLocalCheckpoints=20
CompressedLCP = 1
CompressedBackup = 1
ODirect =1

# Management node
[NDB_MGMD]
Id= 1
HostName= nanna14
PortNumber= 16000
DataDir= /export/home2/tmp/wx228566/ndb_mgmd.1/

# Data Nodes
[NDBD]
Id= 2
HostName= nanna15
DataDir= /export/home2/tmp/wx228566/ndbd.1/

[NDBD]
Id= 3
HostName= nanna16
DataDir= /export/home2/tmp/wx228566/ndbd.2/

[mysqld]
HostName=nanna14

[mysqld]
HostName=nanna14

Options used to start the slave mysqld:
[mysqld]
server-id=2
skip-innodb
ndbcluster
skip-grant-tables
log-bin=nanna14-bin
binlog-format=row
ndb-connectstring=nanna14:16000
log-error=/export/home2/tmp/wx228566/ndb_mgmd.1/mysqld_err
socket= /export/home2/tmp/wx228566/mysqld-soc

3. Start replication.

4. Add two data nodes to the master cluster (edit the master config file and restart the cluster).
Sections added:
[NDBD]
Id= 4
HostName= nanna15
DataDir= /export/home/tmp/wx228566/ndbd.1/

[NDBD]
Id= 5
HostName= nanna16
DataDir= /export/home/tmp/wx228566/ndbd.2/
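With NoOfReplicas = 2, the two extra [NDBD] sections in step 4 take the master cluster from one node group to two, which is why data written after the restart is expected to be spread over the new nodes as well. The Python sketch below illustrates this, assuming NDB's default implicit grouping (data nodes sorted by ascending node ID and grouped in consecutive runs of NoOfReplicas) and ignoring any explicit NodeGroup setting. The helper names node_groups and replicas_on_distinct_hosts are hypothetical and not part of any MySQL tool.

```python
# Minimal sketch, assuming NDB's default implicit node-group formation:
# data nodes are ordered by node ID and grouped NoOfReplicas at a time.
# Explicit NodeGroup parameters are NOT modelled here.

def node_groups(data_nodes, no_of_replicas):
    """Return node groups as lists of (node_id, host) pairs (hypothetical helper)."""
    nodes = sorted(data_nodes.items())  # ascending by node ID
    if len(nodes) % no_of_replicas != 0:
        raise ValueError("number of data nodes must be a multiple of NoOfReplicas")
    return [nodes[i:i + no_of_replicas]
            for i in range(0, len(nodes), no_of_replicas)]


def replicas_on_distinct_hosts(groups):
    """True if no node group places two of its replicas on the same host."""
    return all(len({host for _, host in group}) == len(group) for group in groups)


# Data nodes of the master cluster, taken from the config above.
before = {2: "nanna15", 3: "nanna16"}
after = {**before, 4: "nanna15", 5: "nanna16"}  # step 4 adds nodes 4 and 5

for label, nodes in (("before", before), ("after", after)):
    groups = node_groups(nodes, no_of_replicas=2)
    print(label, [[nid for nid, _ in g] for g in groups],
          "distinct hosts:", replicas_on_distinct_hosts(groups))
```

Under these assumptions, nodes 2 and 3 form node group 0 and nodes 4 and 5 form node group 1, with each group's two replicas on different hosts (nanna15 and nanna16).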
[15 Dec 2008 11:24] Wen Xiong
Changed category to Server: Replication
[15 Dec 2008 11:32] Wen Xiong
Changed category to Cluster: Replication
[16 Dec 2008 20:55] Bugs System
Pushed into 5.1.30-ndb-6.2.17 (revid:tomas.ulin@sun.com-20081216205149-pjqk3d3gnejvoqty) (version source revid:tomas.ulin@sun.com-20081216205149-pjqk3d3gnejvoqty) (merge vers: 5.1.30-ndb-6.2.17) (pib:6)
[16 Dec 2008 20:56] Bugs System
Pushed into 5.1.30-ndb-6.3.21 (revid:tomas.ulin@sun.com-20081216205451-2ifgf26u5m0luj9i) (version source revid:tomas.ulin@sun.com-20081216205451-2ifgf26u5m0luj9i) (merge vers: 5.1.30-ndb-6.3.21) (pib:6)
[16 Dec 2008 21:26] Bugs System
Pushed into 5.1.30-ndb-6.4.0 (revid:tomas.ulin@sun.com-20081216212155-qx4obe638b8vv6fi) (version source revid:tomas.ulin@sun.com-20081216212155-qx4obe638b8vv6fi) (merge vers: 5.1.30-ndb-6.4.0) (pib:6)
[19 Dec 2008 13:11] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/62087

2774 Tomas Ulin	2008-12-16
      Bug #41462  Mysqld/ndbapi disconnects too aggressively during node restart
[10 Feb 2009 20:14] Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090210194937-s7xshv5l3m1v7wi9) (version source revid:tomas.ulin@sun.com-20090108115759-b4yhuwkm6w8tg7j3) (merge vers: 6.0.10-alpha) (pib:6)
[11 Feb 2009 16:52] Jon Stephens
Documented in the NDB-6.2.17, 6.3.21, and 6.4.0 changelogs as follows:

        API nodes disconnected too aggressively from the cluster when
        data nodes were being restarted. This could sometimes lead to the
        API node being unable to access the cluster at all during a
        rolling restart.
[12 Feb 2009 10:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/66011

2814 He Zhenxing	2009-02-12 [merge]
      Auto merge 6.0 -> 6.0-rpl