Bug #68158 Documentation incorrect for rolling restart w/ multiple ndb_mgmd
Submitted: 23 Jan 2013 18:56
Modified: 31 Jan 2013 16:23
Reporter: Kolbe Kegel
Status: Closed
Category: MySQL Server: Documentation
Severity: S3 (Non-critical)
Version: All
OS: Any
Assigned to: Jon Stephens
CPU Architecture: Any
Tags: cluster, documentation

[23 Jan 2013 18:56] Kolbe Kegel
Description:
http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-rolling-restart.html#mysql-cluster-ro... states as step 3 "Start a single ndb_mgmd with --reload, --initial, or both options as desired" and then as step 4 "Start any remaining ndb_mgmd processes without using either of the --reload or --initial options".
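For concreteness, those two documented steps amount to something like this (a sketch; the config file path and connect string are illustrative):

ndb_mgmd -f /home/kolbe/config.ini --reload      # step 3: first ndb_mgmd (--reload, --initial, or both)
ndb_mgmd --ndb-connectstring=192.168.30.201      # step 4: each remaining ndb_mgmd, no --reload/--initial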

In fact, if you start the first ndb_mgmd with --initial, you must also start all other ndb_mgmd processes with --initial. Otherwise, when you start the second ndb_mgmd, the first one shuts down:

2013-01-22 13:54:23 [MgmtSrvr] INFO     -- Got initial configuration from '/home/kolbe/config.ini', will try to set it when all ndb_mgmd(s) started
2013-01-22 13:54:24 [MgmtSrvr] INFO     -- Id: 1, Command port: *:1186
2013-01-22 13:54:24 [MgmtSrvr] INFO     -- Node 1: Node 1 Connected
2013-01-22 13:54:24 [MgmtSrvr] INFO     -- MySQL Cluster Management Server mysql-5.5.25 ndb-7.2.7 started
2013-01-22 13:54:24 [MgmtSrvr] INFO     -- Node 1 connected
2013-01-22 13:55:48 [MgmtSrvr] INFO     -- Node 1: Node 2 Connected
2013-01-22 13:55:48 [MgmtSrvr] INFO     -- Node 2 connected
2013-01-22 13:55:48 [MgmtSrvr] WARNING  -- Refusing CONGIG_CHECK_REQ from 2,   it's not CS_INITIAL (I am).  Waiting for my check
2013-01-22 13:55:48 [MgmtSrvr] ERROR    -- This node was started --initial with a config which is _not_ equal to the one node 2 is using. Refusing to start with different configurations, diff: 
[ndb_mgmd(MGM)]
NodeId=1
-DataDir=/home/kolbe/mysql/cluster-logs
+DataDir=/home/kolbe/mysql/cluster-data

[ndb_mgmd(MGM)]
NodeId=2
-DataDir=/home/kolbe/mysql/cluster-logs
+DataDir=/home/kolbe/mysql/cluster-data

How to repeat:
1. Set up a cluster with two management nodes.
2. Start both ndb_mgmd processes normally (without --initial or --reload).
3. Shut down both ndb_mgmd nodes.
4. Modify the configuration on node 1.
5. Start the first ndb_mgmd with --initial.
6. Start the second ndb_mgmd without either --initial or --reload.

The first ndb_mgmd will shut down (see the command sketch below).
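Concretely, this is the failing sequence (a sketch reusing the config path from the log above; the connect string is illustrative):

# host 1: after editing config.ini, discard the cached binary configuration
ndb_mgmd -f /home/kolbe/config.ini --initial
# host 2: started without --initial or --reload; host 1's ndb_mgmd then
# refuses the config check and shuts down
ndb_mgmd --ndb-connectstring=192.168.30.201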

Suggested fix:
Documentation should reflect that starting one ndb_mgmd with --initial requires starting all other ndb_mgmd processes with --initial as well.

When using --reload, the additional ndb_mgmd processes should be started without any options, as the documentation currently states.
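In other words, the corrected procedure would distinguish the two cases (a sketch; paths and connect strings are illustrative):

# Case A: apply a changed config.ini with --reload (as currently documented)
ndb_mgmd -f /home/kolbe/config.ini --reload      # first ndb_mgmd
ndb_mgmd --ndb-connectstring=192.168.30.201      # remaining ndb_mgmd processes, no options

# Case B: discard the cached configuration with --initial (the correction suggested here)
ndb_mgmd -f /home/kolbe/config.ini --initial     # first ndb_mgmd
ndb_mgmd -f /home/kolbe/config.ini --initial     # every other ndb_mgmd must also use --initial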
[24 Jan 2013 11:59] MySQL Verification Team
Hello Kolbe,

Thank you for the report.
I'm able to reproduce this behavior as described in bug#68158.

Alternatively, when I start the first ndb_mgmd with both --initial and --reload, the second ndb_mgmd comes up without using either the --reload or --initial option, and it does not affect the first ndb_mgmd.

Attaching the test case details.
[24 Jan 2013 12:00] MySQL Verification Team
Test case 68158

Attachment: 68158.txt (text/plain), 20.03 KiB.

[24 Jan 2013 16:51] Kolbe Kegel
I'm not seeing the same thing as you with --initial --reload in 7.2.7...

kolbe@clust01 (30.201) mysql-cluster-gpl-7.2.7-linux2.6-x86_64 $ ./bin/ndb_mgmd -f ~/config.ini --initial --reload
MySQL Cluster Management Server mysql-5.5.25 ndb-7.2.7
2013-01-24 08:49:41 [MgmtSrvr] WARNING  -- at line 53: Cluster configuration warning:
  arbitrator with id 1 and db node with id 11 on same host 192.168.30.201
  arbitrator with id 2 and db node with id 12 on same host 192.168.30.202
  Running arbitrator on the same host as a database node may
  cause complete cluster shutdown in case of host failure.

kolbe@clust02 (30.202) mysql-cluster-gpl-7.2.7-linux2.6-x86_64 $ ./bin/ndb_mgmd --ndb-connectstring=192.168.30.201 
MySQL Cluster Management Server mysql-5.5.25 ndb-7.2.7

kolbe@clust01 (30.201) mysql-cluster-gpl-7.2.7-linux2.6-x86_64 $ tail -f ../cluster-data/ndb_1_cluster.log 
2013-01-24 08:49:16 [MgmtSrvr] INFO     -- Node 12: Communication to Node 2 opened
2013-01-24 08:49:21 [MgmtSrvr] WARNING  -- Node 11: GCP Monitor: GCP_COMMIT lag 20 seconds (no max lag)
2013-01-24 08:49:41 [MgmtSrvr] INFO     -- Got initial configuration from '/home/kolbe/config.ini', will try to set it when all ndb_mgmd(s) started
2013-01-24 08:49:41 [MgmtSrvr] INFO     -- Id: 1, Command port: *:1186
2013-01-24 08:49:41 [MgmtSrvr] INFO     -- MySQL Cluster Management Server mysql-5.5.25 ndb-7.2.7 started
2013-01-24 08:49:41 [MgmtSrvr] INFO     -- Node 1: Node 1 Connected
2013-01-24 08:49:41 [MgmtSrvr] INFO     -- Node 1 connected
2013-01-24 08:49:41 [MgmtSrvr] INFO     -- Node 1: Node 11 Connected
2013-01-24 08:49:41 [MgmtSrvr] INFO     -- Node 1: Node 12 Connected
2013-01-24 08:49:42 [MgmtSrvr] INFO     -- Node 11: Started arbitrator node 1 [ticket=41ee000700f39d14]
2013-01-24 08:49:45 [MgmtSrvr] WARNING  -- Node 11: GCP Monitor: GCP_COMMIT lag 10 seconds (no max lag)
2013-01-24 08:49:50 [MgmtSrvr] INFO     -- Node 1: Node 2 Connected
2013-01-24 08:49:50 [MgmtSrvr] WARNING  -- Refusing CONGIG_CHECK_REQ from 2,   it's not CS_INITIAL (I am).  Waiting for my check
2013-01-24 08:49:50 [MgmtSrvr] WARNING  -- Refusing CONGIG_CHECK_REQ from 2,   it's not CS_INITIAL (I am).  Waiting for my check - Repeated 29 times
2013-01-24 08:49:50 [MgmtSrvr] INFO     -- Node 2 connected
2013-01-24 08:49:50 [MgmtSrvr] WARNING  -- Refusing CONGIG_CHECK_REQ from 2,   it's not CS_INITIAL (I am).  Waiting for my check
2013-01-24 08:49:50 [MgmtSrvr] ERROR    -- This node was started --initial with a config which is _not_ equal to the one node 2 is using. Refusing to start with different configurations, diff: 
[ndbd(DB)]
NodeId=11
-FragmentLogFileSize=29360128
+FragmentLogFileSize=30408704

[ndbd(DB)]
NodeId=12
-FragmentLogFileSize=29360128
+FragmentLogFileSize=30408704
[31 Jan 2013 16:23] Jon Stephens
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.
[31 Jan 2013 16:23] Jon Stephens
Fixed in mysqldoc rev 34106. Thanks for the info.