Bug #83893 StartPartitionedTimeout = 0 does not block indefinitely
Submitted: 20 Nov 2016 0:56
Modified: 17 May 2021 19:32
Reporter: Jesper wisborg Krogh
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version:
OS: Any
Assigned to:
CPU Architecture: Any

[20 Nov 2016 0:56] Jesper wisborg Krogh
Description:
According to https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-s... setting StartPartitionedTimeout = 0 will make a starting data node wait indefinitely to avoid a partitioned restart. However, in practice 0 means no wait.

How to repeat:
1. Configure a cluster with two data nodes, NoOfReplicas = 2, and StartPartitionedTimeout = 0
2. Stop all data nodes
3. Start one of the data nodes

From the cluster log:

...
2016-11-20 11:43:30 [MgmtSrvr] INFO     -- Node 49: Node 1 Connected
2016-11-20 11:43:33 [MgmtSrvr] INFO     -- Node 1: Waiting 27 sec for nodes 2 to connect, nodes [ all: 1 and 2 connected: 1 no-wait:  ]
2016-11-20 11:43:36 [MgmtSrvr] INFO     -- Node 1: Waiting 24 sec for nodes 2 to connect, nodes [ all: 1 and 2 connected: 1 no-wait:  ]
2016-11-20 11:43:39 [MgmtSrvr] INFO     -- Node 1: Waiting 21 sec for nodes 2 to connect, nodes [ all: 1 and 2 connected: 1 no-wait:  ]
2016-11-20 11:43:42 [MgmtSrvr] INFO     -- Node 1: Waiting 18 sec for nodes 2 to connect, nodes [ all: 1 and 2 connected: 1 no-wait:  ]
2016-11-20 11:43:45 [MgmtSrvr] INFO     -- Node 1: Waiting 15 sec for nodes 2 to connect, nodes [ all: 1 and 2 connected: 1 no-wait:  ]
2016-11-20 11:43:48 [MgmtSrvr] INFO     -- Node 1: Waiting 12 sec for nodes 2 to connect, nodes [ all: 1 and 2 connected: 1 no-wait:  ]
2016-11-20 11:43:51 [MgmtSrvr] INFO     -- Node 1: Waiting 9 sec for nodes 2 to connect, nodes [ all: 1 and 2 connected: 1 no-wait:  ]
2016-11-20 11:43:54 [MgmtSrvr] INFO     -- Node 1: Waiting 6 sec for nodes 2 to connect, nodes [ all: 1 and 2 connected: 1 no-wait:  ]
2016-11-20 11:43:57 [MgmtSrvr] INFO     -- Node 1: Waiting 3 sec for nodes 2 to connect, nodes [ all: 1 and 2 connected: 1 no-wait:  ]
2016-11-20 11:44:00 [MgmtSrvr] INFO     -- Node 1: Start potentially partitioned with nodes 1  [ missing: 2 no-wait:  ]
2016-11-20 11:44:00 [MgmtSrvr] INFO     -- Node 1: CM_REGCONF president = 1, own Node = 1, our dynamic id = 0/1
2016-11-20 11:44:00 [MgmtSrvr] INFO     -- Node 1: Start phase 1 completed
...

So the node waits 30 seconds to avoid a partial start, but then begins a potentially partitioned start immediately, with no further wait.
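The timing can be checked directly from the log timestamps. The following is a minimal Python sketch, not part of the original report: it takes three lines copied from the excerpt above, computes the moment each "Waiting N sec" countdown expires, and compares that with the timestamp of the "Start potentially partitioned" message. The function and variable names are illustrative only.

```python
import re
from datetime import datetime, timedelta

# Three lines copied from the cluster log excerpt in this report.
LOG = """\
2016-11-20 11:43:33 [MgmtSrvr] INFO     -- Node 1: Waiting 27 sec for nodes 2 to connect, nodes [ all: 1 and 2 connected: 1 no-wait:  ]
2016-11-20 11:43:57 [MgmtSrvr] INFO     -- Node 1: Waiting 3 sec for nodes 2 to connect, nodes [ all: 1 and 2 connected: 1 no-wait:  ]
2016-11-20 11:44:00 [MgmtSrvr] INFO     -- Node 1: Start potentially partitioned with nodes 1  [ missing: 2 no-wait:  ]
"""

WAIT_RE = re.compile(r"Waiting (\d+) sec")


def timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def partitioned_start_delay(log_text):
    """Seconds between the end of the countdown and the partitioned start."""
    deadlines = set()
    partitioned_at = None
    for line in log_text.splitlines():
        match = WAIT_RE.search(line)
        if match:
            # Each countdown line implies a deadline: its timestamp plus the seconds left.
            deadlines.add(timestamp(line) + timedelta(seconds=int(match.group(1))))
        elif "Start potentially partitioned" in line:
            partitioned_at = timestamp(line)
    if partitioned_at is None or not deadlines:
        raise ValueError("log excerpt lacks countdown or partitioned-start lines")
    return (partitioned_at - max(deadlines)).total_seconds()


print(partitioned_start_delay(LOG))  # 0.0
```

A result of 0.0 means the partitioned start was logged in the same second the countdown expired, which is what one would expect if StartPartitionedTimeout = 0 is treated as "do not wait" rather than "wait indefinitely".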

Suggested fix:
Change the behavior so that 0 means wait indefinitely, as the documentation states.
[20 Nov 2016 1:17] Jesper wisborg Krogh
Any StartPartitionedTimeout value above 4294962295 also skips the wait that is meant to avoid a partitioned startup. The workaround is therefore to set StartPartitionedTimeout = 4294962295 (milliseconds), which delays the partitioned startup by about 49.7 days.
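As a quick check of the arithmetic, the sketch below converts the workaround value from milliseconds to days. It also compares the value with the largest unsigned 32-bit integer; that comparison is an observation added here, not something stated in this report, and the reason for the gap is not given in the thread. The names are illustrative only.

```python
# Workaround value from the comment above, in milliseconds.
WORKAROUND_MS = 4294962295

# Largest unsigned 32-bit integer (comparison is an assumption-driven
# observation; the report does not say how the parameter is stored).
UINT32_MAX = 2**32 - 1

MS_PER_DAY = 1000 * 60 * 60 * 24


def ms_to_days(ms):
    """Convert a duration in milliseconds to days."""
    return ms / MS_PER_DAY


print(round(ms_to_days(WORKAROUND_MS), 1))  # 49.7
print(UINT32_MAX - WORKAROUND_MS)           # 5000
```

The workaround value sits exactly 5000 below 2**32 - 1, which may hint at an overflow for larger values, but the report does not confirm the cause.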
[17 May 2021 19:32] Jon Stephens
This was fixed & documented in NDB 7.6.4; see https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-s....

Closed.