MySQL Bugs: #66143: Error in geo cluster

Bug #66143	Error in geo cluster
Submitted:	1 Aug 2012 13:16	Modified:	8 Jun 2018 12:20
Reporter:	Aastha Gupta
Status:	No Feedback
Category:	MySQL Cluster Manager	Severity:	S2 (Serious)
Version:	7.2	OS:	Windows
Assigned to:	MySQL Verification Team	CPU Architecture:	Any

Description:
We have a geo cluster set up at two different locations.

First location:

1 management node
1 SQL node
2 data nodes

Second location:

1 management node
1 SQL node
2 data nodes

There was a power outage at location 1. Ideally, location 2 should still have been up with the database running. However, the cluster went down.

The error from the event manager:

ndb_mgm> Node 12: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.

Kindly let me know if you need more information.

Thanks!
Aastha Gupta

How to repeat:
Switch off the geo cluster nodes at either of the locations.

Please send the cluster configuration files from both nodes.

After the failure we are running the cluster from location only.

Aastha

Most likely reason: location 1 hosted the active arbitrator, so when location 1 lost power, the nodes at location 2 had neither a majority nor a reachable arbitrator. That is a potential split-brain situation, so they shut down.

When you have a setup where half of the data nodes can fail simultaneously (power failure of a complete site, rack, or machine, or a link failure between the two halves of a cluster), you need an extra dedicated arbitrator that does not share any critical resources with the rest of the cluster. In your case, that means a third site with independent links to both data centers where the halves of your cluster reside.
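To make the arbitration argument concrete, here is a simplified Python model of the decision a surviving group of data nodes makes after losing contact with the rest. It assumes the viability rules as they are commonly described for NDB: the survivors must still hold at least one node from every node group; if they hold a complete node group, the other side cannot be viable and they continue; otherwise they continue only if they win arbitration. The node IDs, the node-group layout (each node group spanning both sites) and the site names are hypothetical, and this is a sketch, not NDB's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    node_id: int     # hypothetical node ID
    node_group: int  # node group this replica belongs to
    site: str        # failure domain (data center)


def partition_survives(survivors, all_nodes, wins_arbitration):
    """Simplified model (not NDB source code): may this set of surviving
    data nodes keep running after losing contact with the other nodes?"""
    alive = set(survivors)
    groups = {n.node_group for n in all_nodes}
    # Rule 1: every node group needs at least one live replica, otherwise
    # part of the data is missing and this side must shut down.
    if {n.node_group for n in alive} != groups:
        return False
    # Rule 2: if some node group is entirely on this side, the other side
    # lacks that group's data and cannot be viable, so no arbitration is needed.
    for g in groups:
        if {n for n in all_nodes if n.node_group == g} <= alive:
            return True
    # Rule 3: both sides could be viable (potential split brain), so only
    # the side that wins arbitration may continue.
    return wins_arbitration


# Hypothetical layout: 2 data nodes per site, each node group spans both sites.
NODES = [
    DataNode(1, 0, "site1"), DataNode(2, 1, "site1"),
    DataNode(3, 0, "site2"), DataNode(4, 1, "site2"),
]


def site_survives_outage(failed_site, arbitrator_site):
    survivors = [n for n in NODES if n.site != failed_site]
    # Assumption: the arbitrator is lost together with the failed site if it
    # lives there; otherwise the survivors are assumed to win arbitration.
    arbitrator_alive = arbitrator_site != failed_site
    return partition_survives(survivors, NODES, wins_arbitration=arbitrator_alive)


print(site_survives_outage("site1", arbitrator_site="site1"))  # False
print(site_survives_outage("site1", arbitrator_site="site3"))  # True
```

With the arbitrator at site 1, losing site 1 leaves site 2 holding one replica of each node group but no way to win arbitration, which matches the reported shutdown. Moving the arbitrator to an independent third site lets site 2 continue. In a pure network split where both halves can still reach the third-site arbitrator, only one of them would win; the model reduces that to a single boolean.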

It is a bit of a shame IMHO that 

  http://www.mysql.com/products/cluster/faq.html#11

(or the FAQ in general) doesn't mention this ...

I wonder whether what Hartmut commented was ever applied.

I still think the FAQ entry is not really correct:

"11. How many physical servers are needed to create a minimum Cluster configuration?

A: For evaluation and development purposes, you can run all nodes on a single host. For full redundancy and fault tolerance, you would need a minimum 6 x physical hosts:

    2 x data nodes
    2 x SQL/NoSQL Application Nodes
    2 x Management Nodes

Many users co-locate the Management and Application nodes which reduces the number of nodes to four."

I'd still say the minimum is three. Data, management, and API nodes can co-exist on the same machine just fine, so by that measure alone the minimum would be two. But the FAQ entry completely misses the "you need an odd number to prevent split-brain problems" aspect, and suggests only even numbers of machines.
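The odd-number argument is majority arithmetic: if staying up requires a strict majority of n voting hosts, a two-way split can leave neither side with a majority only when n is even. A minimal Python sketch, assuming a plain strict-majority quorum rule (NDB itself decides via node groups and an arbitrator rather than a simple vote count; the function names are hypothetical):

```python
def has_majority(k, n):
    """True if a partition holding k of n voters has a strict majority."""
    return 2 * k > n


def deadlocked_splits(n):
    """Two-way splits (k, n - k) in which neither side has a strict majority."""
    return [(k, n - k) for k in range(n + 1)
            if not has_majority(k, n) and not has_majority(n - k, n)]


for n in range(2, 7):
    print(n, deadlocked_splits(n))
```

For n = 2, 4 and 6 the only deadlocked split is the half/half one, exactly the site-outage case in this report; for odd n the list is empty, which is why a third host (or a third-site arbitrator) breaks the tie.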

Thank you for the feedback, Hartmut.

hey big H :D

Well, it depends on how you define the system. I know (and so do you) of production systems running on a single hardware instance, or even multiple production systems on a single hardware instance... so the minimum is 1.

The odd/even thing makes sense with InnoDB Cluster, but with NDB Cluster an even number of nodes will work. In reality, only the arbitrator is relevant to the odd/even question, and since we pretty much know how to recover from split brain even with an even number of arbitrators, 2 can work. You can also always run a setup with a single arbitrator (my favorite setup :D).

Anyhow, I don't see the original report as a bug.

kind regards
Bogdan

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".