Bug #71394 Network reconnect on one node causes forced node shutdown on all nodes
Submitted: 15 Jan 2014 22:49  Modified: 4 Apr 2016 23:08
Reporter: Kevin Dyer
Status: Can't repeat
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql-5.6.27 ndb-7.4.8
OS: Linux (CentOS 64)
Assigned to: MySQL Verification Team
CPU Architecture: Any
Tags: arbitration, cluster, forced node shutdown

[15 Jan 2014 22:49] Kevin Dyer
Description:
If there is a networking issue with Node 1, then when it reconnects, Node 2 loses the arbitrator, as shown here:

  ndb_50_cluster.log

2014-01-14 02:28:46 [MgmtSrvr] ALERT -- Node 50: Node 1 Disconnected
2014-01-14 02:28:46 [MgmtSrvr] INFO -- Node 2: Communication to Node 102 closed
2014-01-14 02:28:46 [MgmtSrvr] INFO -- Node 2: Communication to Node 103 closed
2014-01-14 02:28:46 [MgmtSrvr] INFO -- Node 2: Communication to Node 106 closed
2014-01-14 02:28:46 [MgmtSrvr] INFO -- Node 2: Communication to Node 107 closed
2014-01-14 02:28:46 [MgmtSrvr] ALERT -- Node 2: Node 102 Disconnected
2014-01-14 02:28:46 [MgmtSrvr] ALERT -- Node 2: Node 103 Disconnected
2014-01-14 02:28:46 [MgmtSrvr] ALERT -- Node 2: Node 106 Disconnected
2014-01-14 02:28:46 [MgmtSrvr] ALERT -- Node 2: Node 107 Disconnected
2014-01-14 02:28:47 [MgmtSrvr] ALERT -- Node 50: Node 2 Disconnected
2014-01-14 02:28:47 [MgmtSrvr] INFO -- Node 50: Node 2 Connected
2014-01-14 02:28:47 [MgmtSrvr] INFO -- Node 2: Communication to Node 104 closed
2014-01-14 02:28:47 [MgmtSrvr] ALERT -- Node 2: Node 104 Disconnected
2014-01-14 02:28:47 [MgmtSrvr] INFO -- Node 50: Node 1 Connected
2014-01-14 02:28:47 [MgmtSrvr] INFO -- Node 2: Started arbitrator node 52 [ticket=05990038020d98ee]

  ndb_2_out.log

2014-01-14 15:28:46 [ndbd] INFO -- Lost arbitrator node 52 - process failure [state=6]
2014-01-14 15:28:46 [ndbd] INFO -- President restarts arbitration thread [state=1]
2014-01-14 15:28:46 [ndbd] WARNING -- Could not find an arbitrator, cluster is not partition-safe
2014-01-14 15:28:47 [ndbd] INFO -- Started arbitrator node 52 [ticket=05990038020d98ee]

The cluster will fail when node 2 loses its connection to the management node and arbitration is required to prevent a partitioned cluster. Specifically, this happens if node 1 is disconnected for 15-20 seconds with the default timings.

How to repeat:
Assuming a fairly standard configuration:
2 x Management nodes
2 x Data nodes
2 x API nodes (can be repeated without these connected)

Disconnect a data node for around 15 seconds. Assuming about a 5-second overhead in restarting the network service, this will work:

date ; service network stop ; sleep 10 ; service network start ; date

  On data node 1:

Wed Jan 15 15:41:10 NZDT 2014
Shutting down interface eth0: [ OK ]
Shutting down loopback interface: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0:
Determining IP information for eth0... done. [ OK ]
Wed Jan 15 15:41:26 NZDT 2014

Look at what happens to node 2 when node 1 comes back:

  Management log /var/lib/mysql-cluster/ndb_52_cluster.log

2014-01-15 02:41:17 [MgmtSrvr] ALERT -- Node 52: Node 1 Disconnected
2014-01-15 02:41:20 [MgmtSrvr] WARNING -- Node 2: GCP Monitor: GCP_COMMIT lag 9 seconds (no max lag)
2014-01-15 02:41:21 [MgmtSrvr] WARNING -- Node 2: Node 1 missed heartbeat 2
2014-01-15 02:41:26 [MgmtSrvr] INFO -- Node 52: Node 1 Connected
2014-01-15 02:41:26 [MgmtSrvr] ALERT -- Node 52: Node 2 Disconnected
2014-01-15 02:41:26 [MgmtSrvr] INFO -- Node 52: Node 2 Connected
2014-01-15 02:41:30 [MgmtSrvr] WARNING -- Node 2: GCP Monitor: GCP_COMMIT lag 19 seconds (no max lag)
2014-01-15 02:41:31 [MgmtSrvr] ALERT -- Node 2: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
2014-01-15 02:41:31 [MgmtSrvr] ALERT -- Node 52: Node 2 Disconnected
2014-01-15 02:41:32 [MgmtSrvr] ALERT -- Node 52: Node 1 Disconnected
2014-01-15 02:41:32 [MgmtSrvr] ALERT -- Node 1: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.

  Node 2 log /usr/local/mysql/data/ndb_2_out.log

2014-01-15 15:41:26 [ndbd] INFO -- Lost arbitrator node 50 - process failure [state=6]
2014-01-15 15:41:26 [ndbd] INFO -- President restarts arbitration thread [state=1]
2014-01-15 15:41:26 [ndbd] WARNING -- Could not find an arbitrator, cluster is not partition-safe
2014-01-15 15:41:31 [ndbd] INFO -- findNeighbours from: 4896 old (left: 1 right: 1) new (65535 65535)
2014-01-15 15:41:31 [ndbd] ALERT -- Network partitioning - no arbitrator available
2014-01-15 15:41:31 [ndbd] INFO -- President restarts arbitration thread [state=8]
2014-01-15 15:41:31 [ndbd] INFO -- Arbitrator decided to shutdown this node
2014-01-15 15:41:31 [ndbd] INFO -- QMGR (Line: 6056) 0x00000006
2014-01-15 15:41:31 [ndbd] INFO -- Error handler shutting down system
2014-01-15 15:41:31 [ndbd] INFO -- Error handler shutdown completed - exiting
2014-01-15 15:41:31 [ndbd] ALERT -- Node 2: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.

We have reproduced this numerous times on two systems.
[28 Dec 2015 23:43] Edouard Kaiser
Hi Kevin,

We recently switched to MySQL Cluster in production and we hit the exact same problem. It has only happened once in a month, but for a cluster that is one time too many.

Did you find any workaround or have any recommendations?

Thanks for your help!
[10 Jan 2016 20:47] Kevin Dyer
Hi Edouard

Sorry for the slow reply. I have a few recommendations:

1) Consider whether your application would be better served by another database technology. MySQL Cluster is not very fault tolerant, really wants dedicated hardware, and is not cloud friendly.

2) Use a dedicated network for the data nodes, with direct TCP connections between them (a config sketch follows this list):
http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-tcp-definition-direct.html

3) (Not actually this issue) Avoid bulk operations. If you update, say, 100000 rows, the MySQL API nodes can saturate their network and restart in read-only mode. None of this shows in the logs; you just see an unexplained restart.
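
For illustration, a minimal config.ini sketch of such a direct data-node link; the node IDs and crossover addresses below are placeholders, not values from a real configuration:

[tcp]
# Hypothetical direct link between the two data nodes over a dedicated/crossover network
NodeId1=1
NodeId2=2
HostName1=192.168.100.1
HostName2=192.168.100.2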

Unfortunately, the bottom line is that you should never let the network connection go down. Paying customers may get better support from MySQL to resolve these issues.

Good luck

Kevin
[7 Mar 2016 6:27] MySQL Verification Team
Hi Kevin,

Sorry for the late reply, but from your description you are using 2 servers to run your 4 nodes (2 data and 2 mgm). In this type of setup the behavior you are experiencing is normal: if you drop the network on the server that is running your arbitrator, your cluster will shut down; if you drop the network on the other server (the one running the mgm node that is not the arbitrator), your cluster will survive.

To mitigate this, we allow SQL nodes to also act as arbitrators, so there is less chance that you will kill half of your cluster and the arbitrator at the same time; a minimal config.ini sketch of this is shown below.
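
For illustration only (host names and node IDs are placeholders), an SQL node is made an arbitration candidate by giving it a non-zero ArbitrationRank in config.ini:

[ndb_mgmd]
NodeId=50
HostName=mgm-host
ArbitrationRank=1   # default for management nodes: high-priority arbitrator

[mysqld]
NodeId=101
HostName=sql-host
ArbitrationRank=2   # API/SQL nodes default to 0 (never arbitrate); 1 or 2 makes this node a candidate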

Running a cluster with just 2 servers is theoretically possible but not advisable. You really want your mgm nodes to be on separate boxes from your data nodes; they do not take many resources, and you can run them on your application server or on any other box you have in your system.

With regard to more advanced configurations you could use, please contact our support service :)

kind regards
Bogdan Kecman
[7 Mar 2016 9:05] Edouard Kaiser
Hi Bogdan,

Thanks for your response.

I'm not sure about Kevin's configuration, but mine is composed of 3 servers:

2x Data node on 2 different servers
1x Management node on 1 server

And we still experienced the exact same problem.

Kind regards,
[7 Mar 2016 9:19] MySQL Verification Team
Hi Edouard,

If you are running 

1 mgm node
2 data nodes

and you do not have SQL nodes configured as arbitrators, your single mgm node is always the arbitrator, and you have the following scenarios:

data node1 can see data node2 -> your cluster will survive
data node1 cannot see data node2 and cannot see mgm node -> node1 will shutdown (cannot form quorum)
data node2 cannot see data node1 and cannot see mgm node -> node2 will shutdown (cannot form quorum)
data node1 cannot see node2 but can see mgm node -> node 1 stays alive
data node2 cannot see node1 but can see mgm node -> node 2 stays alive

So, as you can see, if either of the two data nodes stays alive, your cluster is alive. The cluster fails only if both nodes shut down, that is, when neither data node can see the other or the mgm node.

There is one more scenario:
node1 can see the mgm node but not node2 AND node2 can see the mgm node but not node1 -> in this case the arbitrator decides which node to shut down, and your cluster survives with one node

I am not sure exactly which of these you are hitting, nor how your network is set up, but if you use a single switch and that switch dies, the cluster will fail.

If you share a bit more data about your setup, we can investigate why you get the cluster shutdown, but I doubt you have found a bug here.

take care
Bogdan Kecman
[7 Mar 2016 20:17] Kevin Dyer
Hi Bogdan

No, please re-read my original post. I mention 6 servers.

The 2 data nodes can see the 2 arbitrators, except when I restart the network on one data node. I just repeated it exactly as per my instructions... and yes, it is still an issue on 7.4.8.

Please reconsider your response.

Kevin
[7 Mar 2016 21:16] MySQL Verification Team
Hi Kevin,

You wrote:

> 2 x Management nodes
> 2 x Data nodes
> 2 x API nodes (can be repeated without these connected)

I cannot guess whether this is 6 servers or 6 services spread over fewer boxes. A quite usual setup with these services is 2 servers, each running 3 services. Furthermore, you write:

> Disconnect a data node for around 15 seconds...

> 2014-01-15 02:41:17 [MgmtSrvr] ALERT -- Node 52: Node 1 Disconnected
> 2014-01-15 02:41:26 [MgmtSrvr] INFO -- Node 52: Node 1 Connected
> 2014-01-15 02:41:26 [MgmtSrvr] ALERT -- Node 52: Node 2 Disconnected
> 2014-01-15 02:41:26 [MgmtSrvr] INFO -- Node 52: Node 2 Connected

I will assume nodes 1 and 2 are on the same box.

If you have 6 dedicated servers, each running a single mccge process, then removing the network from a single data node should not cause a crash, as long as you do not suffer from the GCP stop problem, which is related to mis-sizing of your cluster.

As for the snippet of your log

> Node 2 log /usr/local/mysql/data/ndb_2_out.log

shows that node2 lost communication with the other data node, so it understands that it is "half a cluster" and looks for the arbitrator to decide whether it is the side that should survive. Since the arbitrator is not contactable, the node shuts itself down, as per the design.

I do not see the log from node1, but looking at the snippet from the mgm node it seems that node1 had the same issue: the arbitrator was not present.

Now, you say you shut down the network on node1; it is then expected for node1 to shut itself down, as it cannot see any other node. The question is why your node2 is not seeing the arbitrator. If both mgm nodes are up and connectable, one of them should be the arbitrator, unless you have misconfigured something. What does your config.ini look like?

all best
Bogdan
[7 Mar 2016 23:35] Kevin Dyer
mgmt configuration

Attachment: config.ini (application/octet-stream, text), 3.32 KiB.

[7 Mar 2016 23:39] Kevin Dyer
Hi Bogdan

Thanks, now I see where the misunderstanding came from.

> I will assume node 1 and 2 are on same box
No, they are independent. Technically the SQL and mgmt nodes share boxes, but we agree this shouldn't matter.

> If you have 6 dedicated servers, each running a single mccge process, then removing the network from a single data node should not cause a crash, as long as you do not suffer from the GCP stop problem, which is related to mis-sizing of your cluster.
I don't know enough to comment on this.

> The question is why your node2 is not seeing the arbitrator. If both mgm nodes are up and connectable, one of them should be the arbitrator, unless you have misconfigured something.
Yes, this is the question. Please note if data node 1 did not reconnect quickly this problem does not occur (example below).

I repeated this and here are the new logs:

data node 1 - as expected we know it cannot talk to anything while the network is down

[root@test-ndb1 ~]# date ; service network stop ; sleep 10 ; service network start ; date
Network down: Mon Mar  7 20:07:32 UTC 2016
2016-03-07 20:07:39 [ndbd] INFO     -- Lost arbitrator node 51 - process failure [state=6]
Network up: Mon Mar  7 20:07:52 UTC 2016
2016-03-07 20:07:56 [ndbd] INFO     -- findNeighbours from: 5073 old (left: 2 right: 2) new (65535 65535)
2016-03-07 20:07:56 [ndbd] ALERT    -- Network partitioning - no arbitrator available
2016-03-07 20:07:56 [ndbd] INFO     -- President restarts arbitration thread [state=8]
2016-03-07 20:07:56 [ndbd] INFO     -- Arbitrator decided to shutdown this node
2016-03-07 20:07:56 [ndbd] INFO     -- QMGR (Line: 6239) 0x00000002
2016-03-07 20:07:56 [ndbd] INFO     -- Error handler shutting down system
2016-03-07 20:07:56 [ndbd] INFO     -- Error handler shutdown completed - exiting
2016-03-07 20:07:57 [ndbd] ALERT    -- Node 1: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.

data node 2 - no reason to lose arbitrator 51

2016-03-07 20:07:52 [ndbd] INFO     -- Lost arbitrator node 51 - process failure [state=6]
2016-03-07 20:07:52 [ndbd] INFO     -- President restarts arbitration thread [state=1]
2016-03-07 20:07:56 [ndbd] INFO     -- findNeighbours from: 5073 old (left: 1 right: 1) new (65535 65535)
2016-03-07 20:07:56 [ndbd] ALERT    -- Network partitioning - no arbitrator available
2016-03-07 20:07:56 [ndbd] INFO     -- President restarts arbitration thread [state=8]
2016-03-07 20:07:56 [ndbd] INFO     -- Arbitrator decided to shutdown this node
2016-03-07 20:07:56 [ndbd] INFO     -- QMGR (Line: 6239) 0x00000002
2016-03-07 20:07:56 [ndbd] INFO     -- Error handler shutting down system
2016-03-07 20:07:56 [ndbd] INFO     -- Error handler shutdown completed - exiting
2016-03-07 20:07:57 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.

> What does your config.ini look like?
/var/lib/mysql-cluster/config.ini attached

Please note that the "Direct TCP connections between data nodes" section is not standard, but it was only added in November 2014, so it shouldn't affect this bug.

Hope this information helps

Kevin
[7 Mar 2016 23:39] Kevin Dyer
Comment was too long, additional logs:

mgmt node 50 - no reason for node 2 to disconnect just because node 1 reconnected

2016-03-07 20:07:39 [MgmtSrvr] ALERT    -- Node 50: Node 1 Disconnected
2016-03-07 20:07:43 [MgmtSrvr] WARNING  -- Node 2: GCP Monitor: GCP_COMMIT lag 10 seconds (no max lag)
2016-03-07 20:07:45 [MgmtSrvr] WARNING  -- Node 2: Node 1 missed heartbeat 2
2016-03-07 20:07:49 [MgmtSrvr] INFO     -- Node 50: Node 1 Connected
2016-03-07 20:07:49 [MgmtSrvr] INFO     -- Node 1: Node 101: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:49 [MgmtSrvr] INFO     -- Node 1: Node 103 Connected
2016-03-07 20:07:49 [MgmtSrvr] INFO     -- Node 1: Node 103: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:49 [MgmtSrvr] INFO     -- Node 1: Node 105 Connected
2016-03-07 20:07:49 [MgmtSrvr] INFO     -- Node 1: Node 105: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 109 Connected
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 109: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 107 Connected
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 107: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:50 [MgmtSrvr] WARNING  -- Node 2: Node 1 missed heartbeat 3
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 108 Connected
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 108: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 102 Connected
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 106 Connected
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 106: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 102: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:51 [MgmtSrvr] WARNING  -- Node 1: Node 2 missed heartbeat 3
2016-03-07 20:07:52 [MgmtSrvr] ALERT    -- Node 50: Node 2 Disconnected
2016-03-07 20:07:52 [MgmtSrvr] INFO     -- Node 50: Node 2 Connected
2016-03-07 20:07:53 [MgmtSrvr] WARNING  -- Node 2: GCP Monitor: GCP_COMMIT lag 20 seconds (no max lag)
2016-03-07 20:07:57 [MgmtSrvr] ALERT    -- Node 50: Node 1 Disconnected
2016-03-07 20:07:57 [MgmtSrvr] ALERT    -- Node 1: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
2016-03-07 20:07:57 [MgmtSrvr] ALERT    -- Node 2: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
2016-03-07 20:07:57 [MgmtSrvr] ALERT    -- Node 50: Node 2 Disconnected

mgmt node 51 - again, no reason for node 2 to disconnect just because node 1 reconnected

2016-03-07 20:07:39 [MgmtSrvr] ALERT    -- Node 51: Node 1 Disconnected
2016-03-07 20:07:43 [MgmtSrvr] WARNING  -- Node 2: GCP Monitor: GCP_COMMIT lag 10 seconds (no max lag)
2016-03-07 20:07:45 [MgmtSrvr] WARNING  -- Node 2: Node 1 missed heartbeat 2
2016-03-07 20:07:49 [MgmtSrvr] INFO     -- Node 51: Node 1 Connected
2016-03-07 20:07:49 [MgmtSrvr] INFO     -- Node 1: Node 101 Connected
2016-03-07 20:07:49 [MgmtSrvr] INFO     -- Node 1: Node 101: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:49 [MgmtSrvr] INFO     -- Node 1: Node 103 Connected
2016-03-07 20:07:49 [MgmtSrvr] INFO     -- Node 1: Node 103: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:49 [MgmtSrvr] INFO     -- Node 1: Node 105 Connected
2016-03-07 20:07:49 [MgmtSrvr] INFO     -- Node 1: Node 105: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 109 Connected
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 109: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 107 Connected
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 107: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:50 [MgmtSrvr] WARNING  -- Node 2: Node 1 missed heartbeat 3
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 108 Connected
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 108: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 102 Connected
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 106 Connected
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 106: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:50 [MgmtSrvr] INFO     -- Node 1: Node 102: API mysql-5.6.27 ndb-7.4.8
2016-03-07 20:07:51 [MgmtSrvr] WARNING  -- Node 1: Node 2 missed heartbeat 3
2016-03-07 20:07:52 [MgmtSrvr] ALERT    -- Node 51: Node 2 Disconnected
2016-03-07 20:07:52 [MgmtSrvr] INFO     -- Node 51: Node 2 Connected
2016-03-07 20:07:53 [MgmtSrvr] WARNING  -- Node 2: GCP Monitor: GCP_COMMIT lag 20 seconds (no max lag)
2016-03-07 20:07:57 [MgmtSrvr] ALERT    -- Node 51: Node 1 Disconnected
2016-03-07 20:07:57 [MgmtSrvr] ALERT    -- Node 1: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
2016-03-07 20:07:57 [MgmtSrvr] ALERT    -- Node 51: Node 2 Disconnected
2016-03-07 20:07:57 [MgmtSrvr] ALERT    -- Node 2: Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and can not form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
[7 Mar 2016 23:39] Kevin Dyer
And if data node 1 does not reconnect so quickly, you see this:

data node 2 - because data node 1 did not reconnect, arbitration succeeds and node fail handling succeeds

2016-03-07 23:21:35 [ndbd] INFO     -- findNeighbours from: 5073 old (left: 1 right: 1) new (65535 65535)
2016-03-07 23:21:35 [ndbd] ALERT    -- Network partitioning - arbitration required
2016-03-07 23:21:35 [ndbd] INFO     -- President restarts arbitration thread [state=7]
2016-03-07 23:21:35 [ndbd] ALERT    -- Arbitration won - positive reply from node 50
2016-03-07 23:21:35 [ndbd] INFO     -- NR Status: node=1,OLD=Initial state,NEW=Node failed, fail handling ongoing
start_resend(0, min: 27031128/7 - max: 27031128/8) page: 106859
2016-03-07 23:21:35 [ndbd] INFO     -- DBTC instance 0: Starting take over of node 1
execGCP_NOMORETRANS(27031128/9) c_ongoing_take_over_cnt -> seize
2016-03-07 23:21:35 [ndbd] WARNING  -- ACK wo/ gcp record (gci: 27031128/7) ref: 0fa2006c from: 0fa2006c
2016-03-07 23:21:35 [ndbd] WARNING  -- ACK wo/ gcp record (gci: 27031128/8) ref: 0fa2006c from: 0fa2006c
2016-03-07 23:21:35 [ndbd] WARNING  -- ACK wo/ gcp record (gci: 27031128/7) ref: 0fa20065 from: 0fa20065
2016-03-07 23:21:35 [ndbd] WARNING  -- ACK wo/ gcp record (gci: 27031128/8) ref: 0fa20065 from: 0fa20065
2016-03-07 23:21:35 [ndbd] INFO     -- DBTC instance 0: Completed take over of failed node 1
completing gcp 27031128/9 in execTAKE_OVERTCCONF
27031128/9 (27031128/8) switchover complete bucket 0 state: c
takeover 106859
resend done...
2016-03-07 23:21:35 [ndbd] INFO     -- Started arbitrator node 50 [ticket=23d60002150d3ba6]
2016-03-07 23:21:36 [ndbd] INFO     -- Adjusting disk write speed bounds due to : Node restart ongoing
2016-03-07 23:21:44 [ndbd] INFO     -- NR Status: node=1,OLD=Node failed, fail handling ongoing,NEW=Node failure handling complete
2016-03-07 23:21:44 [ndbd] INFO     -- Node 1 has completed node fail handling
[8 Mar 2016 4:41] MySQL Verification Team
Hi Kevin,

Great that we are moving this forward. Thanks for cooperation!

> Technically the SQL and mgmt nodes share boxes, but we agree this shouldn't matter.

Actually, since an SQL node can work as an arbitrator, in this particular case it could matter, but it was certainly not part of the misunderstanding.

>> ... the GCP stop problem, which is related to mis-sizing of your cluster.
> I don't know enough to comment on this.

OK, let me elaborate, as this is an issue that is often wrongly attributed to various bugs while it is actually part of the design and happens due to a poorly sized mccge. MCCGE is designed to be a real-time database, and in the real-time world you have to know how long things take and be able to depend on that, so if mccge cannot finish its work in the allotted time, it will rather crash than break the real-time nature of the system.

This crash, which happens when mccge is undersized, is seen in the logs as a GCP STOP.

When you write to your mccge (writes are not only inserts, but any data-changing operation such as a delete or update), the data goes into the redo log. If a node crashes, it can replay these changes from the redo log while starting from the last good LCP. This redo log needs to be consistent on all data nodes, and that synchronization is done via the GCP. The GCP process synchronously writes data from the redo log to disk. This GCP event happens every 2 seconds by default (configurable). So when a new GCP event needs to start (2 seconds have passed since the previous one started) and the old one has not finished (too much data to be written in too little time, disk I/O used by something else, a virtual server instance where some other VM ate your resources, ...), mccge decides to crash the node that is having trouble flushing redo data to disk in the allotted time, with a GCP STOP. This design decision and the real-time nature of mccge are the reason why it is not really advisable to run mccge data nodes as VMs.

So, while this is by design an in-memory database, it does sync all the data to disk, and you have to have disk I/O capable of keeping up with all the data writes you make. There is a lot more one could say about GCP stop; for example, even if your average write rate is low, you can easily hit a GCP STOP by running a long transaction. The transaction is flushed to disk on commit, so you can start a transaction, add to it slowly for hours, and then on commit you have a huge amount of data that needs to be flushed in a very small amount of time -> GCP STOP.

Some configuration settings to look at (a config.ini sketch of these follows this list):
 - TimeBetweenGlobalCheckpoints (if decreased, the data nodes run GCPs more often, which helps with lots of small transactions; if increased, each GCP gets longer to finish, which helps if you have long/big transactions)
 - DiskPageBufferMemory (increasing it buffers more disk data, reducing the disk I/O used by disk data tables so that the GCP can use more disk I/O)
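
For illustration only, this is roughly how those two parameters might appear in config.ini; the values are placeholders that happen to match the documented defaults (2000 ms and 64M), not tuning recommendations:

[ndbd default]
# Hypothetical example values, shown only to illustrate where these parameters live
TimeBetweenGlobalCheckpoints=2000
DiskPageBufferMemory=64M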
 
In 7.4 we relaxed this GCP stop behavior a bit (it will try to abort transactions and throw a temporary error rather than crashing a data node), but GCP stop is still here and is here to stay.
 
Now, how can this GCP stop be seen in light of a network outage? The database load is spread between the data nodes, and you have two of them. When you disconnect one from the system (for whatever reason; stopping the network is one of them), then as soon as it is declared dead all traffic goes to the other data node, so your surviving data node is effectively 200% loaded. This can easily lead to a GCP stop within the next few seconds. To check for this you have to look at the crash reports (a GCP STOP is easy to spot), and when you are doing this type of failure test you should turn off any load on your database (if there is no load on the database, a GCP STOP cannot happen).
 
Hope this clears up the GCP STOP hint I gave. Let me now look through the rest of the data you provided.
 
 kind regards
 Bogdan Kecman
[8 Mar 2016 4:58] MySQL Verification Team
Hi Kevin,

A few URLs regarding configuring arbitration (just for the record):

https://dev.mysql.com/doc/mysql-cluster-excerpt/5.1/en/mysql-cluster-ndbinfo-arbitrator-va...

https://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-a...

https://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-mgm-definition.html#ndbparam-mgmd-ar...

> Please note if data node 1 did not reconnect quickly this problem does not occur (example below).

This is not something I can easily reproduce. So the situation you are experiencing is that, on your setup with 2 data and 2 mgm nodes (all physical), if a data node's network is killed for long enough to miss 3 heartbeats, but not long enough for the cluster to come to a stable state, the remaining data node will also shut down.

> Hope this information helps

It does, but I'm still not able to reproduce this locally.
Can you fetch the https://dev.mysql.com/doc/mysql-cluster-excerpt/5.1/en/mysql-cluster-ndbinfo-arbitrator-va... content before and during the network stop, so that we can see the "normal" state before the test and what both nodes see just before the shutdown? Maybe add one SQL node to the physical node1 server so that you can run the select from that SQL node when you kill the network on the node1 server. One possible way to poll it is sketched below.
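
A minimal sketch of one way to capture this, assuming an SQL node is reachable locally and that the view in question is ndbinfo.arbitrator_validity_detail (which matches the columns in the output posted later in this bug):

#!/bin/bash
# Poll the arbitrator state once per second and append it to a log file.
while true; do
    mysql -N -e "SELECT NOW(), node_id, arbitrator, arb_ticket, arb_connected, arb_state
                 FROM ndbinfo.arbitrator_validity_detail;" >> /tmp/arbitrator_state.log 2>&1
    sleep 1
done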

all best
Bogdan Kecman
[8 Mar 2016 5:31] Kevin Dyer
Hi Bogdan

Very useful information. One quick response to your GCP explanation is that this database is on a test system doing virtually nothing.

> This is not something I can easily reproduce. So the situation you are experiencing is that, on your setup with 2 data and 2 mgm nodes (all physical), if a data node's network is killed for long enough to miss 3 heartbeats, but not long enough for the cluster to come to a stable state, the remaining data node will also shut down.

Yes exactly.

Do you have a system on which you can try cycling the network interface? service network stop ; sleep 10 ; service network start

I captured the contents of that table while reproducing the issue and got this:

now()                node_id arbitrator   arb_ticket      arb_connected   arb_state
-----------------------------------------------------------------------------------
2016-03-08 05:15:37     1       50      23d60002150d3ba6        Yes     ARBIT_RUN
2016-03-08 05:15:37     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:38     1       50      23d60002150d3ba6        Yes     ARBIT_RUN
2016-03-08 05:15:38     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:39     1       50      23d60002150d3ba6        Yes     ARBIT_RUN
2016-03-08 05:15:39     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:40     1       50      23d60002150d3ba6        Yes     ARBIT_RUN
2016-03-08 05:15:40     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:41     1       50      23d60002150d3ba6        Yes     ARBIT_RUN
2016-03-08 05:15:41     2       50      23d60002150d3ba6        Yes     ARBIT_RUN
ERROR 1296 (HY000) at line 1: Got error 157 'Connection to NDB failed' from NDBINFO
Error (Code 1296): Got error 157 'Connection to NDB failed' from NDBINFO
Error (Code 1028): Sort aborted: Got error 157 'Connection to NDB failed' from NDBINFO

2016-03-08 05:15:48     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:49     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:50     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:51     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:52     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:53     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:54     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:55     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:56     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:57     1       0       23d60002150d3ba6        No      ARBIT_NULL
2016-03-08 05:15:57     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:58     1       0       23d60002150d3ba6        No      ARBIT_NULL
2016-03-08 05:15:58     2       50      23d60002150d3ba6        Yes     ARBIT_RUN

2016-03-08 05:15:59     1       0       23d60002150d3ba6        No      ARBIT_NULL

2016-03-08 05:16:00     1       0       23d60002150d3ba6        No      ARBIT_NULL

2016-03-08 05:16:01     1       0       23d60002150d3ba6        No      ARBIT_NULL

2016-03-08 05:16:02     1       0       23d60002150d3ba6        No      ARBIT_NULL

2016-03-08 05:16:03     1       0       23d60002150d3ba6        No      ARBIT_NULL

2016-03-08 05:16:04     1       0       23d60002150d3ba6        No      ARBIT_NULL

2016-03-08 05:16:05     1       0       23d60002150d3ba6        No      ARBIT_NULL
ERROR 1296 (HY000) at line 1: Got error 157 'Connection to NDB failed' from NDBINFO
Error (Code 1296): Got error 157 'Connection to NDB failed' from NDBINFO
Error (Code 1028): Sort aborted: Got error 157 'Connection to NDB failed' from NDBINFO
ERROR 1296 (HY000) at line 1: Got error 157 'Connection to NDB failed' from NDBINFO
ERROR 1296 (HY000) at line 1: Got error 157 'Connection to NDB failed' from NDBINFO

Does that help?

Cheers

Kevin
[8 Mar 2016 7:11] MySQL Verification Team
Hi Kevin,

Many thanks for all the info. Let me retry the tests with 7.4.8 (I'm actually testing with 7.4.10 at the moment); maybe it will make a difference (testing both on a VM setup and on bare metal).

all best
Bogdan Kecman
[4 Apr 2016 11:56] MySQL Verification Team
This is a bit weird; I have spent too much time on this bug without reaching a satisfying solution. Doing any "normal" testing (unplugging a cable, removing a switch, powering off a server, cutting power to a server, killing the ndbd/ndbmtd process, etc.), I cannot reproduce this problem. So in any real-life scenario this works OK.

By flapping the network interface, however, I could crash the cluster in the way explained here, but IMO that does not qualify as a bug. I will have to discuss with my colleagues from the cluster dev team whether tricking the cluster into a wrong state by deliberately turning an interface on/off can be considered a bug or not.

kind regards
Bogdan Kecman
[4 Apr 2016 23:08] Kevin Dyer
Hi Bogdan

Yeah it is a strange one. I see no reason for the data node to disconnect from the arbitrator just because another data node reconnected quickly. Resolving that resolves this issue.

I think you are missing a range of real-life scenarios that cause this issue. Server maintenance tasks that include updating OS packages (security patches etc.) could easily lead to a scenario that requires a soft network restart; if that takes a few seconds, you could lose your cluster. If you run modern configuration management tools, they may apply updates automatically. A switch could also go down for 10 seconds.

TBH I don't understand why MySQL Cluster shuts down immediately and purges its memory when arbitration fails the first time; why not disconnect all clients and poll for arbitrators so it can come back to life quickly? That assumes the restoration process is the last phase of coming back up; if it happens earlier, that wouldn't work. Our cluster can take over 40 minutes to come back up, but that is another issue...

Sounds like you have reproduced it, let's see what the dev team think.

Thanks

Kevin