Bug #36762 Unable to make a new node sync to NDB (Stuck on phase 100)
Submitted: 16 May 2008 18:40 Modified: 30 Oct 2009 14:00
Reporter: Jeffrey R
Status: Not a Bug
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S3 (Non-critical)
Version: mysql-5.1 OS: Any (stuck on phase 100 due to firewall settings)
Assigned to: CPU Architecture: Any
Tags: 5.1.22, firewall, intitial, ndb, network, phase 100

[16 May 2008 18:40] Jeffrey R
While bringing up an entirely new data node on a fresh operating system and syncing it to a master data node, I ran into issues completing the ndbd --initial sync.

The sync happened for the most part. All of the disk data was copied (many GBs' worth) and everything seemed to be okay, except that it would not go past Phase 100, which is now a DEPRECATED state. I left it at this stage for hours to see if it would ever go beyond it, but it just stayed there.

I was able to resolve this issue; the culprit was the firewall settings, which block everything but SSH by default.

How to repeat:
Set up a cluster with 1 mgmt node, 1 API node and 2 data nodes. Enable the firewall on one of the data nodes, clear the NDB data folder and start it up again with "ndbd --initial". It will sync all data but never complete the connection.

Suggested fix:
I do not expect MySQL to be able to fix this issue, as it is definitely a system setting problem, but I do think that an error should be displayed if data is synchronized between data nodes and the data node then fails to complete the final steps of connecting.

There should be some sort of timeout that displays an error on specific stages. This applies specifically, but is not limited, to being stuck on phase 100, which is deprecated.

This should at the very least print some sort of message stating that it is unable to make the final connection and that the user should check their network/firewall settings, instead of staying stuck on phase 100 forever.
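
To illustrate the kind of timeout suggested above, here is a minimal sketch in Python. It is not how ndbd works internally. It assumes that (timestamp, start phase) samples for a node have already been collected by some external means; the function name find_stall and the sample format are hypothetical. It reports the first phase that has not advanced within a given number of seconds.

```python
from typing import Iterable, Optional, Tuple


def find_stall(
    observations: Iterable[Tuple[float, int]],
    timeout_s: float,
) -> Optional[Tuple[int, float]]:
    """Return (phase, seconds_stuck) for the first stalled phase, else None.

    `observations` is a time-ordered series of (timestamp_seconds, phase)
    samples. This sample format is an assumption made for illustration.
    """
    last_phase = None
    phase_since = 0.0
    for timestamp, phase in observations:
        if phase != last_phase:
            # The node moved to a different phase: restart the clock.
            last_phase = phase
            phase_since = timestamp
        elif timestamp - phase_since >= timeout_s:
            # Same phase for too long: report it instead of waiting forever.
            return phase, timestamp - phase_since
    return None
```

A caller could print a warning such as "stuck on phase 100, check network/firewall settings" whenever find_stall returns a result.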
[4 Jun 2008 19:15] Jon Stephens
See http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-security-networking-issues.html, where it says:

      Because MySQL Cluster requires large numbers of ports to be open
      for communications between nodes, the recommended option is to use
      a segregated network. This represents the simplest way to prevent
      unwanted traffic from reaching the cluster.

See also http://dev.mysql.com/doc/refman/5.1/en/faqs-mysql-cluster.html#qandaitem-32-10-12  

The discussion in these and other places makes it pretty clear (IMO) that all data and management nodes need to be in the same subnet and there shouldn't be any firewalls between them.
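
As a rough way to check whether a firewall sits between two nodes, one can attempt a plain TCP connection from one host to a port on the other. The sketch below is only an illustration in Python, not a MySQL Cluster tool, and the helper name can_connect is hypothetical. Which ports to test depends on the cluster configuration: the management server listens on port 1186 by default, while the ports used between data nodes should be taken from your own configuration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds in time.

    A False result means the connection was refused, filtered, or timed
    out; it cannot tell a firewall apart from a service that is not running.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

For example, running can_connect("192.0.2.10", 1186) on a data node host (the address is a placeholder) would show whether that host can reach a management server at that address.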

As for lack of an error message being given in this type of situation, it's not really up to me to say whether this is a bug or not.
[30 Oct 2009 14:00] Jon Stephens
Based on earlier comments and discussion today with Hartmut and Pekka, I've closed this as !BUG.