Bug #87043 Cannot start MySQL Cluster under Docker Swarm
Submitted: 12 Jul 2017 15:28 Modified: 1 May 2018 13:01
Reporter: Sean McDowell
Status: Open
Category: MySQL Package Repos and Docker Images Severity: S2 (Serious)
Version: 7.5 OS: Linux
Assigned to: Lars Tangvald CPU Architecture: Any

[12 Jul 2017 15:28] Sean McDowell
Description:
Unable to start a MySQL cluster under Docker Swarm.

The root of the problem is in MgmtSrvr.cpp, in the ::find_node_type method.

Here it verifies that the client IP matches the IP resolved from the host name in the /etc/mysql-cluster.cnf file.

In Docker Swarm these will never be the same: the service hostname resolves to a virtual IP that load-balances across the service's replicas.

For instance, the hostname 'mysql_data1' might resolve to 10.0.0.4 whereas the IP of the container might be 10.0.0.5. It is not possible to know the IP of the container beforehand. Swarm does not permit assignment of static IPs to service replicas.
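To make the mismatch concrete, here is a simplified sketch of the comparison that fails (not the actual MgmtSrvr.cpp logic; the IP values are the ones from the example above):

```shell
#!/bin/bash
# Simplified illustration of the check that fails under Swarm.
# In a real swarm these would come from a DNS lookup and from the
# TCP connection's source address, respectively.
service_vip="10.0.0.4"    # what the hostname mysql_data1 resolves to (service VIP)
container_ip="10.0.0.5"   # the actual source IP of the connecting container

if [ "$service_vip" = "$container_ip" ]; then
    echo "node id can be allocated"
else
    echo "Connection done from wrong host ip $container_ip."
fi
```

Because Swarm never hands a replica the VIP as its own address, the else branch is always taken.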

How to repeat:
Install Docker 17.06 on a Linux system -- follow the steps at https://get.docker.com.

Pull the mysql-cluster image:

	docker pull mysql/mysql-cluster

Enable Swarm Mode for this node. Normally we would use a swarm with three managers and distribute the MySQL services across the nodes using placement preferences, but for this defect that isn't necessary.

	docker swarm init
	
Run the create_mysql_secrets.sh script to make configuration available to swarm nodes:

	./create_mysql_secrets.sh

Deploy the stack -- this will start a management node and two data nodes.

	docker stack deploy -c mysql_min.yml mysql
	
Check the status of the services:

	docker service ls
	
When all services have replicas, check the logs of a data node:

	docker service logs mysql_mysql_data1
	
You will see errors like:

-- Failed to allocate nodeid, error: 'Error: Could not alloc node id at mysql_mgmt port 1186: Connection done from wrong host ip 10.0.0.5.'

The hostname mysql_data1 resolves to 10.0.0.4 in this case (it is a virtual IP address). The container has the IP address 10.0.0.5. So the client IP will never be the same address as what the hostname resolves to.

Another issue arises if you create SQL (mysqld) nodes. I see errors in the management log complaining that it cannot allocate a node id for them; I am not sure why.

Suggested fix:
I am not sure what the underlying design rationale is for checking the client IP versus the hostname.
Currently you are required to provide a hostname for the ndbd nodes. Maybe that should be optional?
[12 Jul 2017 15:29] Sean McDowell
Creates config entries

Attachment: create_mysql_secrets.sh (application/octet-stream, text), 591 bytes.

[12 Jul 2017 15:29] Sean McDowell
my.cnf file for data node & sql server

Attachment: my.cnf (application/octet-stream, text), 471 bytes.

[12 Jul 2017 15:29] Sean McDowell
config file for management server

Attachment: mysql-cluster.cnf (application/octet-stream, text), 1.06 KiB.

[12 Jul 2017 15:29] Sean McDowell
Minimal stack file to demonstrate the issue

Attachment: mysql_min.yml (application/octet-stream, text), 1.33 KiB.

[12 Jul 2017 21:35] Mikael Ronström
The reason why MySQL Servers cannot allocate a node id is that at least one data node must be up and running before a node id can be allocated. Since the data node fails to start, the MySQL Server cannot allocate a node id.
[12 Jul 2017 21:42] Mikael Ronström
The reason a hostname is needed for NDB data nodes is that they need to connect to each other. So if you have two NDB data nodes, one on mysql_data1 and one on mysql_data2, each will try to set up a connection to the other's hostname. That is why the NDB management server verifies that they actually use the correct client IP address before proceeding.
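For context, the node sections in a mysql-cluster.cnf like the one attached would carry these hostnames (a sketch; the actual attachment may differ):

```
[ndb_mgmd]
HostName=mysql_mgmt

[ndbd]
HostName=mysql_data1

[ndbd]
HostName=mysql_data2
```

It is these HostName values that the management server resolves and compares against the client IP of each connecting node.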
[13 Jul 2017 12:51] Sean McDowell
Is the problem that the Management Server is trying to map the inbound data node to the slot in its configuration based on its IP address?

If so then perhaps there needs to be another way for a node to identify itself rather than just by IP address (by a configured ID for instance).
[17 Jul 2017 15:02] Sean McDowell
Good news, I did some further experimentation and got this to work!
The key is to change the service discovery mode for the services to dnsrr (from the default of a virtual IP).

Use this sort of entry in the stack file:

    deploy:
      endpoint_mode: dnsrr

I did find I need to explicitly specify the path to the configuration file for ndb_mgmd for some reason:

    command: ndb_mgmd --config-cache=FALSE -f /etc/mysql-cluster.cnf

I thought this was the default path where it finds the configuration file (according to Docker Hub), but it wasn't working for me.
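Putting the two changes together, a service entry in the stack file would look something like this (service and image names assumed from this report, not the exact attached mysql_min.yml):

```yaml
  mysql_mgmt:
    image: mysql/mysql-cluster
    command: ndb_mgmd --config-cache=FALSE -f /etc/mysql-cluster.cnf
    deploy:
      endpoint_mode: dnsrr
```

With endpoint_mode set to dnsrr, the service name resolves directly to the replica's container IP instead of a virtual IP, so the management server's client-IP check passes.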

Given this, I think we can close this issue (updating instructions on hub would be great!)
[25 Jul 2017 9:31] Lars Tangvald
Thanks for looking into this!
Assigning this to myself, so we can add some documentation for it before closing.

The entrypoint script specifies the path to /etc/mysql-cluster.cnf, so it's odd that you would need to add it manually.

Note that we've uploaded new Docker images for cluster. The new images have the same functionality as the server images along with the basic cluster functionality.
The update should also resolve https://bugs.mysql.com/bug.php?id=86854
[1 May 2018 7:56] Jascha Brinkmann
I am struggling to get this to work.

I get an error `Unable to lookup/illegal hostname mysql_data1`

Do you have to add the `endpoint_mode: dnsrr` directive to each service?

Even if I add it to each service it still doesn't work.

If you deploy the services from a stack file, Docker automatically prefixes the service names with the stack name. But even with that in mind I still get the same error `Unable to lookup/illegal hostname`.

It would be incredibly helpful to see your final compose/stack file.

Thanks
[1 May 2018 13:01] Sean McDowell
I have found that the service hostnames may not be resolvable immediately when a Docker container starts up. This can cause the MySQL containers to endlessly start up and shut down because they cannot resolve hostnames.

I have a bash script that first sleeps for 5 seconds before doing anything; this works around the hostnames not yet being resolvable.

For the 'ndb_mgmd' container I wait until hostnames are resolvable before starting it:

if [ "$1" = 'ndb_mgmd' ]; then
	# Don't proceed until the hostnames are resolvable; otherwise, ndb_mgmd will quit
	HOSTNAMES=${HOSTNAMES:-mysql-data1 mysql-data2 mysql-mgmt}
	HOSTNAMES_ARR=(${HOSTNAMES})

	for HOSTNAME in "${HOSTNAMES_ARR[@]}"; do
		echo "Waiting for ${HOSTNAME}"
		# Discard getent's output; we only care about its exit status
		while ! getent hosts "${HOSTNAME}" > /dev/null; do
			sleep 1
		done
	done
fi