Bug #92547 Contribution: Make MYSQL_INNODB_NUM_MEMBERS work with offline members
Submitted: 24 Sep 2018 17:05 Modified: 25 Sep 2018 4:34
Reporter: OCA Admin (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Package Repos and Docker Images Severity:S3 (Non-critical)
Version: OS:Any
Assigned to: CPU Architecture:Any
Tags: Contribution

[24 Sep 2018 17:05] OCA Admin
Description:
This bug tracks a contribution by Gianluca Borello (Github user: gianlucaborello) as described in http://github.com/mysql/mysql-docker/pull/8

How to repeat:
See description

Suggested fix:
See contribution code attached
[24 Sep 2018 17:05] OCA Admin
Contribution submitted via Github - Make MYSQL_INNODB_NUM_MEMBERS work with offline members 
(*) Contribution by Gianluca Borello (Github gianlucaborello, mysql-docker/pull/8#issuecomment-423215012): Thank you for your help. Yes, my name is already listed in the OCA list, and I have successfully contributed to other Oracle projects on GitHub using this username. As for the other request, here is the explicit agreement to the OCA:

I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Let me know if I should also send it via email.

Thanks

Contribution: git_patch_216382185.txt (text/plain), 1.74 KiB.

[25 Sep 2018 3:14] Patrick Galbraith
Hi there! 

I've had the same problem, within the context of Kubernetes and using the MySQL Operator and the wordpress-router demo in the operator source. 

1. Create the database cluster, default with 3 nodes `kubectl create -f wordpress-database.yaml`
2. Create the wordpress deployment (includes mysql-router + wordpress app containers) `kubectl create -f wordpress-deployment.yaml`
3. Scale the cluster from 3 to 7:

`kubectl edit cluster.mysql.oracle.com mysql-wordpress`

Change `members: 3` to `members: 7`

4. Kill the wordpress-router pod

`kubectl delete po wordpress-router-xxxxxxxx-nnnn`

5. Observe that it won't restart:

```
wordpress-router-695dbcd6d-5bmsx   0/2       ImagePullBackOff   26         2h
[opc@bastion-ad1 wordpress-router]$ kubectl describe po wordpress-router-695dbcd6d-5bmsx

<snip>

Events:
  Type     Reason   Age                 From                    Message
  ----     ------   ----                ----                    -------
  Warning  Failed   15m (x508 over 2h)  kubelet, 129.213.45.45  Error: ImagePullBackOff
  Warning  BackOff  5m (x482 over 2h)   kubelet, 129.213.45.45  Back-off restarting failed container
  Normal   BackOff  52s (x578 over 2h)  kubelet, 129.213.45.45  Back-off pulling image "capttofu/mysql-router"
```

This query needs to evaluate to true:

```
[opc@bastion-ad1 wordpress-router]$ kubectl exec -it mysql-wordpress-0 -c mysql -- mysql -u root -pmy-super-secret-pass mysql_innodb_cluster_metadata -e 'select count(*) = 3 FROM instances WHERE replicaset_id = (SELECT replicaset_id FROM instances WHERE mysql_server_uuid = @@server_uuid);'
mysql: [Warning] Using a password on the command line interface can be insecure.
+--------------+
| count(*) = 3 |
+--------------+
|            0 |
+--------------+
```
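The readiness gate implied by that query can be sketched as a small wait loop. The helper below is a hypothetical illustration only (the `cluster_ready` function name and the comparison against `MYSQL_INNODB_NUM_MEMBERS` are assumptions, not the actual entrypoint code); the key idea is that it counts members *registered* in the metadata, regardless of whether each one is currently ONLINE:

```shell
#!/bin/sh
# Hypothetical sketch of a readiness gate: succeed once the number of
# registered cluster members reaches the expected count, even if some
# of those members are temporarily offline.
cluster_ready() {
    registered="$1"   # rows in mysql_innodb_cluster_metadata.instances
    expected="$2"     # e.g. the value of $MYSQL_INNODB_NUM_MEMBERS
    [ "$registered" -ge "$expected" ]
}

# In a real script, "registered" would come from a query such as the
# SELECT COUNT(*) against the instances table shown above.
if cluster_ready 3 3; then
    echo "ready"      # prints "ready"
else
    echo "waiting"
fi
```

With a check like this, a member that is registered but offline no longer blocks Router startup, which is the behavior the contribution is after.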

This patch looks promising, though we may need to look at something more to determine whether the cluster is in its expected state. I'm giving this a lot of thought.
[25 Sep 2018 3:15] Patrick Galbraith
sorry for the accidental double-submit!
[25 Sep 2018 3:41] Gianluca Borello
Thanks for your comment, but keep in mind that the problem you're trying to solve seems different from mine, in a subtle but fundamental way: you want to make sure that Router is resilient to the number of cluster members changing at runtime, whereas I just need Router to be resilient to cluster members temporarily going offline. In my case, the cluster members never go away.

Your condition introduces further complications, because Router needs to have all the members listed in the configuration file in order to adapt properly to traffic in case one of the bootstrap nodes goes down; otherwise you get complete failure even without Router restarting. I've done a deeper analysis of the issue here: https://github.com/mysql/mysql-docker/pull/8#issue-216382185

In particular, the relevant section is the one containing:

"""
However, it seems that all the cluster state is always discovered only via the bootstrap nodes. This means that if the original node discovered during the bootstrap goes down, the entire thing goes down, Router stops serving all the requests, even if we have a valid cluster formed by mysql2 and mysql3:
"""

It might make sense to separate the two things: the limitation you are facing seems more of an intrinsic one inside Router, while mine seems more specific to the Docker image scripts.
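For context, the bootstrap-node dependence described above comes from the `bootstrap_server_addresses` list in `mysqlrouter.conf`. A fragment along these lines (hostnames, cluster name, and the `user` value are placeholders) lists every member explicitly, so Router can still refresh the cluster metadata if one of the bootstrap nodes goes down:

```ini
[metadata_cache:mycluster]
router_id=1
; listing all members, not just one, so metadata refresh survives
; the loss of any single bootstrap node
bootstrap_server_addresses=mysql://mysql1:3306,mysql://mysql2:3306,mysql://mysql3:3306
user=mysql_router1_placeholder
metadata_cluster=mycluster
ttl=0.5
```

This fragment is only a sketch of the configuration shape, not the file generated by the Docker image scripts.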

Let me know what you think and if I missed something.

Thanks
[25 Sep 2018 4:34] Umesh Shastry
Hello Gianluca,

Thank you for the report and contribution.

regards,
Umesh