Bug #111589 | MySQL Operator - split brain on all data nodes | ||
---|---|---|---|
Submitted: | 27 Jun 2023 18:37 | Modified: | 21 Jul 2023 16:31 |
Reporter: | Carlos Abrantes | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Operator | Severity: | S3 (Non-critical) |
Version: | | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[27 Jun 2023 18:37]
Carlos Abrantes
[30 Jun 2023 14:29]
MySQL Verification Team
Hi, I'm not able to reproduce any inconsistencies here. How exactly did you reconfigure the firewall/networking to trigger this issue? A few more details on how to reproduce it would be helpful. So far everything always behaves the same way; maybe not ideally, but consistently. Thanks
[12 Jul 2023 10:37]
Carlos Abrantes
Hi,

Sorry for such a late reply. Am I supposed to receive a notification when the ticket is updated? I didn't get one.

Can you describe which of the behaviours you are seeing? From the logs I sent, it is possible to see that the first time the operator forces quorum on one node; the other times there are no logs at all, so I can't send any.

I'm running on k8s with Cilium and I'm doing it with network policies. Something like:

    apiVersion: "cilium.io/v2"
    kind: CiliumClusterwideNetworkPolicy
    metadata:
      name: "pod-a"
    spec:
      endpointSelector:
        matchLabels:
          statefulset.kubernetes.io/pod-name: mysql-0
      ingress:
      - fromEndpoints:
        - matchExpressions:
          - key: statefulset.kubernetes.io/pod-name
            operator: NotIn
            values:
            - mysql-1
            - mysql-2
      egress:
      - toEndpoints:
        - matchExpressions:
          - key: statefulset.kubernetes.io/pod-name
            operator: NotIn
            values:
            - mysql-1
            - mysql-2

I have 3 of those rules (with the target pod and the src/dst pods changed), one applied to each pod, preventing communication with the other pods. The end result is that the MySQL data nodes can't communicate with each other, but can still communicate with the operator.

So the first time, the operator was clever enough to force quorum on 1 node, keeping the service available (it then wasn't able to recover from that on its own, which was the second problem); the other times, service was simply lost.

Thanks,
Carlos
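For anyone trying to reproduce this, the three per-pod rules described above could be generated with a short script along these lines. This is only a sketch: it assumes the pods are named mysql-0, mysql-1 and mysql-2 as in the example policy, and the `isolate-<pod>` policy names and the `isolation_policy` helper are hypothetical, not taken from the report.

```python
import json

# Assumed pod names of the three-member StatefulSet (from the example policy).
PODS = ["mysql-0", "mysql-1", "mysql-2"]
POD_LABEL = "statefulset.kubernetes.io/pod-name"


def isolation_policy(target, all_pods):
    """Build a policy that cuts `target` off from the other data pods."""
    peers = [p for p in all_pods if p != target]
    # Select only endpoints whose pod-name label is NOT one of the peers.
    not_peers = [{"matchExpressions": [
        {"key": POD_LABEL, "operator": "NotIn", "values": peers}
    ]}]
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumClusterwideNetworkPolicy",
        # Hypothetical naming scheme; the example in the report used "pod-a".
        "metadata": {"name": f"isolate-{target}"},
        "spec": {
            "endpointSelector": {"matchLabels": {POD_LABEL: target}},
            "ingress": [{"fromEndpoints": not_peers}],
            "egress": [{"toEndpoints": not_peers}],
        },
    }


policies = [isolation_policy(p, PODS) for p in PODS]

if __name__ == "__main__":
    # JSON is also valid YAML, so each document can be saved to a file as-is.
    for policy in policies:
        print(json.dumps(policy, indent=2))
```

Each printed document mirrors the example policy with a different target pod and peer list, so applying all three should leave every data pod unable to reach the other two.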
[21 Jul 2023 16:31]
Carlos Abrantes
Hi, Can you please confirm whether you were able to reproduce this problem? What is expected to happen in this case, given that we saw logs where the operator forced quorum and also runs where it didn't log anything? Thanks, Carlos