Bug #118303 mysql router wrongly quarantines instances on DNS failure
Submitted: 29 May 17:48 Modified: 17 Jun 7:11
Reporter: Paulo Machado Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Router Severity:S1 (Critical)
Version:8.0.41+ OS:Linux
Assigned to: MySQL Verification Team CPU Architecture:Any

[29 May 17:48] Paulo Machado
Description:
MySQL router seems to quarantine instances upon DNS failures in Group Replication.

If router fails to contact a primary instance due failures in address resolution, it quarantines the instance and expect a cluster topology change before de-quarantining it.

It's possible to mitigate it by using "destination_status" configuration knobs, but that comes with it's own drawbacks.

Otherwise, it's necessary to restart the router

How to repeat:
If one has access to DNS service, it's possible to trigger the issue by sending a SIGSTOP/SIGCONT to it to simulate a DNS timeout, e.g., using coredns as example:

$ pkill -19 -f coredns; sleep 20; pkill -18 -f coredns
[17 Jun 7:11] MySQL Verification Team
Hi,

I believe this is the same underlying cause as for Bug #118059