Bug #82877 mysqlfailover daemon stops when unable to get health/gtid/uuid data
Submitted: 6 Sep 2016 21:32 Modified: 27 Dec 2016 17:35
Reporter: Owen Owen
Status: Verified
Category: MySQL Utilities Severity: S3 (Non-critical)
Version: 1.6.4 OS: CentOS
Assigned to: CPU Architecture: Any

[6 Sep 2016 21:32] Owen Owen
Description:
Lines 619-634 of failover_daemon.py in mysqlfailover (MySQL Utilities 1.6.4) collect the data to be logged, as selected by the --report-values parameter. By default this is the health status.

The _format_health_data, _format_uuid_data, and _format_gtid_data methods are called. In each of these methods, an error connecting to a master or slave throws an exception. The exception bubbles all the way up to the auto_failover_as_daemon method in rpl_admin.py, which then causes mysqlfailover to stop.

If you are relying on mysqlfailover to perform failover, it is not ideal for the process to exit when it cannot get a connection to a database.

How to repeat:
Use a firewall rule to block access to the master from the server running mysqlfailover. However, the rule must be introduced after failover_daemon.py has done its check of the master but before the next round begins.

Suggested fix:
A possible solution: instead of throwing exceptions in _format_health_data, _format_uuid_data, and _format_gtid_data, simply log the error and allow the next round of checking to detect that the master is down and initiate a failover if necessary.
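
The log-and-continue idea could be sketched roughly as follows. This is not the actual failover_daemon.py code: collect_report_data, the formatters mapping, and the stand-in callables are hypothetical names used only for illustration, and real code would catch a narrower exception type than Exception.

```python
import logging

log = logging.getLogger("failover_sketch")


def collect_report_data(formatters):
    """Run each report formatter; log a failure instead of letting it propagate.

    `formatters` maps a report name to a zero-argument callable
    (hypothetical stand-ins for the _format_*_data methods).
    """
    results = {}
    for name, formatter in formatters.items():
        try:
            results[name] = formatter()
        except Exception as err:  # assumption: real code would narrow this
            log.warning("Unable to get %s data: %s", name, err)
            results[name] = None  # report is skipped for this round
    return results


def _unreachable_master():
    # Simulates a formatter whose connection to the master fails.
    raise RuntimeError("cannot connect to master")


data = collect_report_data({
    "health": lambda: [("master", "UP")],
    "gtid": _unreachable_master,
})
```

With this shape, a failed report is recorded as None and the daemon loop carries on, so the next round's master check still runs.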
[7 Sep 2016 12:50] Owen Owen
The same issue exists with the _log_master_status() method in failover_daemon.py. If it is unable to connect to the master, it raises an exception and mysqlfailover exits.

Not being able to connect to the master should, at a minimum, cause mysqlfailover to check the master again or initiate a failover.