Bug #91562 using --daemon-stop to stop mysqlfailover cause slave sql_thread down
Submitted: 6 Jul 2018 7:51 Modified: 9 Jul 2018 10:28
Reporter: Bingwei Wang Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Utilities Severity:S1 (Critical)
Version:1.6.5 OS:CentOS (7)
Assigned to: CPU Architecture:Any
Tags: mysqlfailover

[6 Jul 2018 7:51] Bingwei Wang
Description:
start mysqlfailover using daemon.

mysqlfailover --master=fo --discover-slaves-login=fo --log=/tmp/fo.log  --rpl-user=repl:repl --force --exec-after=/root/wbw/fo_after.sh  --daemon=start

After a while stop it using:
mysqlfailover --master=fo --discover-slaves-login=fo --log=/tmp/fo.log  --rpl-user=repl:repl --force --exec-after=/root/wbw/fo_after.sh  --daemon=stop

the sql_thread of the slave shows:

...
          Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1146
                   Last_Error: Error executing row event: 'Table 'mysql.failover_console' doesn't exist'
                 Skip_Counter: 0
    ...
    ...
               Last_SQL_Errno: 1146
               Last_SQL_Error: Error executing row event: 'Table 'mysql.failover_console' doesn't exist'
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 144

I checked the relay log and find a delete operation :
BINLOG '
7eg+WxOQAAAAQQAAAIgBAAAAAGwAAAAAAAEABW15c3FsABBmYWlsb3Zlcl9jb25zb2xlAAL+/gTe
/f4eA8UEJho=
7eg+WyCQAAAAOgAAAMIBAAAAAGwAAAAAAAEAAgAC//wPADE5Mi4xNjguMjE2LjE0NAQzMzA2EyTN
fg==
'/*!*/;
### DELETE FROM `mysql`.`failover_console`
### WHERE
###   @1='192.168.216.144' /* STRING(765) meta=57085 nullable=1 is_null=0 */
###   @2='3306' /* STRING(30) meta=65054 nullable=1 is_null=0 */
# at 617
#180706 11:58:37 server id 144  end_log_pos 481 CRC32 0xc4c3a0bf 	Xid = 722
COMMIT/*!*/;

However failover_console does no longer existed on the slave!

The mysqlfailover log shows:

2018-07-06 15:11:03 PM INFO Server '192.168.216.145:3306' is using MySQL version 5.7.22-log.
2018-07-06 15:11:03 PM INFO Discovering slave at 192.168.216.147:3306
2018-07-06 15:11:03 PM INFO Found slave: 192.168.216.147:3306
2018-07-06 15:11:03 PM INFO Server '192.168.216.147:3306' is using MySQL version 5.7.22-log.
2018-07-06 15:11:03 PM INFO Unregistering instance on master.
2018-07-06 15:11:03 PM INFO Master entry in the failover_console table was deleted.

I also checked the source code in failover_daemon.py:
def cleanup(self):
         """Controlled cleanup for the daemon.
 
         It will unregister the failover_console table.
         """
         # if master is not alive, try connecting to it
         if not self.master.is_alive():
             self.master.connect()
         try:
             self.master.exec_query(_DELETE_FC_TABLE.format(self.master.host,
                                                            self.master.port))
             self._report("Master entry in the failover_console"
                          " table was deleted.", logging.INFO, False)
         except:
             pass

The cleanup function didn't turn off binary log before executing deleting operation.
I think that is the problem.

How to repeat:
Do as description.

Suggested fix:
Turn off binary log before executing the deleting statement in cleanup function. 
Actually, in the unregister_slaves function, the FC table is dropped on slaves.cleaning up master should also use 'drop if exist' statement instead of deleting.
[9 Jul 2018 6:54] MySQL Verification Team
Hello Bingwei Wang,

Thank you for the report.
This is most likely duplicate of Bug #85254, please see Bug #85254

Thanks,
Umesh