Description:
start mysqlfailover using daemon.
mysqlfailover --master=fo --discover-slaves-login=fo --log=/tmp/fo.log --rpl-user=repl:repl --force --exec-after=/root/wbw/fo_after.sh --daemon=start
After a while stop it using:
mysqlfailover --master=fo --discover-slaves-login=fo --log=/tmp/fo.log --rpl-user=repl:repl --force --exec-after=/root/wbw/fo_after.sh --daemon=stop
the sql_thread of the slave shows:
...
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1146
Last_Error: Error executing row event: 'Table 'mysql.failover_console' doesn't exist'
Skip_Counter: 0
...
...
Last_SQL_Errno: 1146
Last_SQL_Error: Error executing row event: 'Table 'mysql.failover_console' doesn't exist'
Replicate_Ignore_Server_Ids:
Master_Server_Id: 144
I checked the relay log and find a delete operation :
BINLOG '
7eg+WxOQAAAAQQAAAIgBAAAAAGwAAAAAAAEABW15c3FsABBmYWlsb3Zlcl9jb25zb2xlAAL+/gTe
/f4eA8UEJho=
7eg+WyCQAAAAOgAAAMIBAAAAAGwAAAAAAAEAAgAC//wPADE5Mi4xNjguMjE2LjE0NAQzMzA2EyTN
fg==
'/*!*/;
### DELETE FROM `mysql`.`failover_console`
### WHERE
### @1='192.168.216.144' /* STRING(765) meta=57085 nullable=1 is_null=0 */
### @2='3306' /* STRING(30) meta=65054 nullable=1 is_null=0 */
# at 617
#180706 11:58:37 server id 144 end_log_pos 481 CRC32 0xc4c3a0bf Xid = 722
COMMIT/*!*/;
However failover_console does no longer existed on the slave!
The mysqlfailover log shows:
2018-07-06 15:11:03 PM INFO Server '192.168.216.145:3306' is using MySQL version 5.7.22-log.
2018-07-06 15:11:03 PM INFO Discovering slave at 192.168.216.147:3306
2018-07-06 15:11:03 PM INFO Found slave: 192.168.216.147:3306
2018-07-06 15:11:03 PM INFO Server '192.168.216.147:3306' is using MySQL version 5.7.22-log.
2018-07-06 15:11:03 PM INFO Unregistering instance on master.
2018-07-06 15:11:03 PM INFO Master entry in the failover_console table was deleted.
I also checked the source code in failover_daemon.py:
def cleanup(self):
"""Controlled cleanup for the daemon.
It will unregister the failover_console table.
"""
# if master is not alive, try connecting to it
if not self.master.is_alive():
self.master.connect()
try:
self.master.exec_query(_DELETE_FC_TABLE.format(self.master.host,
self.master.port))
self._report("Master entry in the failover_console"
" table was deleted.", logging.INFO, False)
except:
pass
The cleanup function didn't turn off binary log before executing deleting operation.
I think that is the problem.
How to repeat:
Do as description.
Suggested fix:
Turn off binary log before executing the deleting statement in cleanup function.
Actually, in the unregister_slaves function, the FC table is dropped on slaves.cleaning up master should also use 'drop if exist' statement instead of deleting.