MySQL Bugs: #78992: mysqlrpladmin switchover fails to restore master settings on error

Submitted:	28 Oct 2015 2:20	Modified:	10 Feb 2017 5:38
Reporter:	monty solomon
Status:	Verified
Category:	MySQL Utilities	Severity:	S2 (Serious)
Version:	1.5.6	OS:	CentOS
Assigned to:		CPU Architecture:	Any

Description:
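When running mysqlrpladmin switchover, a slave did not catch up to the master within the timeout. The switchover aborted with an error and left the master in read_only mode instead of restoring its original settings.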

How to repeat:
% mysqlrpladmin switchover --verbose --discover-slaves-login=mysqlrpladmin.cnf[white-fog] --master=mysqlrpladmin.cnf[white-fog] --new-master=mysqlrpladmin.cnf[calm-shape] --demote-master --rpl-user=SUSR_Repl:redacted --log=mysqlrpladmin.log --timeout=5 

# Blocking writes on master.
# LOCK STRING: FLUSH TABLES WITH READ LOCK
# Waiting for slaves to catch up to old master.
# Slave lucky-hola:3306:
# QUERY = SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('17f11668-4ba8-11e5-bade-0ec3c09abde1:1-18136', 5)
# Return Code = -1
Slave lucky-hola:3306 did not catch up to the master.
ERROR: Slave lucky-hola:3306 did not catch up to the master.
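The "Return Code = -1" line is what makes the utility abort. The following is a minimal Python sketch of how that value can be read. It assumes the semantics described in the MySQL reference manual for WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS: NULL when the wait cannot run (for example, the SQL thread is stopped), -1 when the timeout (here `--timeout=5`) expires, and a non-negative value once every GTID in the set has been applied. The function name `describe_wait_result` is hypothetical and is not part of MySQL Utilities.

```python
def describe_wait_result(value):
    """Interpret a WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS return value.

    Assumed semantics (check the manual for your server version):
    None (SQL NULL) -> the wait did not run,
    -1              -> the timeout expired before the GTID set was applied,
    >= 0            -> the SQL thread applied every GTID in the set.
    """
    if value is None:
        return "wait did not run"
    if value == -1:
        return "timed out"
    if value >= 0:
        return "caught up"
    raise ValueError("unexpected return value: %r" % (value,))


# The value reported for lucky-hola:3306 in the log above.
print(describe_wait_result(-1))  # prints: timed out
```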

mysql> select @@hostname, @@read_only;
+------------+-------------+
| @@hostname | @@read_only |
+------------+-------------+
| white-fog  |           0 |
+------------+-------------+
1 row in set (0.00 sec)

Suggested fix:
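Restore the master's settings in every exception-handling path. This includes the point below, where the switchover raises UtilRplError because a slave did not catch up even though writes on the master have already been blocked: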

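            if not res:
                # The slave did not report catching up within the timeout.
                msg = "Slave %s:%s did not catch up to the master." % \
                      (slave_dict['host'], slave_dict['port'])
                if not self.force:
                    # Without --force the switchover aborts here, but the
                    # master is still blocked for writes: nothing restores it.
                    self._report(msg, logging.CRITICAL)
                    raise UtilRplError(msg)
                else:
                    # With --force the lagging slave is only reported.
                    self._report("# %s" % msg)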
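Below is a minimal sketch of the suggested fix, written in Python like the excerpt above. It blocks writes, checks the slaves, and undoes the block in an `except` clause before re-raising. `RecordingMaster`, `switchover_catch_up` and its `original_read_only` parameter are hypothetical. `UtilRplError` is a local stand-in for the utility's exception class. The SQL statements are recorded instead of being sent to a server, and their order is illustrative only. It is not claimed to match the utility's exact sequence.

```python
class UtilRplError(Exception):
    """Local stand-in for the utility's UtilRplError exception."""


class RecordingMaster:
    """Hypothetical master connection that records SQL instead of running it."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


def switchover_catch_up(master, slaves_caught_up, force=False,
                        original_read_only=False):
    """Block writes on the master, check the slaves, and roll back on error.

    slaves_caught_up: one boolean per slave, standing in for the result of
    the per-slave catch-up wait (hypothetical).
    original_read_only: @@read_only before the switchover started; the
    caller is assumed to have read it beforehand.
    """
    master.execute("FLUSH TABLES WITH READ LOCK")
    master.execute("SET GLOBAL read_only = 1")
    try:
        for index, caught_up in enumerate(slaves_caught_up):
            if caught_up:
                continue
            msg = "Slave %d did not catch up to the master." % index
            if not force:
                raise UtilRplError(msg)
            print("# %s" % msg)  # with force, only report the lagging slave
    except Exception:
        # Undo the write block so a failed switchover leaves the old master
        # as it was, then let the caller see the original error.
        master.execute("UNLOCK TABLES")
        master.execute("SET GLOBAL read_only = %d" % int(original_read_only))
        raise
    # On success the switchover would continue (not shown in this sketch).


master = RecordingMaster()
try:
    switchover_catch_up(master, [True, False])
except UtilRplError as err:
    print("ERROR: %s" % err)
print(master.statements)
```

The rollback uses `except` plus `raise` rather than `finally`. On success the switchover presumably goes on to demote the old master (`--demote-master`), so its settings should only be rolled back when the catch-up step fails.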

Hello monty solomon,

Thank you for the bug report.
I also tried with the latest version, MySQL Utilities 1.6.4, and it worked without any issues.
Could you please provide a repeatable test case (how the master and slave(s) were set up, config file(s), etc.; you may mark it private if you prefer) so that we can confirm this issue on our end?

Thanks,
Chiranjeevi.

Were you able to reproduce the case where the slave hung while waiting to catch up to the master?

The slave was waiting to execute a GTID even though it had already executed it (one or more earlier GTIDs were missing).
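A gap explains the hang. The wait covers the whole GTID set sent by the utility (here `...:1-18136`), so a slave that has applied the latest transaction but is missing an earlier one never satisfies it and times out. The following is a minimal Python sketch that finds such gaps. It handles only the plain `uuid:interval[:interval...]` form, with UUIDs separated by commas. The function names and the slave's `gtid_executed` value used in the usage example are hypothetical.

```python
def parse_gtid_set(text):
    """Parse a GTID set such as 'uuid:1-5:7' into {uuid: set of ids}.

    Only the plain 'uuid:interval[:interval...]' form is handled;
    several UUIDs may be separated by commas, and newlines are ignored.
    """
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ids = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            ids.update(range(int(start), int(end or start) + 1))
    return result


def missing_gtids(wanted, executed):
    """Return {uuid: sorted ids} present in `wanted` but absent from `executed`."""
    want = parse_gtid_set(wanted)
    have = parse_gtid_set(executed)
    gaps = {}
    for uuid, ids in want.items():
        absent = ids - have.get(uuid, set())
        if absent:
            gaps[uuid] = sorted(absent)
    return gaps


# Master set from the log above; the slave value is a made-up example of a
# gap: 18136 was applied, but 18001 never was.
master_set = "17f11668-4ba8-11e5-bade-0ec3c09abde1:1-18136"
slave_set = "17f11668-4ba8-11e5-bade-0ec3c09abde1:1-18000:18002-18136"
print(missing_gtids(master_set, slave_set))
```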

Hello monty solomon,

Thank you for your feedback.
Verified based on internal discussion with the developers.

Thanks,
Chiranjeevi.