Bug #78992 mysqlrpladmin switchover fails to restore master settings on error
Submitted: 28 Oct 2015 2:20 Modified: 10 Feb 2017 5:38
Reporter: monty solomon
Status: Verified
Category:MySQL Utilities Severity:S2 (Serious)
Version:1.5.6 OS:CentOS
Assigned to: CPU Architecture:Any

[28 Oct 2015 2:20] monty solomon
Description:
When running mysqlrpladmin switchover, the slave did not catch up to the master, and the master was left in read_only mode.

How to repeat:
% mysqlrpladmin switchover --verbose --discover-slaves-login=mysqlrpladmin.cnf[white-fog] --master=mysqlrpladmin.cnf[white-fog] --new-master=mysqlrpladmin.cnf[calm-shape] --demote-master --rpl-user=SUSR_Repl:redacted --log=mysqlrpladmin.log --timeout=5 

# Blocking writes on master.
# LOCK STRING: FLUSH TABLES WITH READ LOCK
# Waiting for slaves to catch up to old master.
# Slave lucky-hola:3306:
# QUERY = SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('17f11668-4ba8-11e5-bade-0ec3c09abde1:1-18136', 5)
# Return Code = -1
Slave lucky-hola:3306 did not catch up to the master.
ERROR: Slave lucky-hola:3306 did not catch up to the master.

mysql> select @@hostname, @@read_only;
+------------+-------------+
| @@hostname | @@read_only |
+------------+-------------+
| white-fog  |           0 |
+------------+-------------+
1 row in set (0.00 sec)

Suggested fix:
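Restore the master settings in every exception-handling path. Currently, when a slave does not catch up, the error is raised without restoring them:

if not res:
    msg = "Slave %s:%s did not catch up to the master." % \
          (slave_dict['host'], slave_dict['port'])
    if not self.force:
        self._report(msg, logging.CRITICAL)
        # Raised without restoring the master's settings.
        raise UtilRplError(msg)
    else:
        self._report("# %s" % msg)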
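A minimal Python sketch of the suggested fix, assuming a simplified switchover flow. The names `FakeMaster`, `lock_writes`, `unlock_writes`, `set_read_only`, and the `switchover` function are hypothetical stand-ins, not the MySQL Utilities API, and `UtilRplError` is redefined locally. The write lock is released in a `finally` block, and read_only is restored only on failure, when the old master keeps its role.

```python
import logging


class UtilRplError(Exception):
    """Local stand-in for the MySQL Utilities replication error class."""


class FakeMaster:
    """Hypothetical stand-in that records the settings switchover changes."""

    def __init__(self, read_only=False):
        self.read_only = read_only
        self.locked = False

    def lock_writes(self):
        # Models FLUSH TABLES WITH READ LOCK.
        self.locked = True

    def unlock_writes(self):
        # Models UNLOCK TABLES.
        self.locked = False

    def set_read_only(self, value):
        self.read_only = value


def switchover(master, slave_status, force=False):
    """Block writes, check that slaves caught up, and undo changes on error.

    slave_status maps "host:port" to True if that slave caught up in time.
    """
    original_read_only = master.read_only
    master.lock_writes()
    master.set_read_only(True)
    try:
        for slave, caught_up in slave_status.items():
            if not caught_up:
                msg = "Slave %s did not catch up to the master." % slave
                if not force:
                    raise UtilRplError(msg)
                logging.warning(msg)
        # ... the remaining switchover steps would run here ...
    except Exception:
        # Failure: the old master keeps its role, so restore read_only.
        master.set_read_only(original_read_only)
        raise
    finally:
        # Success or failure: never leave the global read lock behind.
        master.unlock_writes()
```

With this structure, a timeout like the one in the log still raises UtilRplError, but the old master ends up unlocked and with its original read_only value. On success the sketch leaves the old master read-only, on the assumption that it is being demoted.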
[21 Oct 2016 7:01] Chiranjeevi Battula
Hello monty solomon,

Thank you for the bug report.
I tried with the latest version of MySQL Utilities (1.6.4) as well, and it worked without any issues.
Could you please provide a repeatable test case (how the master and slave(s) were set up, config file(s), etc.; please mark it as private if you prefer) so we can confirm this issue on our end?

Thanks,
Chiranjeevi.
[3 Nov 2016 4:28] monty solomon
Were you able to reproduce the case where the slave hung while waiting to catch up to the master?

The slave was waiting to execute a GTID even though it had already executed it (one or more earlier GTIDs were missing).
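Why a gap blocks the wait: WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS waits until every GTID in the given set has been applied, so a slave that applied later transactions but lacks an earlier one still cannot satisfy it. The Python sketch below checks GTID set containment; the helper names `parse_gtid_set` and `missing_gtids` are hypothetical, and the slave's executed set is an invented value chosen to show a single missing transaction, not data from this report.

```python
def parse_gtid_set(text):
    """Parse a GTID set such as 'uuid:1-5:7,uuid2:3' into {uuid: {numbers}}."""
    result = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.strip().lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single number such as "7" is the interval 7-7.
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def missing_gtids(wait_set, executed_set):
    """Return {uuid: sorted numbers} present in wait_set but not executed_set."""
    wanted = parse_gtid_set(wait_set)
    executed = parse_gtid_set(executed_set)
    missing = {}
    for uuid, numbers in wanted.items():
        gap = numbers - executed.get(uuid, set())
        if gap:
            missing[uuid] = sorted(gap)
    return missing


# The wait set comes from the log above; the slave's executed set is
# hypothetical, with transaction 18000 missing.
wait = "17f11668-4ba8-11e5-bade-0ec3c09abde1:1-18136"
slave_executed = "17f11668-4ba8-11e5-bade-0ec3c09abde1:1-17999:18001-18136"
print(missing_gtids(wait, slave_executed))
# {'17f11668-4ba8-11e5-bade-0ec3c09abde1': [18000]}
```

A non-empty result means the wait cannot complete until the missing transactions are applied, so with --timeout=5 it times out and returns -1, as in the log above.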
[10 Feb 2017 5:38] Chiranjeevi Battula
Hello monty solomon,

Thank you for your feedback.
Verified based on internal discussion with the developers.

Thanks,
Chiranjeevi.