Description:
Although mysqlrpladmin is supposed to check errant transactions if --force is not used, it doesn't do so, making it dangerous to use for any GTID-based failover/switchover.
How to repeat:
1) Set up a master and 2 slaves, using GTID replication.
Here is my config:
$ mysqlrpladmin --master=root@localhost:13001 --discover-slaves-login=root --quiet health
# Discovering slave at localhost:13002
# Found slave: localhost:13002
# Discovering slave at localhost:13003
# Found slave: localhost:13003
+------------+--------+---------+--------+------------+---------+
| host | port | role | state | gtid_mode | health |
+------------+--------+---------+--------+------------+---------+
| localhost | 13001 | MASTER | UP | ON | OK |
| localhost | 13002 | SLAVE | UP | ON | OK |
| localhost | 13003 | SLAVE | UP | ON | OK |
+------------+--------+---------+--------+------------+---------+
2) Create an errant transaction on localhost:13003 and remove the corresponding entry in the binlogs:
mysql> CREATE DATABASE test2;
mysql> flush logs;
Query OK, 0 rows affected (0,02 sec)
mysql> show binary logs;
+------------------+-----------+
| Log_name | File_size |
+------------------+-----------+
| mysql-bin.000001 | 69309 |
| mysql-bin.000002 | 1237667 |
| mysql-bin.000003 | 617 |
| mysql-bin.000004 | 231 |
+------------------+-----------+
sb3> purge binary logs to 'mysql-bin.000004';
3) Switch over to localhost:13003 using mysqlrpladmin:
$ mysqlrpladmin --master=root@127.0.0.1:13001 --new-master=root@127.0.0.1:13003 --discover-slaves-login=root --demote-master --rpl-user=repl:repl --verbose switchover
# Discovering slaves for master at 127.0.0.1:13001
# Discovering slave at localhost:13002
# Found slave: localhost:13002
# Discovering slave at localhost:13003
# Found slave: localhost:13003
# Checking privileges.
# WARNING: You may be mixing host names and IP addresses. This may result in negative status reporting if your DNS services do not support reverse name lookup.
# Performing switchover from master at 127.0.0.1:13001 to slave at 127.0.0.1:13003.
# Checking candidate slave prerequisites.
# GTID_MODE=ON is set for all servers.
# Checking eligibility of slave 127.0.0.1:13003 for candidate.
# Slave connected to master ... Ok
# GTID_MODE=ON ... Ok
# Logging filters agree ... Ok
# Replication user exists ... Ok
# Checking slaves configuration to master.
# When the master_info_repository variable is set to FILE, the --rpl-user option may be used only if the user specified matches what is shown in the SLAVE STATUS output unless the --force option is used.
# Creating replication user if it does not exist.
# Blocking writes on master.
# LOCK STRING: FLUSH TABLES WITH READ LOCK
# Waiting for slaves to catch up to old master.
# Slave localhost:13002:
# QUERY = SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('53c30a4c-fc4b-11e3-b006-9cebe8067a3f:1-2', 300)
# Return Code = 0
# Slave localhost:13003:
# QUERY = SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('53c30a4c-fc4b-11e3-b006-9cebe8067a3f:1-2', 300)
# Return Code = 0
# Stopping slaves.
# Performing STOP on all slaves.
# Executing stop on slave localhost:13002 Ok
# Executing stop on slave localhost:13003 Ok
# UNLOCK STRING: UNLOCK TABLES
# Demoting old master to be a slave to the new master.
# Switching slaves to new master.
# Executing CHANGE MASTER on localhost:13002.
# CHANGE MASTER TO MASTER_HOST = '127.0.0.1', MASTER_USER = 'repl', MASTER_PASSWORD = 'repl', MASTER_PORT = 13003, MASTER_AUTO_POSITION=1
# Executing CHANGE MASTER on 127.0.0.1:13001.
# CHANGE MASTER TO MASTER_HOST = '127.0.0.1', MASTER_USER = 'repl', MASTER_PASSWORD = 'repl', MASTER_PORT = 13003, MASTER_AUTO_POSITION=1
# Starting all slaves.
# Performing START on all slaves.
# Executing start on slave localhost:13002 Ok
# Executing start on slave 127.0.0.1:13001 Ok
# Checking slaves for errors.
# localhost:13002 status: Ok
# 127.0.0.1:13001 status: Ok
# Switchover complete.
# Attempting to contact 127.0.0.1 ... Success
# Attempting to contact localhost ... Success
# Attempting to contact 127.0.0.1 ... Success
#
# Replication Topology Health:
+------------+--------+---------+--------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------------+-----------------+------------+-------------+--------------+------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+------------+---------------+
| host | port | role | state | gtid_mode | health | version | master_log_file | master_log_pos | IO_Thread | SQL_Thread | Secs_Behind | Remaining_Delay | IO_Error_Num | IO_Error | SQL_Error_Num | SQL_Error | Trans_Behind |
+------------+--------+---------+--------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------------+-----------------+------------+-------------+--------------+------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+------------+---------------+
| 127.0.0.1 | 13003 | MASTER | UP | ON | OK | 5.6.19-log | mysql-bin.000004 | 570 | | | | | | | | | |
| 127.0.0.1 | 13001 | SLAVE | UP | ON | IO thread is not running., Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Slave has 1 transactions behind master. | 5.6.19-log | | 4 | No | Yes | 0 | No | 1236 | Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.' | 0 | | 1 |
| localhost | 13002 | SLAVE | UP | ON | IO thread is not running., Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Slave has 1 transactions behind master. | 5.6.19-log | | 4 | No | Yes | 0 | No | 1236 | Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.' | 0 | | 1 |
+------------+--------+---------+--------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------------+-----------------+------------+-------------+--------------+------------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------+------------+---------------+
# ...done.
Now replication is broken