Bug #70391 uninstall and install semi-sync plugin causes slaves to break
Submitted: 20 Sep 2013 21:21
Modified: 6 May 2014 14:37
Reporter: Santosh Praneeth Banda
Status: Closed
Category: MySQL Server: Replication
Severity: S2 (Serious)
Version: 5.6.13, 5.6.14
OS: Any
Assigned to:
CPU Architecture: Any

[20 Sep 2013 21:21] Santosh Praneeth Banda
Description:
See "How to repeat" below.

How to repeat:
1) UNINSTALL PLUGIN rpl_semi_sync_slave;
2) INSTALL PLUGIN rpl_semi_sync_slave SONAME '$SEMISYNC_SLAVE_PLUGIN'; 
3) The slave fails with the following error:

Slave SQL: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1594
  

Suggested fix:
Block uninstalling the semi-sync slave plugin while the slave is running.
[30 Sep 2013 9:49] MySQL Verification Team
Hello Santosh,

Thank you for the bug report.
I tried to reproduce this with the provided steps but couldn't repeat the reported behavior. I also tried with a moderate to heavy load on the master, with no luck.

Could you please provide the configuration files from all servers, along with any details that would help us reproduce this issue?

Also, could you confirm whether this issue is repeatable on the latest GA release, i.e. 5.6.14?

Thanks,
Umesh
[21 Oct 2013 19:17] Santosh Praneeth Banda
Sorry, I should have been clearer in my repro steps. I think you tried uninstalling the slave plugin on the master, but it should be done on a running semi-sync slave.

Here is an mtr test that reproduces it consistently:

== mysql-test/suite/rpl/t/rpl_semi_sync_uninstall_plugin.test ==
source include/have_semisync_plugin.inc;
source include/not_embedded.inc;
source include/master-slave.inc;

connection master;
eval INSTALL PLUGIN rpl_semi_sync_master SONAME '$SEMISYNC_MASTER_PLUGIN';
set global rpl_semi_sync_master_timeout= 6000000;

connection slave;
eval INSTALL PLUGIN rpl_semi_sync_slave SONAME '$SEMISYNC_SLAVE_PLUGIN';
set global rpl_semi_sync_slave_enabled = ON;
source include/stop_slave.inc;
source include/start_slave.inc;

connection master;
set global rpl_semi_sync_master_enabled = ON;
create table t1 (a int);
insert into t1 values(1);

connection slave;
UNINSTALL PLUGIN rpl_semi_sync_slave;

connection master;
insert into t1 values(2);

drop table t1;
UNINSTALL PLUGIN rpl_semi_sync_master;
source include/rpl_end.inc;

== mysql-test/suite/rpl/t/rpl_semi_sync_uninstall_plugin-master.opt ==
$SEMISYNC_PLUGIN_OPT

== mysql-test/suite/rpl/t/rpl_semi_sync_uninstall_plugin-slave.opt ==
$SEMISYNC_PLUGIN_OPT
[22 Oct 2013 7:49] MySQL Verification Team
Thank you for the feedback and test case.
I'm able to reproduce the issue.

Thanks,
Umesh
[6 May 2014 14:37] Paul DuBois
Noted in 5.5.39, 5.6.20, 5.7.5 changelogs.

Uninstalling and reinstalling semisynchronous replication plugins
while semisynchronous replication was active caused replication
failures. The plugins now check whether they can be uninstalled and
produce an error if semisynchronous replication is active. To
uninstall the master-side plugin, there must be no
semisynchronous slaves. To uninstall the slave-side plugin, there
must be no semisynchronous I/O threads running.
[1 Aug 2014 15:51] Laurynas Biveinis
5.5 $ bzr log -r 4631
------------------------------------------------------------
revno: 4631
committer: Venkatesh Duggirala<venkatesh.duggirala@oracle.com>
branch nick: mysql-5.5
timestamp: Mon 2014-05-05 22:22:15 +0530
message:
  Bug#17638477 UNINSTALL AND INSTALL SEMI-SYNC PLUGIN CAUSES SLAVES TO BREAK
  
  Problem: Uninstallation of semi sync plugin causes replication to
  break.
  
  Analysis: A semisync enabled replication is mutual agreement between
  Master and Slave when the connection (I/O thread) is established.
  Once I/O thread is started and if semisync is enabled on both
  master and slave, master appends special magic header to events
  using semisync plugin functions and sends it to slave. And slave
  expects that each event will have that special magic header format
  and reads those bytes using semisync plugin functions.
  
  When semi sync replication is in use if users execute
  uninstallation of the plugin on master, slave gets confused while
  interpreting that event's content because it expects special 
  magic header at the beginning of the event. Slave SQL thread will
  be stopped with "Missing magic number in the header" error.
  
  Similar problem will happen if uninstallation of the plugin happens
  on slave when semi sync replication is in use. Master sends
  the events with magic header and slave does not know about the
  added magic header and thinks that it received a corrupted event.
  Hence slave SQL thread stops with "Found  corrupted event" error.
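
The framing mismatch described in this analysis can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not MySQL source code: the function names `master_frame` and `slave_read` are hypothetical, the two-byte header layout and its byte values are assumptions made for illustration, and the error string is the one quoted in the commit message.

```python
# Simplified model of semisync event framing (illustrative, not MySQL code).
MAGIC = 0xEF      # assumed magic byte marking a semisync packet
FLAG_ACK = 0x01   # assumed flag meaning "the master expects an acknowledgement"


def master_frame(event: bytes, semisync: bool, need_ack: bool = True) -> bytes:
    """Prepend the semisync header only when the master-side plugin is active."""
    if not semisync:
        return event
    return bytes([MAGIC, FLAG_ACK if need_ack else 0x00]) + event


def slave_read(packet: bytes, semisync: bool) -> bytes:
    """Strip the header only when the slave-side plugin is active."""
    if not semisync:
        # Without the plugin, every byte is treated as event data.
        return packet
    if len(packet) < 2 or packet[0] != MAGIC:
        raise ValueError("Missing magic number in the header")
    return packet[2:]


event = b"\x00binlog-event-bytes"

# Both plugins active: the slave recovers the original event.
ok = slave_read(master_frame(event, semisync=True), semisync=True)

# Slave plugin uninstalled mid-stream: the header bytes leak into the event,
# so the event no longer parses and looks corrupted.
shifted = slave_read(master_frame(event, semisync=True), semisync=False)

# Master plugin uninstalled mid-stream: the slave still expects the header.
try:
    slave_read(master_frame(event, semisync=False), semisync=True)
    master_gone = "accepted"
except ValueError as exc:
    master_gone = str(exc)
```

In this model `shifted` carries two extra leading bytes, which corresponds to the corrupted-event case on the slave, and `master_gone` holds the missing-magic-number message.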
  
  Fix: Uninstallation of semisync plugin will be blocked when semisync
  replication is in use and will throw 'ER_UNKNOWN_ERROR' error.
  To detect that semisync replication is in use, this patch uses
  semisync status variable values.
   > On Master, it checks for 'Rpl_semi_sync_master_status' to be OFF
      before allowing the uninstallation of rpl_semi_sync_master plugin.
      >> Rpl_semi_sync_master_status is OFF when
          >>> there is no dump thread running
          >>> there are no semisync slaves
   > On Slave, it checks for 'Rpl_semi_sync_slave_status' to be OFF
      before allowing the uninstallation of rpl_semi_sync_slave plugin.
      >> Rpl_semi_sync_slave_status is OFF when
         >>> there is no I/O thread running
         >>> replication is asynchronous replication.
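
The guard described in this commit message can be modeled as follows. It is a minimal sketch, assuming the status variables behave exactly as the message states; `master_status`, `slave_status`, `uninstall`, and `UninstallBlocked` are hypothetical names introduced for illustration, and reading the two listed conditions as "OFF if either holds" is an interpretation, not something the message spells out.

```python
class UninstallBlocked(Exception):
    """Stands in for the ER_UNKNOWN_ERROR that the patch reports."""


def master_status(dump_threads: int, semisync_slaves: int) -> str:
    """Model of Rpl_semi_sync_master_status: OFF with no dump thread or no semisync slaves."""
    on = dump_threads > 0 and semisync_slaves > 0
    return "ON" if on else "OFF"


def slave_status(io_thread_running: bool, semisync_connection: bool) -> str:
    """Model of Rpl_semi_sync_slave_status: OFF with no I/O thread or an asynchronous connection."""
    on = io_thread_running and semisync_connection
    return "ON" if on else "OFF"


def uninstall(plugin: str, status: str) -> str:
    """Allow the uninstall only when the matching status variable is OFF."""
    if status != "OFF":
        raise UninstallBlocked(f"ER_UNKNOWN_ERROR: {plugin} is in use")
    return f"{plugin} uninstalled"


# A slave with a running semisync I/O thread is refused.
try:
    uninstall("rpl_semi_sync_slave", slave_status(True, True))
    blocked = False
except UninstallBlocked:
    blocked = True

# Once the I/O thread is stopped, the same request goes through.
result = uninstall("rpl_semi_sync_slave", slave_status(False, True))
```

This matches the changelog wording above: the slave-side plugin can be removed only when no semisynchronous I/O thread is running, and the master-side plugin only when there are no semisynchronous slaves.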
[1 Aug 2014 15:52] Laurynas Biveinis
5.5 $ bzr log -r 4632
------------------------------------------------------------
revno: 4632
committer: Venkatesh Duggirala<venkatesh.duggirala@oracle.com>
branch nick: mysql-5.5
timestamp: Tue 2014-05-06 11:23:42 +0530
message:
  Bug#17638477 UNINSTALL AND INSTALL SEMI-SYNC PLUGIN CAUSES SLAVES TO BREAK
  
  Fixing post push failure
[1 Aug 2014 15:53] Laurynas Biveinis
5.5 laurynas$ bzr log -r 4634
------------------------------------------------------------
revno: 4634
committer: Venkatesh Duggirala<venkatesh.duggirala@oracle.com>
branch nick: mysql-5.5
timestamp: Wed 2014-05-07 14:33:58 +0530
message:
  Bug#17638477 UNINSTALL AND INSTALL SEMI-SYNC PLUGIN CAUSES SLAVES TO BREAK
  
  Fixing post push failure