Bug #73087 slave hits error 1595 when using low master_heartbeat_period
Submitted: 24 Jun 2014 2:04 Modified: 26 Jun 2014 19:25
Reporter: Santosh Praneeth Banda
Status: Closed
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.6.19 5.6.12 OS:Any
Assigned to: CPU Architecture:Any
Tags: replication

[24 Jun 2014 2:04] Santosh Praneeth Banda
Description:
The slave hits error 1595 when a low master_heartbeat_period is used.

On receiving a Rotate event from the master, the slave io_thread resets its Format Description Event (FDE) to a lower version. With a low master_heartbeat_period, the dump thread may send a heartbeat event after sending the Rotate event but before sending the FDE. The io_thread cannot interpret a heartbeat event with the old-version FDE, which causes error 1595.
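
The event ordering described above can be illustrated with a small toy model. This is only a sketch and not MySQL code: `Event`, `IoThreadModel`, and `first_failure` are hypothetical names, and the rule that a heartbeat needs a version 4 (or newer) description event is an assumption taken from the description in this report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: hypothetical names, not MySQL's real types or logic.
enum class Event { Rotate, Heartbeat, FormatDescription };

struct IoThreadModel {
  int binlog_version = 4;       // version of the current description event
  bool reset_on_rotate = true;  // mimics the reset quoted in this report

  // Returns false when the event cannot be handled
  // (stands in for error 1595 in this model).
  bool receive(Event e) {
    switch (e) {
      case Event::Rotate:
        // Downgrade the description event, as the reported code does.
        if (reset_on_rotate && binlog_version >= 4) binlog_version = 3;
        return true;
      case Event::FormatDescription:
        // A new FDE from the master restores version 4.
        binlog_version = 4;
        return true;
      case Event::Heartbeat:
        // Assumption from the report: a heartbeat is only understood
        // while the description event is version 4 or newer.
        return binlog_version >= 4;
    }
    return false;
  }
};

// Feeds events in order to a copy of the model.
// Returns the index of the first event that fails, or -1 if all succeed.
int first_failure(IoThreadModel m, const std::vector<Event>& events) {
  for (std::size_t i = 0; i < events.size(); ++i) {
    if (!m.receive(events[i])) return static_cast<int>(i);
  }
  return -1;
}
```

In this model, the sequence Rotate, FormatDescription, Heartbeat succeeds, while Rotate, Heartbeat, FormatDescription fails at the heartbeat. Setting `reset_on_rotate` to false makes both orderings succeed.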

Buggy code in sql/rpl_slave.cc

  /*
    If mi_description_event is format <4, there is conversion in the
    relay log to the slave's format (4). And Rotate can mean upgrade or
    nothing. If upgrade, it's to 5.0 or newer, so we will get a Format_desc, so
    no need to reset mi_description_event now. And if it's nothing (same
    master version as before), no need (still using the slave's format).
  */
  Format_description_log_event *old_fdle= mi->get_mi_description_event();
  if (old_fdle->binlog_version >= 4)
  {
    DBUG_ASSERT(old_fdle->checksum_alg ==
                mi->rli->relay_log.relay_log_checksum_alg);
    Format_description_log_event *new_fdle= new
      Format_description_log_event(3);
    new_fdle->checksum_alg= mi->rli->relay_log.relay_log_checksum_alg;
    mi->set_mi_description_event(new_fdle);
  }

Restarting the io_thread fixes the problem.

How to repeat:
On the slave:
CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD=0.25;

The slave hits error 1595 when the master's binary log rotates.

Suggested fix:
I don't see the reason for setting mi_description_event to a lower version in the first place. Not doing so would avoid this bug. The comment in the code also does not make clear why mi_description_event is reset.
[26 Jun 2014 7:33] MySQL Verification Team
Hello Santosh,

Thank you for the report.
I couldn't reproduce this issue with simple replication (5.6.19 -> 5.6.19, later also with MTS), under moderate load (with a max binlog size as low as 4k to trigger frequent log rotation), and with a low master_heartbeat_period (0.001..0.25). Could you please provide a repeatable test case and related details?

Thanks,
Umesh
[26 Jun 2014 19:25] Santosh Praneeth Banda
Hello Umesh,

Sorry for the false bug report. I forgot to remove a patch backported from 5.7.2 in my local branch, which triggered the bug.

Thanks,

Santosh
[27 Jun 2014 4:17] MySQL Verification Team
Thank you for confirming this.

Thanks,
Umesh