Bug #95418 Large transactions cause "heartbeat is not compatible with local info" on slave
Submitted: 20 May 2019 4:48 Modified: 5 May 2021 15:39
Reporter: yunsheng zhu (OCA) Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.6.28, 5.7.18 OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any
Tags: heartbeat, Not Compatible

[20 May 2019 4:48] yunsheng zhu
Description:
We're TXSQL(TENCENT MySQL) team. Recently we are encounterd with an 
error "Slave I/O: Unexpected master's heartbeat data: heartbeat is 
not compatible with local info" caused by large transactions.
The reasons are as follows:
(1)Binlog bigger than 4G is produced by large transactions. 
(2)A heartbeat event is sent before this binlog is rotated. This will 
happen on 2 scenarios. First, when heartbeat period is set very little,
then a heartbeat timeout may happen before binlog is rotated. 
Second, when value of gitd_purged is set when start slave, a heartbeat event
is sent at the end of a continuous purged gtids on master to update master 
pos on slave.
(3)As pos in a heartbeat event is stored in the field of log pos of event
header, which is 4 bytes, so a pos larger than 4G will be truncated when 
sent to slave.
(4)After slave receives a truncated heartbeat event, it may will find  that
pos stored in heartbeat event is smaller than current master log pos, then 
a error will be reported and slave io thread will be stopped.

How to repeat:
Reference to our mtr test attached for detail.
Some things need to be explained:
(1)Our mtr test is based on 5.7.18, test on other versions is similar.
(2)To run test quickly, we use simulated truncated heartbeat. That is, 
we truncate heartbeat event when log pos is larger than 1M, not 4G.
(3)We add 2 debug syncs in function Binlog_sender::send_heartbeat_event.
  
  The first is at the beginning of this function. That is:
  #define SIMULATED_TRUNCATED_HEARTBEAT_EVENT_SIZE 1048576
  DBUG_EXECUTE_IF("ignore_truncated_heartbeat", {
    if (log_pos >= SIMULATED_TRUNCATED_HEARTBEAT_EVENT_SIZE) {
      const char act[] = "now signal heartbeat_ignored";
      DBUG_ASSERT(!debug_sync_set_action(current_thd, STRING_WITH_LEN(act)));
      DBUG_RETURN(0);
   }
  };);

  The second is after "int4store(header + LOG_POS_OFFSET, static_cast<uint32>(log_pos));".
  That is:
  DBUG_EXECUTE_IF("send_truncated_heartbeat", {
    if(log_pos >= SIMULATED_TRUNCATED_HEARTBEAT_EVENT_SIZE){
      int4store(header + LOG_POS_OFFSET, static_cast<uint32>(log_pos % SIMULATED_TRUNCATED_HEARTBEAT_EVENT_SIZE));
      const char act[] = "now signal heartbeat_truncated";
      DBUG_ASSERT(!debug_sync_set_action(current_thd, STRING_WITH_LEN(act)));
    }
  };);
[20 May 2019 4:49] yunsheng zhu
Mtr to repeat error "Slave I/O: Unexpected master's heartbeat data: heartbeat is  not compatible with local info"

Attachment: rpl_largetrans_unexpected_heartbeat.test (application/octet-stream, text), 10.11 KiB.

[20 May 2019 23:51] MySQL Verification Team
Hi,

Have you tested this with 5.7.26? I will be testing it today on 5.7.26 but just wanted to check if you already tried and if not, why not, .18 is rather old?

all best
[21 May 2019 10:31] yunsheng zhu
No, I have't tried on 5.7.26. This is because we are cloud service provider, so it's difficult for us to follow up closely with official.
[3 Jun 2019 11:05] MySQL Verification Team
Hi,

I don't see what in the 5.7.26 (vs 5.7.18) would fix this issue but I can't reproduce this with 5.7.26.
[5 May 2021 15:09] cheng zhao
The bug can repeat when use reporter's patch, but he did provide.

the bug is same as https://bugs.mysql.com/bug.php?id=55231
[5 May 2021 15:09] cheng zhao
The bug can repeat when use reporter's patch, but he did provide.

the bug is same as https://bugs.mysql.com/bug.php?id=55231
[5 May 2021 15:39] MySQL Verification Team
Bug is marked as duplicate of Bug#55231