Bug #99628 semi sync master not handle ack packet correctly when recv packet timeout
Submitted: 19 May 2020 9:16 Modified: 21 May 2020 6:46
Reporter: lou shuai (OCA)
Status: Verified
Category: MySQL Server: Replication Severity: S2 (Serious)
Version: 8.0.* OS: Any
Assigned to: CPU Architecture: Any

[19 May 2020 9:16] lou shuai
When using semi-sync, the master sometimes reports the following messages in the error log:

[ERROR] Read semi-sync reply magic number error
[ERROR] mysqld: Got timeout reading communication packets
[ERROR] mysqld: Got packets out of order
[ERROR] mysqld: Got a packet bigger than 'max_allowed_packet' bytes
We traced the semi-sync code and found that the master does not handle ACK packets correctly:


void Ack_receiver::run() {
  ...
  do {
    net_clear(&net, 0);

    len = my_net_read(&net);
    if (likely(len != packet_error))
      repl_semisync->reportReplyPacket(slave_obj.server_id, net.read_pos, ...);
    else if (net.last_errno == ER_NET_READ_ERROR)
      ...
  } while (net.vio->has_data(net.vio) && m_status == ST_UP);
  ...
}


When an ACK packet is split across two TCP segments and the second segment reaches the master more than 1 ms after the first, my_net_read reads only part of the ACK packet and returns packet_error because the wait timed out. The remaining bytes of the ACK packet are left unread, so in the next round my_net_read starts parsing from the middle of the previous ACK packet.
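
To make the failure mode concrete, here is a minimal simulation of it. It is a simplified model and not MySQL code. It assumes a packet made of a 3-byte little-endian payload length, a 1-byte sequence number, and a payload that starts with the magic byte 0xEF followed by an 8-byte binlog position and the binlog file name. All function and type names (make_ack, stateless_round, replay_split_ack, ReadResult) are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Builds one ACK packet in the assumed layout:
// [len0 len1 len2][seq][0xEF][8-byte position][file name].
std::vector<std::uint8_t> make_ack(std::uint64_t pos, const std::string &name) {
  std::vector<std::uint8_t> payload{0xEF};
  for (int i = 0; i < 8; ++i)
    payload.push_back(static_cast<std::uint8_t>(pos >> (8 * i)));
  payload.insert(payload.end(), name.begin(), name.end());

  const std::size_t n = payload.size();
  assert(n < (1u << 24));  // the length must fit into three bytes
  std::vector<std::uint8_t> packet{
      static_cast<std::uint8_t>(n), static_cast<std::uint8_t>(n >> 8),
      static_cast<std::uint8_t>(n >> 16), 0 /* sequence number */};
  packet.insert(packet.end(), payload.begin(), payload.end());
  return packet;
}

enum class ReadResult { kOk, kTimeout, kOutOfOrder, kBadMagic };

// Removes and returns up to `want` bytes that have already arrived.
static std::vector<std::uint8_t> take(std::deque<std::uint8_t> &conn,
                                      std::size_t want) {
  std::vector<std::uint8_t> out;
  while (out.size() < want && !conn.empty()) {
    out.push_back(conn.front());
    conn.pop_front();
  }
  return out;
}

// One receive round that keeps no state between rounds: bytes consumed
// before a timeout are simply lost.
ReadResult stateless_round(std::deque<std::uint8_t> &conn) {
  const auto header = take(conn, 4);
  if (header.size() < 4) return ReadResult::kTimeout;
  const std::size_t len = header[0] | (header[1] << 8) | (header[2] << 16);
  if (header[3] != 0) return ReadResult::kOutOfOrder;
  const auto payload = take(conn, len);
  if (payload.size() < len) return ReadResult::kTimeout;
  if (payload.empty() || payload[0] != 0xEF) return ReadResult::kBadMagic;
  return ReadResult::kOk;
}

// Delivers the first `first_segment` bytes, runs one round, then delivers
// the rest and runs a second round.
std::vector<ReadResult> replay_split_ack(std::size_t first_segment) {
  const auto packet = make_ack(4, "binlog.000001");
  const auto split = packet.begin() + static_cast<std::ptrdiff_t>(
                                          std::min(first_segment, packet.size()));
  std::deque<std::uint8_t> conn(packet.begin(), split);
  std::vector<ReadResult> results{stateless_round(conn)};
  conn.insert(conn.end(), split, packet.end());
  results.push_back(stateless_round(conn));
  return results;
}
```

In this model, replay_split_ack(10) times out in the first round and then reports an out-of-order packet in the second, because the second round interprets bytes from the middle of the ACK as a new header. Depending on where the split falls, the misread header could instead produce a bad magic number or an implausibly large length, which would be consistent with the mix of errors listed above.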

How to repeat:
Hard to repeat.

Suggested fix:
Allocate a NET for each slave and keep the bytes that have already been read in it. Call net_clear only after a full ACK packet has been read.
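
A rough sketch of the idea, not a patch against the server: each slave gets its own buffer that accumulates bytes across rounds and only hands out a packet once it is complete. The AckBuffer class, its methods, and the AckBuffers map are hypothetical names for illustration. The packet layout (3-byte little-endian length, 1-byte sequence number, payload) is assumed, and std::optional requires C++17.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical per-slave reassembly buffer.
class AckBuffer {
 public:
  // Appends bytes received in the current round. A timeout between two
  // calls discards nothing.
  void feed(const std::vector<std::uint8_t> &bytes) {
    buf_.insert(buf_.end(), bytes.begin(), bytes.end());
  }

  // Returns the payload of one complete packet and removes it from the
  // buffer, or std::nullopt if more bytes are still needed.
  std::optional<std::vector<std::uint8_t>> next_packet() {
    constexpr std::size_t kHeader = 4;
    if (buf_.size() < kHeader) return std::nullopt;
    const std::size_t len = buf_[0] | (buf_[1] << 8) | (buf_[2] << 16);
    if (buf_.size() < kHeader + len) return std::nullopt;

    const auto body = buf_.begin() + static_cast<std::ptrdiff_t>(kHeader);
    const auto end = body + static_cast<std::ptrdiff_t>(len);
    std::vector<std::uint8_t> payload(body, end);
    buf_.erase(buf_.begin(), end);  // clear only the fully read packet
    return payload;
  }

 private:
  std::vector<std::uint8_t> buf_;
};

// One buffer per slave, keyed by server id.
using AckBuffers = std::map<std::uint32_t, AckBuffer>;
```

With this shape, a round that sees only the first TCP segment calls feed and gets std::nullopt from next_packet. The following round feeds the second segment and receives the whole payload, so the read position never drifts into the middle of a packet.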
[21 May 2020 6:46] MySQL Verification Team

Thanks for the report. I can't reproduce this but I agree with your code analysis so verifying the bug.

All best