Bug #23775 Replicated event larger that max_allowed_packet infinitely re-transmits
Submitted: 30 Oct 2006 17:02 Modified: 26 Mar 2007 8:49
Reporter: [ name withheld ]
Status: Closed
Category:Server: Replication Severity:S2 (Serious)
Version:4.1.21 master 5.0.24a slave OS:Linux (RHEL 4)
Assigned to: Alexey Kopytov Target Version:
Tags: replication, max_allowed_packet

[30 Oct 2006 17:02] [ name withheld ]
Description:
Given a 4.1.21 master and a 5.0.24a slave I experienced the following issue:

master has max_allowed_packet = 32M
slave has max_allowed_packet set to the 5.0.24a installation default

master and slave are on opposite ends of a T1 (1.544 mbit circuit)

When the slave gets a packet larger that its max_allowed_packet but smaller than the
master's max_allowed_packet it will get it, generate an error on the slave, not stop the
slave, and repeat infinitely. In this case it managed to saturate the bandwidth on the T1
between the master and slave for several hours overnight before the cause was found and
fixed (setting max_allowed_packet = 32M on the slave)

The slaves error logs contained:

060926 18:53:47 [ERROR] Error reading packet from server: Got packet bigger than
'max_allowed_packet' bytes ( server_errno=2020)
060926 18:53:47 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry,
log 'host-bin.000025' position 292325519
060926 18:53:47 [Note] Slave: connected to master
'[rep_user]@host.example.com:3306',replication resumed in log 'host-bin.000025' at
position 292325519
060926 18:54:16 [ERROR] Error reading packet from server: Got packet bigger than
'max_allowed_packet' bytes ( server_errno=2020)
060926 18:54:16 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry,
log 'host-bin.000025' position 292325519
060926 18:54:16 [Note] Slave: connected to master
'[rep_user]@host.example.com:3306',replication resumed in log 'host-bin.000025' at
position 292325519
060926 18:54:46 [ERROR] Error reading packet from server: Got packet bigger than
'max_allowed_packet' bytes ( server_errno=2020)
060926 18:54:46 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry,
log 'host-bin.000025' position 292325519
060926 18:54:46 [Note] Slave: connected to master
'[rep_user]@host.example.com:3306',replication resumed in log 'host-bin.000025' at
position 292325519

These entries repeat over and over again at approximately 30s intervals which is probably
how long it takes for the packet in question to transmit between the master and slave

What should happen here is the slave should stop on the error condition and not reconnect
to the master.

How to repeat:
1. Setup replication between two servers. In this case the master is 4.1.21 and the slave
is 5.0.24a.
2. Set the master to have a larger max_allowed_packet than the slave
3. Send a replicated event (A large LOAD DATA INFILE would likely do) which is smaller
than the master's max_allowed_packet size but larger than the slave's max_allowed_packet
size.
4. Watch the slave's error logs cycle through the ERROR message above and continually
reconnect to the master and retry.
5. tcpdump the slave's interface and watch it continuously retransmit the large packet in
question

Suggested fix:
Stop replication in this case and do not reconnect until a server admin investigates and
resolves the disparity in the max_allowed_packet configuation settings.
[1 Nov 2006 21:16] Sveta Smirnova
Thank you for the reasonable feature request.
[6 Nov 2006 21:56] [ name withheld ]
I really don't think this is a feature request. This is not a "cosmetic or enhancement
request." It is an incorrectly handled error condition which can have a pretty
significant impact on a site's available bandwidth if there are fast servers on either
end and a slow pipe in between. Please consider reclassifying as at least S3 and
preferably S2.
[15 Mar 2007 12:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/21995

ChangeSet@1.2391, 2007-03-15 13:59:26+03:00, kaa@polly.local +5 -0
  Fix for bug #23775 "Replicated event larger that max_allowed_packet infinitely
re-transmits".
  
  Test for the client "packet too large" error code instead of the server one in slave
I/O thread.
[16 Mar 2007 15:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/22139

ChangeSet@1.2391, 2007-03-16 17:25:20+03:00, kaa@polly.local +3 -0
  Fix for bug #23775 "Replicated event larger that max_allowed_packet infinitely
re-transmits".
  
  Problem: to handle a situation when the size of event on the master is greater than
max_allowed_packet on slave, we checked for the wrong constant (ER_NET_PACKET_TOO_LARGE
instead of CR_NET_PACKET_TOO_LARGE).
  
  Solution: test for the client "packet too large" error code instead of the server one
in slave I/O thread.
[26 Mar 2007 8:49] MC Brown
A note has been added to the 5.0.40 and 5.1.17 changelogs.