Bug #23775 Replicated event larger that max_allowed_packet infinitely re-transmits
Submitted: 30 Oct 2006 16:02 Modified: 26 Mar 2007 6:49
Reporter: [ name withheld ] Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:4.1.21 master 5.0.24a slave OS:Linux (RHEL 4)
Assigned to: Alexey Kopytov CPU Architecture:Any
Tags: max_allowed_packet, replication

[30 Oct 2006 16:02] [ name withheld ]
Description:
Given a 4.1.21 master and a 5.0.24a slave I experienced the following issue:

master has max_allowed_packet = 32M
slave has max_allowed_packet set to the 5.0.24a installation default

master and slave are on opposite ends of a T1 (1.544 mbit circuit)

When the slave gets a packet larger that its max_allowed_packet but smaller than the master's max_allowed_packet it will get it, generate an error on the slave, not stop the slave, and repeat infinitely. In this case it managed to saturate the bandwidth on the T1 between the master and slave for several hours overnight before the cause was found and fixed (setting max_allowed_packet = 32M on the slave)

The slaves error logs contained:

060926 18:53:47 [ERROR] Error reading packet from server: Got packet bigger than 'max_allowed_packet' bytes ( server_errno=2020)
060926 18:53:47 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'host-bin.000025' position 292325519
060926 18:53:47 [Note] Slave: connected to master '[rep_user]@host.example.com:3306',replication resumed in log 'host-bin.000025' at position 292325519
060926 18:54:16 [ERROR] Error reading packet from server: Got packet bigger than 'max_allowed_packet' bytes ( server_errno=2020)
060926 18:54:16 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'host-bin.000025' position 292325519
060926 18:54:16 [Note] Slave: connected to master '[rep_user]@host.example.com:3306',replication resumed in log 'host-bin.000025' at position 292325519
060926 18:54:46 [ERROR] Error reading packet from server: Got packet bigger than 'max_allowed_packet' bytes ( server_errno=2020)
060926 18:54:46 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'host-bin.000025' position 292325519
060926 18:54:46 [Note] Slave: connected to master '[rep_user]@host.example.com:3306',replication resumed in log 'host-bin.000025' at position 292325519

These entries repeat over and over again at approximately 30s intervals which is probably how long it takes for the packet in question to transmit between the master and slave

What should happen here is the slave should stop on the error condition and not reconnect to the master.

How to repeat:
1. Setup replication between two servers. In this case the master is 4.1.21 and the slave is 5.0.24a.
2. Set the master to have a larger max_allowed_packet than the slave
3. Send a replicated event (A large LOAD DATA INFILE would likely do) which is smaller than the master's max_allowed_packet size but larger than the slave's max_allowed_packet size.
4. Watch the slave's error logs cycle through the ERROR message above and continually reconnect to the master and retry.
5. tcpdump the slave's interface and watch it continuously retransmit the large packet in question

Suggested fix:
Stop replication in this case and do not reconnect until a server admin investigates and resolves the disparity in the max_allowed_packet configuation settings.
[1 Nov 2006 20:16] Sveta Smirnova
Thank you for the reasonable feature request.
[6 Nov 2006 20:56] [ name withheld ]
I really don't think this is a feature request. This is not a "cosmetic or enhancement request." It is an incorrectly handled error condition which can have a pretty significant impact on a site's available bandwidth if there are fast servers on either end and a slow pipe in between. Please consider reclassifying as at least S3 and preferably S2.
[15 Mar 2007 11:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/21995

ChangeSet@1.2391, 2007-03-15 13:59:26+03:00, kaa@polly.local +5 -0
  Fix for bug #23775 "Replicated event larger that max_allowed_packet infinitely re-transmits".
  
  Test for the client "packet too large" error code instead of the server one in slave I/O thread.
[16 Mar 2007 14:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/22139

ChangeSet@1.2391, 2007-03-16 17:25:20+03:00, kaa@polly.local +3 -0
  Fix for bug #23775 "Replicated event larger that max_allowed_packet infinitely re-transmits".
  
  Problem: to handle a situation when the size of event on the master is greater than max_allowed_packet on slave, we checked for the wrong constant (ER_NET_PACKET_TOO_LARGE instead of CR_NET_PACKET_TOO_LARGE).
  
  Solution: test for the client "packet too large" error code instead of the server one in slave I/O thread.
[26 Mar 2007 6:49] MC Brown
A note has been added to the 5.0.40 and 5.1.17 changelogs.