Bug #46769 Replication Broken when Max Packet Size Exceeded
Submitted: 17 Aug 2009 21:40 Modified: 8 Sep 2009 4:33
Reporter: Steven Hartland Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.0.83 OS:Any
Assigned to: CPU Architecture:Any

[17 Aug 2009 21:40] Steven Hartland
Description:
We had an issue between two remote sites running replication. It seems that when the slave site encountered a replication packet which was too big to process, instead of retrying or aborting it continued after said packet loosing lots of date.

The following is what we found in the slave server log:-
090815 18:29:15 [ERROR] Error reading packet from server: Got packet bigger
than 'max_allowed_packet' bytes ( server_errno=2020)

090815 18:29:15 [Note] Slave I/O thread: Failed reading log event,
reconnecting to retry, log 'mysql-bin.000002' position 99848631

090815 18:29:15 [Note] Slave: connected to master
'pro_cadredv3@85.236.96.42:3306',replication resumed in log
'mysql-bin.000002' at position 99848631

How to repeat:
This is unconfirmed as it was a live replication, so we had to fix ASAP, but what we believe is the reproduction steps:
1. Create a backup of a the db
2. Import and setup slave
3. Start the slave but with a default packet size ( no mention in my.cnf ) after significant changes on the master.

In our case most tables where innodb tables in case that makes a difference.
[17 Aug 2009 22:05] Miguel Solorzano
Thank you for the bug report. So it is correct to assume that in the master's my.cnf the value of max_allowed_packet was accordingly modified but not in the slave side (step 3 how to repeat)?. Thanks in advance.
[17 Aug 2009 23:26] Steven Hartland
Confirmed in the master we had 10M for max packet size but in the slave this was missing. The initial few rows seemed to transfer fine but when we returned several hours later even though the slave status indicated all was well most of the data was still missing from the slave.

Looking in the slave log we saw the error listed.
After:
1. updating max packet size on the slave and restarting
2. resetting it to original backup position
3. deleting the few rows from the tables that had replicated
4. restarting the slave

Replication completed without issue and all data was present and correct.
[17 Aug 2009 23:31] Steven Hartland
Note: The biggest issue we see as not the failure to transfer, but the fact the slave status indicated no errors even though large amounts of data was actually missing.

Its only because the missing data was so obvious in our case that we noticed. It could potentially be subtle amounts of missing data which may not be noticed until its too late.
[18 Aug 2009 6:17] Sveta Smirnova
Thank you for the feedback.

You wrote you have no error on slave, but pasted error from the error log. Could you please also provide output of SHOW SLAVE STATUS\G?
[18 Aug 2009 7:59] Steven Hartland
Just to be clear what I meant was there was no error on the slave status. Which is where an admin will look to check the status of slave replication i.e. its not common to check the main server error long when performing this check.

The following was the slave status after the error had occured, as you can see no error is apparent
mysql> SHOW SLAVE STATUS;
+----------------------------------+--------------+--------------+-------------+---------------+------------------+---------------------+-----------------------------------+---------------+-----------------------+------------------+-------------------+-----------------+---------------------+--------------------+------------------------+-------------------------+-----------------------------+------------+------------+--------------+---------------------+-----------------+-----------------+----------------+---------------+--------------------+--------------------+--------------------+-----------------+-------------------+----------------+-----------------------+
| Slave_IO_State                   | Master_Host  | Master_User  | Master_Port | Connect_Retry | Master_Log_File  | Read_Master_Log_Pos | Relay_Log_File                    | Relay_Log_Pos | Relay_Master_Log_File | Slave_IO_Running | Slave_SQL_Running | Replicate_Do_DB | Replicate_Ignore_DB | Replicate_Do_Table | Replicate_Ignore_Table | Replicate_Wild_Do_Table | Replicate_Wild_Ignore_Table | Last_Errno | Last_Error | Skip_Counter | Exec_Master_Log_Pos | Relay_Log_Space | Until_Condition | Until_Log_File | Until_Log_Pos | Master_SSL_Allowed | Master_SSL_CA_File | Master_SSL_CA_Path | Master_SSL_Cert | Master_SSL_Cipher | Master_SSL_Key | Seconds_Behind_Master |
+----------------------------------+--------------+--------------+-------------+---------------+------------------+---------------------+-----------------------------------+---------------+-----------------------+------------------+-------------------+-----------------+---------------------+--------------------+------------------------+-------------------------+-----------------------------+------------+------------+--------------+---------------------+-----------------+-----------------+----------------+---------------+--------------------+--------------------+--------------------+-----------------+-------------------+----------------+-----------------------+
| Waiting for master to send event | XX.XXX.XX.XX | myuser       |        3306 |            60 | mysql-bin.000002 |            99848631 | xxxx-relay-bin.024353             |           235 | mysql-bin.000002      | Yes              | Yes               | mydb            |                     |                    |                        |                         |                             |          0 |            |            0 |            99848631 |             235 | None            |                |             0 | No                 |                    |                    |                 |                   |                |                     0 |
+----------------------------------+--------------+--------------+-------------+---------------+------------------+---------------------+-----------------------------------+---------------+-----------------------+------------------+-------------------+-----------------+---------------------+--------------------+------------------------+-------------------------+-----------------------------+------------+------------+--------------+---------------------+-----------------+-----------------+----------------+---------------+--------------------+--------------------+--------------------+-----------------+-------------------+----------------+-----------------------+
1 row in set (0.00 sec)
[18 Aug 2009 8:35] Sveta Smirnova
Thank you for the feedback.

Please check if this is same case as bug #44690
[20 Aug 2009 15:46] Steven Hartland
Looks similar but not the same as that report doesnt mention any data loss which we had.
[8 Sep 2009 4:33] Susanne Ebrecht
Data lost also happens in bug #44690.

This bug report here is a duplicate of bug #44690