Bug #26201 Replication broke on packet size, but show slave status does not indicate this
Submitted: 8 Feb 2007 22:25
Modified: 31 May 2007 20:47
Reporter: Arjen Lentz
Status: Duplicate
Category: MySQL Server: Replication
Severity: S3 (Non-critical)
Version: 5.0, 4.1 BK, 5.1 BK
OS: Any (any)
Assigned to: Assigned Account
CPU Architecture: Any
Tags: bfsm_2007_02_15, max_allowed_packet, replication, slave status

[8 Feb 2007 22:25] Arjen Lentz
Description:
Due to a LOAD DATA INFILE statement on the master and max_allowed_packet being too low, replication broke on the slave:
[ERROR] Error reading packet from server: log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master ( server_errno=1236)
[ERROR] Got fatal error 1236: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master' from master when reading data from binary log

However, SHOW SLAVE STATUS does not indicate any problem specifically.
IO state is empty (no connection to master), SQL thread running, IO thread stopped, no error message.

Since this is the first place to check for replication problems, issues such as the above should be visible here. Even STOP SLAVE followed by START SLAVE does not make anything appear; the details are only in the error log.
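
Because the details only show up in the error log, one workaround is to scan that log for the fatal-error entry. The following is a minimal Python sketch, not part of the original report: it assumes the log text has already been read into a string and that entries follow the "Got fatal error NNNN: '...'" form quoted above. The helper name `find_fatal_replication_errors` is hypothetical.

```python
import re

# Matches the "Got fatal error NNNN: '...'" form shown in the log excerpt above.
FATAL_RE = re.compile(r"Got fatal error (\d+): '([^']*)'")


def find_fatal_replication_errors(log_text):
    """Return (errno, message) pairs for fatal errors received from the master."""
    # Long entries may be wrapped across lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [(int(num), msg) for num, msg in FATAL_RE.findall(flat)]


log_excerpt = """\
[ERROR] Got fatal error 1236: 'log event entry exceeded
max_allowed_packet; Increase max_allowed_packet on master' from master when
reading data from binary log
"""

for errno, message in find_fatal_replication_errors(log_excerpt):
    print(errno, message)
```

Collapsing whitespace before matching is a deliberate choice here so that wrapped entries are still recognized as a single message.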

Slave Status:

Slave_IO_State:
Master_Host: n.n.n.n
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Slave_IO_Running: No
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:
Seconds_Behind_Master: NULL
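
The symptom shown above (I/O thread stopped, yet no error reported) can be detected mechanically. This is a rough Python sketch under stated assumptions, not something from the original report: it assumes the status output has been captured as `Key: value` text like the excerpt above, and the helper names `parse_slave_status` and `io_thread_stopped_silently` are hypothetical.

```python
def parse_slave_status(text):
    """Parse 'Key: value' lines (as printed above) into a dict of strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            status[key.strip()] = value.strip()
    return status


def io_thread_stopped_silently(status):
    """True when the I/O thread is not running but no error is reported."""
    return (
        status.get("Slave_IO_Running") == "No"
        and status.get("Last_Errno", "0") == "0"
        and not status.get("Last_Error")
    )


sample = """\
Slave_IO_State:
Master_Host: n.n.n.n
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Slave_IO_Running: No
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:
Seconds_Behind_Master: NULL
"""

status = parse_slave_status(sample)
print(io_thread_stopped_silently(status))  # prints True
```

A check like this only says that something is wrong with the I/O thread; the reason still has to come from the error log.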

How to repeat:
See description.
[9 Feb 2007 11:45] Sveta Smirnova
test case

Attachment: rpl_bug26201.test (application/octet-stream, text), 465 bytes.

[9 Feb 2007 11:45] Sveta Smirnova
data file

Attachment: bug26201.dat (application/octet-stream, text), 2.46 KiB.

[9 Feb 2007 11:46] Sveta Smirnova
master options file

Attachment: rpl_bug26201-master.opt (application/octet-stream, text), 26 bytes.

[9 Feb 2007 11:46] Sveta Smirnova
slave options file

Attachment: rpl_bug26201-slave.opt (application/octet-stream, text), 26 bytes.

[9 Feb 2007 11:47] Sveta Smirnova
Thank you for the report.

Verified as described using attached test and data files.

To use the test file, replace the path to the data file with the correct path on your system.
[9 Feb 2007 11:48] Sveta Smirnova
All versions are affected
[31 May 2007 17:45] Damien Katz
I was unable to reproduce the "no error message" behavior after replication failure in 5.0. It does keep an error message after slave failure; however, the message it leaves is fairly generic:

"Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave."

One fix option is to improve the error reporting between the slave processing code and the log_event code, so that the slave code can output a more informative error message for SHOW SLAVE STATUS. That helps admins quickly diagnose problems, but will require changes to the error handling code and may complicate the error handling.

Or, most simply, we could just change the generic error message to indicate that more information about the failure is available in the error log. This requires no change to the error handling code and keeps things simpler.
[31 May 2007 20:47] Damien Katz
This issue is also being addressed in bug#24954.
[2 Jun 2007 22:06] Arjen Lentz
Please clarify how the fix for bug#24954 resolves the issue reported here.
Thanks
[3 Jun 2007 10:46] Mats Kindahl
There is no error message in the ``SHOW SLAVE STATUS`` output since the ``Last_Error`` and ``Last_Errno`` fields are errors of the SQL thread, not the I/O thread. The error that occurs (packet size failure) causes the I/O thread to stop and print an error in the error log (this is what the error report says). Since the SQL thread is still running, no error message is displayed in the ``SHOW SLAVE STATUS`` output. (If the SQL thread stopped due to an error, an error message would be displayed in the ``Last_Error`` field. If it stopped for non-error reasons, no error message would be displayed.)
[3 Jun 2007 21:34] Arjen Lentz
Mats - sorry, I just browsed through all of the other bug and found the details.
What you were describing above was the old situation, not the solution that I asked about ;-) But it's clear now, thanks. And the solution - separate IO and SQL thread error entries in show slave status - sounds good.
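
With separate entries per thread, a monitoring script no longer has to infer an I/O failure from an empty error field. A rough Python sketch, assuming the per-thread fields are named `Last_IO_Errno`/`Last_IO_Error` and `Last_SQL_Errno`/`Last_SQL_Error` (this thread does not spell the names out); the helper `thread_errors` and the sample row are hypothetical:

```python
def thread_errors(status):
    """Map each replication thread to its (errno, message), skipping clean threads."""
    errors = {}
    for thread in ("IO", "SQL"):
        # Field names are an assumption; missing or empty values count as 0.
        errno = int(status.get(f"Last_{thread}_Errno", 0) or 0)
        if errno != 0:
            errors[thread] = (errno, status.get(f"Last_{thread}_Error", ""))
    return errors


# Hypothetical status row, modelled on the failure described in this report.
row = {
    "Slave_IO_Running": "No",
    "Slave_SQL_Running": "Yes",
    "Last_IO_Errno": "1236",
    "Last_IO_Error": "log event entry exceeded max_allowed_packet",
    "Last_SQL_Errno": "0",
    "Last_SQL_Error": "",
}
print(thread_errors(row))
```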
[4 Jun 2007 6:56] Mats Kindahl
Excellent Arjen! Then I consider the problem as solved as soon as I push the patch for BUG#24954. :)