Bug #26201 Replication broke on packet size, but show slave status does not indicate this
Submitted: 8 Feb 2007 23:25 Modified: 31 May 2007 22:47
Reporter: Arjen Lentz
Status: Duplicate
Category:Server: Replication Severity:S3 (Non-critical)
Version:5.0, 4.1 BK, 5.1 BK OS:Any (any)
Assigned to: Bugs System Target Version:
Tags: replication, max_allowed_packet, slave status, bfsm_2007_02_15

[8 Feb 2007 23:25] Arjen Lentz
Description:
Due to a load data infile statement in the master and the max_allowed_packet being too
low, replication broke on the slave:
[ERROR] Error reading packet from server: log event entry
exceeded max_allowed_packet; Increase max_allowed_packet on master (
server_errno=1236)
[ERROR] Got fatal error 1236: 'log event entry exceeded
max_allowed_packet; Increase max_allowed_packet on master' from master when
reading data from binary log

However, SHOW SLAVE STATUS does not indicate any problem specifically.
IO state is empty (no connection to master), SQL thread running, IO thread stopped, no
error message.

Since this is the first place to check for replication problems, issues such as the above
should be visible here. Even STOP SLAVE and START SLAVE do not make anything appear; the
details are only in the error log.

Slave Status:

Slave_IO_State:
Master_Host: n.n.n.n
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Slave_IO_Running: No
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:
Seconds_Behind_Master: NULL

How to repeat:
See description.
[9 Feb 2007 12:45] Sveta Smirnova
test case

Attachment: rpl_bug26201.test (application/octet-stream, text), 465 bytes.

[9 Feb 2007 12:45] Sveta Smirnova
data file

Attachment: bug26201.dat (application/octet-stream, text), 2.46 KiB.

[9 Feb 2007 12:46] Sveta Smirnova
master options file

Attachment: rpl_bug26201-master.opt (application/octet-stream, text), 26 bytes.

[9 Feb 2007 12:46] Sveta Smirnova
slave options file

Attachment: rpl_bug26201-slave.opt (application/octet-stream, text), 26 bytes.

[9 Feb 2007 12:47] Sveta Smirnova
Thank you for the report.

Verified as described using attached test and data files.

To use the test file, replace the path to the data file with the correct path on your system.
[9 Feb 2007 12:48] Sveta Smirnova
All versions are affected
[31 May 2007 19:45] Damien Katz
I was unable to reproduce the "no error message" after replication failure in 5.0. It does
keep an error message after slave failure; however, the error message it leaves is fairly
generic:

"Could not parse relay log event entry. The possible reasons are: the master's binary log
is corrupted (you can check this by running 'mysqlbinlog' on the 
binary log), the slave's relay log is corrupted (you can check this by running
'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's
MySQL code. If you want to check the master's binary log or slave's relay log, you will
be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave."

One fix option is to improve the error reporting that happens between the slave
processing code and the log_event code, so that the slave code can output a more
informative error message for SHOW SLAVE STATUS. That helps admins quickly diagnose
problems, but will require changes to the error handling code and may complicate the
error handling.

Or, most simply, we could just change the generic error message to indicate that more
information about the failure is available in the error logs. This requires no change to the
error handling code and keeps things simpler.
[31 May 2007 22:47] Damien Katz
This issue is also being addressed in bug#24954.
[3 Jun 2007 0:06] Arjen Lentz
Please clarify how the fix for bug#24954 resolves the issue reported here.
Thanks
[3 Jun 2007 12:46] Mats Kindahl
There is no error message in the ``SHOW SLAVE STATUS`` output since the ``Last_Error`` and
``Last_Errno`` fields are errors of the SQL thread, not the I/O thread. The error that
occurs (packet size failure) causes the I/O thread to stop and print an error in the
error log (this is what the error report says). Since the SQL thread is still running, no
error message is displayed in the ``SHOW SLAVE STATUS`` output. (If the SQL thread stopped
due to an error, an error message would be displayed in the ``Last_Error`` field. If it
stopped for non-error reasons, no error message would be displayed.)
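
The distinction above can be turned into a small check on the status output. What follows is
only a minimal sketch, not part of the server or of any patch: the function names
parse_vertical_status and diagnose are hypothetical, and it assumes the status is available
as "Name: value" lines like the ones in the original report.

```python
def parse_vertical_status(text):
    """Parse 'Name: value' lines of a vertical status listing into a dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            fields[name.strip()] = value.strip()
    return fields


def diagnose(fields):
    """Return a short hint for a parsed status dict (hypothetical helper)."""
    io_running = fields.get("Slave_IO_Running") == "Yes"
    sql_running = fields.get("Slave_SQL_Running") == "Yes"
    last_error = fields.get("Last_Error", "")
    if io_running and sql_running:
        return "both threads running"
    if not sql_running and last_error:
        # Last_Error / Last_Errno describe the SQL thread only.
        return "SQL thread stopped: " + last_error
    if not io_running and not last_error:
        # The case from this report: the I/O thread failed, but the
        # SQL-thread error fields stay empty.
        return "I/O thread stopped with no error shown; check the slave's error log"
    return "replication not fully running; check the slave's error log"


# Status values copied from the original report.
REPORTED = """\
Slave_IO_State:
Master_Host: n.n.n.n
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Slave_IO_Running: No
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:
Seconds_Behind_Master: NULL
"""

print(diagnose(parse_vertical_status(REPORTED)))
```

With the values from the report, the sketch takes the "I/O thread stopped with no error
shown" branch, which is exactly the situation where the details are only in the error log.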
[3 Jun 2007 23:34] Arjen Lentz
Mats - sorry, I just browsed through all of the other bug and found the details.
What you were describing above was the old situation, not the solution that I asked about
;-) But it's clear now, thanks. And the solution - separate IO and SQL thread error
entries in show slave status - sounds good.
[4 Jun 2007 8:56] Mats Kindahl
Excellent Arjen! Then I consider the problem as solved as soon as I push the patch for
BUG#24954. :)