MySQL Bugs: #34545: Replication broken with error message at tail of relaylog

Bug #34545	Replication broken with error message at tail of relaylog
Submitted:	14 Feb 2008 13:15	Modified:	14 Feb 2008 16:05
Reporter:	Kristian Koehntopp	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.0.52-enterprise-gpl-log	OS:	Linux (Linux /name edited/ 2.6.9-55.0.9.ELsmp #1 SMP Thu Sep 27 18:28:00 EDT 2007 x86_6)
Assigned to:		CPU Architecture:	Any

Description:
Replication stops with 

080214 11:53:39  mysqld started
080214 11:53:42  InnoDB: Started; log sequence number 154 4166874136
080214 11:53:43 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.0.52-enterprise-gpl-log'  socket: '/mysql/bp/data/mysql.sock'  port: 3306  MySQL Enterprise Server (GPL)
080214 11:53:43 [Note] Slave SQL thread initialized, starting replication in log 'binlog.000127' at position 174143426, relay log '/mysql/bp/log/relaylog.432215' position: 174143560
080214 11:53:43 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 808464473, event_type: -1
080214 11:53:43 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
080214 11:53:43 [ERROR] Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
080214 11:53:43 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'binlog.000127' position 174143426
080214 11:53:43 [Note] Slave I/O thread: connected to master 'm1m1repl@bpmdb-vip2:3306',  replication started in log 'binlog.000127' at position 289598261
080214 12:35:46 [Note] Slave I/O thread killed while reading event
080214 12:35:46 [Note] Slave I/O thread exiting, read up to log 'binlog.000127', position 367649140

Relaylog relaylog.432215 174143560 corrosponds to binlog.000127 174143426.

The preserved relaylog is

-rw-r-----  1 root root 174143630 Feb 14 12:35 relaylog.432215

This is 70 bytes longer than the reported relay log position. A dd | od of the 70 bytes is

[root@bc10rodb-01 kris]# dd bs=1 skip=174143560 if=relaylog.432215 2>/dev/null | od -t x1a
0000000 1e 00 00 ce ff 51 04 23 48 59 30 30 30 66 61 69
         rs nul nul   N del   Q eot   #   H   Y   0   0   0   f   a   i
0000020 6c 65 64 20 6f 6e 20 6e 65 74 5f 66 fc 0e b4 47
          l   e   d  sp   o   n  sp   n   e   t   _   f   |  so   4   G
0000040 04 e1 27 c6 08 2a 00 00 00 8e 38 61 0a 00 00 04
        eot   a   '   F  bs   * nul nul nul  so   8   a  nl nul nul eot
0000060 00 00 00 00 00 00 00 72 65 6c 61 79 6c 6f 67 2e
        nul nul nul nul nul nul nul   r   e   l   a   y   l   o   g   .
0000100 34 33 32 32 31 36
          4   3   2   2   1   6

Watch the ASCII - is this an error log message in the relay log?

How to repeat:
Hard to reproduce, spurious corruption which is not affecting the other slaves at the same position. This points to a problem in the slave code?

Likely a case of http://bugs.mysql.com/bug.php?id=26489, as per analysis of Lars (25-Jan-2008)

This is a duplicate of bug #26489