MySQL Bugs: #39869: Replication stopped with 'Event too small'

Bug #39869	Replication stopped with 'Event too small'
Submitted:	6 Oct 2008 5:48	Modified:	8 Oct 2008 2:56
Reporter:	Fajar Nugraha
Status:	Not a Bug
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	MySQL-server-5.0.67-0.glibc23	OS:	Linux (RHEL 5.2 x86_64)
Assigned to:		CPU Architecture:	Any
Tags:	Event too small, replication

Description:
The master is set up to use the InnoDB engine with innodb_flush_log_at_trx_commit = 1 and sync_binlog = 1.

The master was restarted (with init 6). Replication on the slave could not continue and stopped with this error (hostname masked):

080926 15:52:27 [Note] Slave SQL thread initialized, starting replication in log 'svr-hostname-01-bin.000178' at position 11763, relay log '../binlog/svr-hostname-02-relay-bin.001433' position: 11004
080926 15:52:27 [ERROR] Error in Log_event::read_log_event(): 'Event too small', data_len: 0, event_type: 0
080926 15:52:27 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
080926 15:52:27 [ERROR] Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
080926 15:52:27 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'svr-hostname-01-bin.000178' position 11763
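The 'Event too small' message appears to come from a sanity check on the event header: every binlog or relay log event in format v4 starts with a fixed 19-byte header, and its event_length field can never be smaller than the header itself. The Python sketch below is not the server's code; it only mirrors that check, assuming the v4 header layout, to show why a reader reports data_len: 0 and event_type: 0 when the bytes at the read position are all zero. Whether the slave's relay log really contained zeros at position 11004 is not established in this report; it is one plausible reading of those two values. `read_event_header` is a hypothetical name.

```python
import struct
from typing import NamedTuple

# Common header of a binlog format v4 event (MySQL 5.0 and later):
# timestamp(4) type_code(1) server_id(4) event_length(4) next_position(4) flags(2)
HEADER_FMT = "<IBIIIH"  # little-endian, no padding
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 19 bytes


class EventHeader(NamedTuple):
    timestamp: int
    type_code: int
    server_id: int
    event_length: int   # total size of the event, header included
    next_position: int  # what mysqlbinlog prints as end_log_pos
    flags: int


def read_event_header(buf: bytes, offset: int = 0) -> EventHeader:
    """Decode the header at `offset`, rejecting lengths no real event can have."""
    if len(buf) - offset < HEADER_LEN:
        raise ValueError("truncated event header")
    header = EventHeader(*struct.unpack_from(HEADER_FMT, buf, offset))
    # An event can never be shorter than its own header. Zero-filled bytes
    # decode as event_length == 0 and type_code == 0, the same values the
    # slave printed.
    if header.event_length < HEADER_LEN:
        raise ValueError(
            f"Event too small, data_len: {header.event_length}, "
            f"event_type: {header.type_code}"
        )
    return header
```

For example, `read_event_header(bytes(19))` raises `ValueError: Event too small, data_len: 0, event_type: 0`, while a header packed with event_length 75 decodes normally.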

Looking at the master's binlog (svr-hostname-01-bin.000178):

# at 11763
#080926 15:45:18 server id 7101  end_log_pos 11838      Query   thread_id=47    exec_time=0     error_code=0
SET TIMESTAMP=1222418718/*!*/;
BEGIN
/*!*/;
# at 11838
#080926 15:45:18 server id 7101  end_log_pos 12224      Query   thread_id=47    exec_time=0     error_code=0
SET TIMESTAMP=1222418718/*!*/;
<some sql statement here>
/*!*/;
# at 12224
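The positions in this excerpt can be checked with simple arithmetic: each event's end_log_pos is the offset at which the next event starts, so the difference is that event's size on disk. The Python sketch below uses the numbers copied from the excerpt (the helper names are hypothetical). It shows that the BEGIN entry at 11763 is an ordinary 75-byte Query event followed directly by the next one, which suggests the entry in the master's binlog is intact and that the failure arose on the slave side, where, according to the error log, the relay log was being read.

```python
# "# at" offsets and end_log_pos values copied from the mysqlbinlog excerpt.
EXCERPT = [
    ("BEGIN", 11763, 11838),
    ("<some sql statement here>", 11838, 12224),
]


def event_sizes(events):
    """On-disk size of each event: end_log_pos is where the next event begins."""
    return [end_log_pos - at for _, at, end_log_pos in events]


def is_contiguous(events):
    """True if every event starts exactly where the previous one ended."""
    return all(prev[2] == cur[1] for prev, cur in zip(events, events[1:]))


print(event_sizes(EXCERPT))    # [75, 386]
print(is_contiguous(EXCERPT))  # True
```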

When I changed the slave's settings to skip position 11763 and start at 11838 instead (with CHANGE MASTER), replication resumed.
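The workaround amounts to pointing the slave at new master coordinates. A minimal sketch of the statements involved, written as a hypothetical Python helper `skip_to_position_sql` that only builds the SQL text and does not connect to a server:

```python
def skip_to_position_sql(log_file: str, log_pos: int) -> list[str]:
    """Hypothetical helper: the statements that restart replication at
    (log_file, log_pos) in the master's binlog."""
    # Refuse names that would break the single-quoted SQL literal.
    if "'" in log_file or "\\" in log_file:
        raise ValueError("unexpected quote or backslash in log file name")
    # The first event of a binlog starts at 4, right after the magic bytes.
    if log_pos < 4:
        raise ValueError("binlog positions start at 4")
    return [
        "STOP SLAVE",
        f"CHANGE MASTER TO MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos}",
        "START SLAVE",
    ]


for stmt in skip_to_position_sql("svr-hostname-01-bin.000178", 11838):
    print(stmt + ";")
```

According to the MySQL manual, CHANGE MASTER TO normally deletes the existing relay logs and starts a new one (unless RELAY_LOG_FILE or RELAY_LOG_POS is given), which may be why this worked: the slave re-fetched the events from the master instead of rereading its old relay log. Skipping positions drops every event in between; here that was only the BEGIN, but in general it can make the slave diverge from the master, so the master's binlog should be checked first, as was done above.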

Now the questions are:
- Why does the master's binlog at 11763 only show a "BEGIN" query?
- Why does the slave say "Event too small"?
- Is there a parameter I can set to have MySQL ignore this error and skip to the next statement automatically?

How to repeat:
Not sure. This is the first time I have encountered this problem.

Suggested fix:
-

Many thanks for writing a bug report.

What exactly did you do to get this error?

Please add a short reproducible test case with the failing transaction.

After some more inspection of the server logs and a chat with the sysadmin, it turns out there was an error in my earlier post: the master was not restarted. It crashed.

According to http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#option_mysqld_innodb_flush_l..., 

"to ensure greatest possible durability and consistency in a replication setup using InnoDB with transactions, you should use innodb_flush_log_at_trx_commit=1, sync_binlog=1"

which I did. InnoDB was able to recover from the crash, and the binlog was not corrupted (at least mysqldump runs without errors).
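The two settings from the quoted manual passage can be checked mechanically. The Python sketch below parses the [mysqld] section of an option file with the standard configparser module. It assumes a simple my.cnf (it does not follow !include or !includedir directives), treats dashes and underscores in option names as equivalent, as MySQL does, and reports a missing setting as a problem instead of assuming the server default. `check_durability` is a hypothetical name.

```python
import configparser

# Settings recommended by the quoted manual passage.
REQUIRED = {"innodb_flush_log_at_trx_commit": "1", "sync_binlog": "1"}


def check_durability(cnf_text: str) -> list[str]:
    """Return a list of problems found in the [mysqld] section of my.cnf text."""
    parser = configparser.ConfigParser(
        allow_no_value=True,             # bare flags such as "skip-name-resolve"
        strict=False,                    # repeated options: the last one wins
        interpolation=None,              # treat '%' literally
        inline_comment_prefixes=("#",),  # "sync_binlog = 1  # comment"
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section"]
    # Option names may use dashes or underscores interchangeably.
    opts = {k.replace("-", "_"): (v or "").strip() for k, v in parser.items("mysqld")}
    problems = []
    for name, wanted in REQUIRED.items():
        actual = opts.get(name)
        if actual != wanted:
            problems.append(f"{name} = {actual!r}, expected {wanted}")
    return problems
```

An empty list means both settings are present with the recommended value; these settings cover the master's InnoDB log and binlog, not the slave's relay log.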

The only remaining question is why the binlog records
# at 11763
#080926 15:45:18 server id 7101  end_log_pos 11838      Query   thread_id=47    exec_time=0     error_code=0
SET TIMESTAMP=1222418718/*!*/;
BEGIN
/*!*/;

which made replication stop working, even though mysqldump can parse the entry just fine. Binlog attached.

Many thanks for writing a bug report. MySQL 5.0 uses statement-based replication, and there are already known issues on the slave after a master failure.

To avoid some of them, use row-based replication, which is implemented in MySQL 5.1 (current version: 5.1.28-rc).

Could you point me to the list of "known issues at the slave after master failure", and tell me which of those issues can be fixed by using row-based replication in MySQL 5.1?

I had the same issue. After a power failure, it seems the relay logs got corrupted. By resetting the slave (RESET SLAVE, which deletes the relay logs) and restarting replication from *exactly* the same position where it stopped (Exec_Master_Log_Pos and Master_Log_File), I was able to make it run again.
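The recovery just described can be written down as a short sequence of statements. The sketch below is a hypothetical Python helper that builds them from one row of SHOW SLAVE STATUS, given as a dict. One detail differs from the comment: it takes the file name from Relay_Master_Log_File, because the MySQL manual pairs Exec_Master_Log_Pos with that column, whereas Master_Log_File belongs to the I/O thread and can already point at a later binlog. The two names coincide when the I/O thread has not moved past the file the SQL thread stopped in. Depending on the server version, RESET SLAVE may also discard the stored connection settings, in which case MASTER_HOST, MASTER_USER and similar options have to be supplied again in the CHANGE MASTER statement.

```python
def resume_after_relay_log_loss(status: dict) -> list[str]:
    """Hypothetical helper: rebuild the relay logs and resume from the last
    executed event. `status` is one SHOW SLAVE STATUS row as column -> value."""
    # (Relay_Master_Log_File, Exec_Master_Log_Pos) is the master coordinate of
    # the last event the SQL thread executed; Master_Log_File tracks the I/O
    # thread and may be ahead of it.
    log_file = status["Relay_Master_Log_File"]
    log_pos = int(status["Exec_Master_Log_Pos"])
    if "'" in log_file or "\\" in log_file:
        raise ValueError("unexpected quote or backslash in log file name")
    return [
        "STOP SLAVE",
        "RESET SLAVE",  # deletes the relay logs and forgets the old position
        f"CHANGE MASTER TO MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos}",
        "START SLAVE",
    ]
```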

Unfortunately I didn't look at the relay logs, but I'm pretty sure this was the problem, because I have two other slaves which didn't have any replication issues and didn't require any intervention during the same period.

Hope this helps someone ;o) Cheers from Moldova.

Btw, that slave is running MySQL 5.1.18.