Description:
We have had several servers crash (HW reboot) while running as a mysql master. Because of bad luck and the crash a partial event is at the tail of the binlog. In one case the last binlog file is 537460736 bytes, the partial event starts at 537460368 and has a data length of 599. The the binlog file should have at least 537460967 bytes (probably more because that event also needs a header). So there are at least 231 bytes missing.
The problem is not that there is a partial event at the end of the binlog file. The problem is that mysqld does not recover from this problem and slaves halt on that binlog file. There are at least two ways to hit this state:
1) disk full (http://bugs.mysql.com/bug.php?id=32228, http://forge.mysql.com/worklog/task.php?id=1040)
2) OS crash/reboot
I do not see code in TC_LOG_BINLOG::recover to handle this case. My tests that crash a master after writing some of the transaction cache in ::write_cache make the slave get stuck on the last binlog file.
There is code in TC_LOG_BINLOG::open(const char *opt_name) that is run when the binlog was not closed properly, it calls ::recover
if ((ev= Log_event::read_log_event(&log, 0, &fdle, NULL)) &&
ev->get_type_code() == FORMAT_DESCRIPTION_EVENT &&
ev->flags & LOG_EVENT_BINLOG_IN_USE_F)
{
sql_print_information("Recovering after a crash using %s", opt_name);
error= recover(&log, (Format_description_log_event *)ev);
But the code in ::recover doesn't handle our problem. It collects all XID events from the last binlog and uses that to determine what to do with PREPARED transactions in InnoDB.
TC_LOG_BINLOG::recover has this loop. It doesn't appear to
while ((ev= Log_event::read_log_event(log,0,fdle,NULL)) && ev->is_valid())
{
if (ev->get_type_code() == XID_EVENT)
{
Xid_log_event *xev=(Xid_log_event *)ev;
uchar *x= (uchar *) memdup_root(&mem_root, (uchar*) &xev->xid,
sizeof(xev->xid));
if (!x || my_hash_insert(&xids, x))
goto err2;
}
delete ev;
}
How to repeat:
Add this code to MYSQL_BIN_LOG::write_cache, and then write an rpl mysql-test-run test to crash the server.
DBUG_EXECUTE_IF("partial_binlog_write_and_crash", {
int x= 0;
srand48(getpid());
while (!x) {
double d = drand48();
x= (int) (d * length);
fprintf(stderr, "next rn is %d using %.6f\n", x, d);
}
my_b_write(&log_file, cache->read_pos, x);
flush_io_cache(&log_file);
fprintf(stderr, "crash after writing %d of %d\n", x, length);
abort();
});