MySQL Bugs: #25150: Request for more durable binlog when hardware reboots

Submitted: 18 Dec 2006 16:17
Modified: 18 Dec 2006 16:51
Reporter: Mark Callaghan
Status: Won't fix
Category: MySQL Server: Replication
Severity: S4 (Feature request)
Version: 5.0
OS: Linux (Linux 2.6)
CPU Architecture: Any
Tags: binlog, replication, truncate

Description:
We had a hardware reboot on a MySQL 4.0.26 server. After the server restarted, approximately 27 seconds of replication events were gone from the tail of the binlog and slaves got 'impossible position' errors.

I assume that the tail of the binlog was lost because:
* we use ext2
* fsync is not done after every binlog write
* the binlog file is not preallocated, and I assume MySQL does not fsync the binlog file after extending it and before writing to it
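To illustrate the preallocation point, here is a minimal Python sketch. It is not MySQL code, and the function names and the 1 MiB size are hypothetical. The log file is extended with real zero bytes and fsync'ed once, so later event writes overwrite already-allocated blocks instead of growing the file. It assumes Linux and a conventional, non copy-on-write filesystem such as ext2. Preallocation alone does not make events durable. They still need an fsync, but that fsync no longer has to carry file-size or block-allocation changes.

```python
import os

# Hypothetical segment size, chosen only for illustration.
PREALLOC_BYTES = 1 << 20  # 1 MiB


def preallocate_log(path: str, size: int = PREALLOC_BYTES) -> None:
    """Create `path` filled with zeros and fsync it before any event is written.

    Writing real zero bytes (rather than seeking past the end) makes a
    conventional filesystem allocate the blocks now, so later in-place
    writes within this range do not change the file size.
    """
    chunk = b"\0" * 65536
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        remaining = size
        while remaining > 0:
            remaining -= os.write(fd, chunk[: min(remaining, len(chunk))])
        os.fsync(fd)  # flush the new size and block allocations
    finally:
        os.close(fd)
    # fsync(2) on the file does not guarantee the directory entry is on disk,
    # so the containing directory is fsync'ed as well (works on Linux).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def write_event_at(path: str, offset: int, event: bytes) -> None:
    """Overwrite bytes inside the preallocated range; the file size stays the same."""
    fd = os.open(path, os.O_WRONLY)
    try:
        if offset < 0 or offset + len(event) > os.fstat(fd).st_size:
            raise ValueError("event does not fit inside the preallocated range")
        os.pwrite(fd, event, offset)
    finally:
        os.close(fd)
```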

I don't blame MySQL for this, but I think it could be more robust against such failures.

How to repeat:
Start a master with a transaction workload replicating to one or more slaves. Reboot the hardware. Repeat until the tail of the binlog is lost.

Suggested fix:
The binlog could be more durable in light of such failures by:
* calling fsync on the binlog once per second (as can be done for the InnoDB transaction log)
* avoiding pending filesystem metadata operations for the range of the binlog file that contains replication events (fsync after extending the file but before writing to it)
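The first suggestion could look roughly like the following Python sketch. `PeriodicSyncLog` is a hypothetical class, not a MySQL API. It writes each event immediately but lets a background thread call fsync at most once per interval, so under this assumption a hardware reboot would lose roughly the last interval of events rather than an unbounded amount.

```python
import os
import threading


class PeriodicSyncLog:
    """Append-only log that is fsync'ed at most once per `interval` seconds."""

    def __init__(self, path: str, interval: float = 1.0) -> None:
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._interval = interval
        self._lock = threading.Lock()
        self._dirty = False
        self.sync_count = 0  # exposed only so the behaviour can be observed
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._sync_loop, daemon=True)
        self._thread.start()

    def append(self, event: bytes) -> None:
        with self._lock:
            os.write(self._fd, event)  # lands in the OS page cache, not yet durable
            self._dirty = True

    def _sync_once(self) -> None:
        with self._lock:
            if self._dirty:
                os.fsync(self._fd)
                self._dirty = False
                self.sync_count += 1

    def _sync_loop(self) -> None:
        # Event.wait returns False on timeout and True once close() sets the flag.
        while not self._stop.wait(self._interval):
            self._sync_once()

    def close(self) -> None:
        self._stop.set()
        self._thread.join()
        self._sync_once()  # final flush so a clean shutdown loses nothing
        os.close(self._fd)
```

Holding the lock across fsync keeps the sketch simple but stalls writers while the disk flushes; a real implementation might prefer to fsync outside the lock.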

Mark,

4.1 has the option --sync-binlog=N, which calls fsync() on the binlog after every Nth write to it.
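A rough Python model of the --sync-binlog=N behaviour described above follows. `CountingSyncLog` is a hypothetical class, not the server's code. It counts writes and calls fsync on every Nth one. Following MySQL's documented convention, N=0 here means the log is never explicitly synced and flushing is left to the operating system.

```python
import os


class CountingSyncLog:
    """Hypothetical model of --sync-binlog=N: fsync after every Nth write."""

    def __init__(self, path: str, sync_every: int) -> None:
        if sync_every < 0:
            raise ValueError("sync_every must be >= 0")
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._sync_every = sync_every
        self._writes_since_sync = 0
        self.sync_count = 0  # exposed only so the behaviour can be observed

    def append(self, event: bytes) -> None:
        os.write(self._fd, event)
        if self._sync_every == 0:
            return  # N=0: never fsync, rely on the OS to flush eventually
        self._writes_since_sync += 1
        if self._writes_since_sync >= self._sync_every:
            os.fsync(self._fd)
            self._writes_since_sync = 0
            self.sync_count += 1

    def close(self) -> None:
        # Writes since the last Nth one are not fsync'ed here, as in the model above.
        os.close(self._fd)
```

With N=1 every event is on disk before append() returns, at the cost of one fsync per write; a larger N trades durability for throughput, since up to N-1 writes may be lost on a crash.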

It works well, and I think it would not be difficult to extract a patch for just that feature from 4.1 and apply it yourself, with our help if needed.

Let us know what else we can do for you.

Sinisa Milivojevic

Thanks for the prompt response. That feature will help a lot.