MySQL Bugs: #25737: Please add checksum to binlog events

Bug #25737	Please add checksum to binlog events
Submitted:	20 Jan 2007 18:41	Modified:	9 Feb 2011 21:42
Reporter:	Baron Schwartz (Basic Quality Contributor)
Status:	Closed
Category:	MySQL Server: Replication	Severity:	S4 (Feature request)
Version:		OS:	Any
Assigned to:	Andrei Elkin	CPU Architecture:	Any
Tags:	bfsm_2007_10_18, qc, replication checksum

Description:
I would like the binlog to include checksums for each event so binlog corruption can be detected better on the slaves.

How to repeat:
Feature request.

Thank you for a reasonable feature request.

I would love to see this done -- in fact I suggested it myself perhaps 2 years ago at least. :)  I don't think it should be that hard...

This is near-critical for us, see #21623.

Try an SSL connection to get that extra layer of integrity checking.

That only ensures the bits don't get garbled on the wire.  It gives no assurances to the Slave SQL thread that it is reading unmangled data.

Agreed but corruption on the wire seems to be by far the most common cause, so it's worth mentioning it. No law against disk/RAM issues though - those happen too.

I asked the replication team for this in person two years ago at one of our developer meetings and will be doing the same in a couple of weeks.

I want this too. We had a hardware problem that flipped a bit in some ASCII characters, so the result was readable but wrong. TCP checksums passed but the queries failed, MySQL was blamed, and debugging was required until the hardware problem was found.

The replication team agrees that this is desirable for both the IO thread (to catch corruption fast) and the SQL thread (to catch memory/disk issues). It has also been demonstrated, including by Mark's example, that the 16-bit TCP checksum is not sufficiently sensitive, so the event checksum will need to be larger or better.
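To illustrate why a stronger per-event checksum helps, here is a minimal Python sketch. It is not a reconstruction of the failure reported above; it only shows one well-known blind spot of a 16-bit ones' complement sum of the kind TCP uses: because addition is commutative, swapping two aligned 16-bit words leaves the sum unchanged, while a CRC32 over the same bytes differs. The function name `internet_checksum` and the sample statement are assumptions made for the example.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical statement; "12" and "34" each sit on a 16-bit word boundary.
original = b"INSERT INTO t VALUES (12, 34)"
swapped = b"INSERT INTO t VALUES (34, 12)"

# The 16-bit sum cannot tell the two statements apart ...
same_sum = internet_checksum(original) == internet_checksum(swapped)
# ... but CRC32 can.
same_crc = zlib.crc32(original) == zlib.crc32(swapped)

print("16-bit checksums equal:", same_sum)
print("CRC32 values equal:", same_crc)
```

Both statements are readable and syntactically valid, which is the kind of silent damage a slave cannot notice without an end-to-end check on the event itself.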

Retrying after corrupt binary log events, and logging the surrounding events in case it's a bug rather than corruption, also agreed.
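The verify-and-retry behaviour described here could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not the server's implementation: the framing (payload followed by a 4-byte little-endian CRC32), the names `frame_event`, `verify_event`, `fetch_with_retry`, and the retry limit are all hypothetical, and logging of the surrounding events is left out.

```python
import struct
import zlib


class CorruptEventError(Exception):
    """Raised when an event fails its checksum."""


def frame_event(payload: bytes) -> bytes:
    # Hypothetical framing: payload followed by its CRC32 (little-endian).
    return payload + struct.pack("<I", zlib.crc32(payload))


def verify_event(frame: bytes) -> bytes:
    if len(frame) < 4:
        raise CorruptEventError("frame too short to hold a checksum")
    payload, trailer = frame[:-4], frame[-4:]
    (expected,) = struct.unpack("<I", trailer)
    if zlib.crc32(payload) != expected:
        raise CorruptEventError("checksum mismatch")
    return payload


def fetch_with_retry(fetch, position, max_retries=3):
    """Call fetch(position) until the event verifies or retries run out."""
    last_error = None
    for _ in range(1 + max_retries):
        try:
            return verify_event(fetch(position))
        except CorruptEventError as exc:
            last_error = exc  # re-request the same event
    raise CorruptEventError(
        f"event at position {position} still corrupt after {max_retries} retries"
    ) from last_error


# Simulated source: the first delivery has one flipped bit, the second is clean.
good = frame_event(b"UPDATE t SET a = 1")
bad = bytearray(good)
bad[3] ^= 0x01
responses = iter([bytes(bad), good])

payload = fetch_with_retry(lambda pos: next(responses), position=4)
print(payload)
```

If every attempt fails, the final exception is the point where a real implementation would stop and report the surrounding events, since repeated failure suggests a bug or on-disk damage rather than a transient network error.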

No timetable for when to do this at present, still too early in the process for that.

See bug #29813 "replication errors on a unstable network" for another report of corruption on unstable VPN connections.

The visible worklog item for this is at http://forge.mysql.com/worklog/task.php?id=2540 , which started in April 2005.

Checksums for replication events are currently on the server roadmap for the version after 6.0, with a current target of Q1 2009 release, subject to change.

Allow me to add a vote for this as well.

We got bit by this on two servers yesterday because of a network glitch.  A checksum could have found the error and re-requested the problematic event.

Yes, please! I need this feature too.
I have 4 servers: 1 master and 3 slaves. They are not on a LAN; they are in different cities. I configured my.ini on the slaves to replicate just a few tables, not the whole database, and I very often have replication crashes because every modification on the master is sent to the slaves, not only those for the tables I want. The most common error occurs because SQL statements (so far, all for tables other than the ones I want to replicate) are not CRC-verified and arrive at the slaves with syntax errors (wrong data).

Please excuse my English; I hope you can understand.
I'm working with 5.0.45-community-nt-log, MySQL Community Edition (GPL)

Gonzalo, you are probably experiencing BUG#26489. You may try to upgrade to version 5.0.56 where that was fixed.

Looks like Google has done this as part of another patch.

http://mysqlha.blogspot.com/2009/03/global-transaction-ids-are-hot.html

This task is tracked by WL#2540, not by this bug.

The feature request was addressed by WL#2540 (replication event checksum).

What release has or will have this feature? While I get the restrictions against stating what will be in a future release, I thought the standard policy for closing a bug or a feature request is to document the release in which it has been fixed. And yes, Catch-22 was a favorite book of mine.

Mark, the work for this task was, and still is, tracked in WL#2540, which
has been merged into a development tree (mysql-trunk).

We had mistakenly set this bug status to CLOSED, but corrected its 
state to DOCUMENTING. Once WL#2540 gets documented and released, 
this bug/FR shall be closed as well.

So is this in 5.6.0?  Will it be in 5.6.1?  Some later version?

See http://forge.mysql.com/worklog/task.php?id=2540 for the worklog.

This feature was added in the MySQL 5.6.2 Development Milestone Release. This is a preview of technology that may be in the next major MySQL version, not a current production-rated release. In general these are intended to be of release candidate quality.

Don't expect it to be added to MySQL 5.5 or any earlier production release.

For other replication features added to 5.6.2 and now announced see http://blogs.oracle.com/mysql/2010/11/mysql_55_whats_new_in_replication.html . Presence of a feature in a Development Milestone Release is not a guarantee that it will be present in a future production-rated release.