Bug #25737 Please add checksum to binlog events
Submitted: 20 Jan 2007 19:41 Modified: 12 Mar 3:14
Reporter: Baron Schwartz (Basic Quality Contributor)
Status: Verified
Category: Server: Replication    Severity: S4 (Feature request)
Version:    OS: Any
Assigned to: Sven Sandberg    Target Version:
Tags: qc, replication checksum, bfsm_2007_10_18
Triage: Triaged: D5 (Feature request)

[20 Jan 2007 19:41] Baron Schwartz
Description:
I would like the binlog to include checksums for each event so binlog corruption can be
detected better on the slaves.

How to repeat:
Feature request.
[7 Feb 2007 17:30] Valeriy Kravchuk
Thank you for a reasonable feature request.
[7 Feb 2007 20:00] Jeremy Cole
I would love to see this done -- in fact I suggested it myself perhaps 2 years ago at
least. :)  I don't think it should be that hard...
[17 May 2007 10:10] Richard George
This is near-critical for us, see #21623.
[22 Aug 2007 23:46] James Day
Try an SSL connection to get that extra layer of integrity checking.
[23 Aug 2007 14:45] Baron Schwartz
That only ensures the bits don't get garbled on the wire.  It gives no assurance to the
slave SQL thread that it is reading unmangled data.
[4 Sep 2007 2:04] James Day
Agreed but corruption on the wire seems to be by far the most common cause, so it's worth
mentioning it. No law against disk/RAM issues though - those happen too.

I asked the replication team for this in person two years ago at one of our developer
meetings and will be doing the same in a couple of weeks.
[20 Sep 2007 14:35] Mark Callaghan
I want this too. We had a hardware problem that flipped a bit in some ASCII characters so
that the result was readable but wrong. TCP checksums passed but the queries failed,
MySQL was blamed, and debugging was required until the HW problem was found.
[21 Sep 2007 1:38] James Day
The replication team agrees that this is desirable for both the IO (to catch it fast) and
SQL (to catch memory/disk issues) threads. It's also been demonstrated, including by
Mark's example, that the 16-bit TCP checksum is not sufficiently sensitive, so it'll need
to be larger or better.
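
To illustrate why a larger or better checksum helps, here is a minimal sketch of one
well-known weakness of the 16-bit ones' complement sum that TCP uses: because addition is
commutative, swapping two aligned 16-bit words leaves the sum unchanged, while a CRC-32
over the same bytes changes. The sample statement and the helper name `internet_checksum`
are made up for illustration. This is not a claim about what happened in any report above,
and a real TCP checksum also covers the header and a pseudo-header.

```python
import struct
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement sum in the style of RFC 1071 (payload only)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical statement text; any even-length payload would do.
original = b"UPDATE t SET a=1"
# Corruption that swaps the first two aligned 16-bit words ("UP" and "DA").
corrupted = original[2:4] + original[0:2] + original[4:]

same_sum = internet_checksum(original) == internet_checksum(corrupted)
same_crc = zlib.crc32(original) == zlib.crc32(corrupted)
print("payload changed:      ", original != corrupted)
print("16-bit sum unchanged: ", same_sum)
print("CRC-32 unchanged:     ", same_crc)
```

The sum cannot tell the two payloads apart, while CRC-32 can, since the damaged span is
only 32 bits long and CRC-32 detects every error burst of that length or shorter.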

Retrying after corrupt binary log events, and logging the surrounding events in case it's
a bug rather than corruption, also agreed.
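
The verify-and-retry idea can be sketched roughly as follows. Everything here is an
assumption made for illustration: the 4-byte little-endian CRC-32 trailer, the names
`seal_event`, `open_event` and `fetch_with_retry`, and the `fetch` callback are
hypothetical and do not describe the actual binlog format or any server API.

```python
import struct
import zlib


class CorruptEvent(Exception):
    """Raised when an event's stored checksum does not match its payload."""


def seal_event(payload: bytes) -> bytes:
    # Hypothetical framing: payload followed by a 4-byte little-endian CRC-32.
    return payload + struct.pack("<I", zlib.crc32(payload))


def open_event(framed: bytes) -> bytes:
    if len(framed) < 4:
        raise CorruptEvent("event shorter than its checksum trailer")
    payload, trailer = framed[:-4], framed[-4:]
    (expected,) = struct.unpack("<I", trailer)
    if zlib.crc32(payload) != expected:
        raise CorruptEvent("checksum mismatch")
    return payload


def fetch_with_retry(fetch, position: int, attempts: int = 3) -> bytes:
    # `fetch` is a caller-supplied function returning the framed event bytes
    # at `position`; it is asked again whenever verification fails.
    last_error = None
    for _ in range(attempts):
        try:
            return open_event(fetch(position))
        except CorruptEvent as exc:
            last_error = exc
    raise CorruptEvent(f"event at {position} still corrupt after {attempts} attempts") from last_error


# Usage: the first delivery has one flipped bit, the second is clean.
good = seal_event(b"INSERT INTO t VALUES (1)")
bad = bytes([good[0] ^ 0x01]) + good[1:]
deliveries = iter([bad, good])
print(fetch_with_retry(lambda pos: next(deliveries), position=4))
```

The same `open_event` check could run both when an event is received and again when it is
read back before execution, which matches the IO-thread and SQL-thread cases above.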

No timetable for when to do this at present, still too early in the process for that.
[14 Oct 2007 3:47] James Day
See bug #29813 "replication errors on a unstable network" for another report of corruption
on unstable VPN connections.
[14 Oct 2007 4:03] James Day
The visible worklog item for this is at http://forge.mysql.com/worklog/task.php?id=2540 ,
which started in April 2005.
[28 Oct 2007 20:24] James Day
Checksums for replication events are currently on the server roadmap for the version after
6.0, with a current target of Q1 2009 release, subject to change.
[8 Aug 2008 23:07] Jeremy Zawodny
Allow me to add a vote for this as well.

We got bit by this on two servers yesterday because of a network glitch.  A checksum
could have found the error and re-requested the problematic event.
[23 Sep 2008 4:52] Gonzalo Carvajal
Yes, please! I need this feature too.
I have 4 servers: 1 master and 3 slaves. They are not on a LAN; they are in different
cities. I configured my.ini on the slaves to replicate just a few tables, not the whole
database, and very often I have replication crashes because every modification on the
master is sent to the slaves, not only the ones for the tables I want. The most common
error is that SQL statements (until now, all of them for tables other than the ones I want
to replicate) are not CRC-verified and arrive at the slaves with syntax errors (wrong data).

Please excuse my English; I hope you can understand.
I'm working with 5.0.45-community-nt-log, MySQL Community Edition (GPL).
[23 Sep 2008 17:57] Sven Sandberg
Gonzalo, you are probably experiencing BUG#26489. You may try to upgrade to version 5.0.56
where that was fixed.
[12 Mar 3:14] Baron Schwartz
Looks like Google has done this as part of another patch.

http://mysqlha.blogspot.com/2009/03/global-transaction-ids-are-hot.html