Bug #61096 Replication from master with new checksum algorithm
Submitted: 9 May 2011 9:40 Modified: 7 Mar 2012 12:02
Reporter: Mats Kindahl
Status: Closed
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.6 OS:Any
Assigned to:
Tags: checksum, replication
Triage: Needs Triage: D2 (Serious)

[9 May 2011 9:40] Mats Kindahl
If the master is extended with a new checksum algorithm that the slave does not know, the slave concludes that the master has no checksum support. The master, however, still sends events carrying the new checksum, while the slave processes them as if they had none, so it may treat the checksum bytes as part of the event body.
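To make the failure mode concrete, here is a minimal sketch under a simplified model that is assumed for illustration and is not the exact binlog format: an event is a payload followed by a 4-byte checksum trailer (the size of a CRC32 value). The hypothetical helper `event_body()` returns what a receiver would parse as the body, depending on whether it expects a checksum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified event model (an assumption, not the real binlog layout):
// payload bytes followed, when checksums are on, by a fixed-size trailer.
// 4 bytes matches the size of a CRC32 value.
constexpr std::size_t kChecksumLen = 4;

// Hypothetical helper: returns the bytes a receiver interprets as the event
// body, given whether it believes the sender appended a checksum trailer.
std::string event_body(const std::vector<std::uint8_t>& event,
                       bool receiver_expects_checksum) {
  std::size_t body_len = event.size();
  if (receiver_expects_checksum) {
    assert(body_len >= kChecksumLen && "event too short to hold a checksum");
    body_len -= kChecksumLen;  // drop the trailer before parsing the body
  }
  // Copy the bytes the receiver will treat as payload.
  return std::string(event.begin(),
                     event.begin() + static_cast<std::ptrdiff_t>(body_len));
}
```

For a 6-byte payload `INSERT` followed by 4 trailer bytes, `event_body(event, true)` yields the 6 payload bytes, while `event_body(event, false)`, which models the slave after it has concluded that the master has no checksum, yields all 10, the last 4 being checksum bytes misread as event data.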

How to repeat:
See line 1753 in rpl_slave.cc:

      if (!mysql_real_query(mysql,
                            STRING_WITH_LEN("SELECT @master_binlog_checksum")) &&
          (master_res= mysql_store_result(mysql)) &&
          (master_row= mysql_fetch_row(master_res)) &&
          (master_row[0] != NULL))
      {
        mi->checksum_alg_before_fd= (uint8)
          find_type(master_row[0], &binlog_checksum_typelib, 1) - 1;
        // valid outcome is either of
        DBUG_ASSERT(mi->checksum_alg_before_fd == BINLOG_CHECKSUM_ALG_OFF ||
                    mi->checksum_alg_before_fd == BINLOG_CHECKSUM_ALG_CRC32);
      }

The find_type() call here returns 0 for any checksum algorithm that the slave does not know; subtracting 1 then gives -1, which is stored in the uint8 field as 255.
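The wraparound can be reproduced outside the server. In this sketch, `lookup_1based()` is a hypothetical stand-in for find_type() (1-based position, 0 when the name is not found), the list of known names, `NONE` and `CRC32`, is assumed to mirror the slave's `binlog_checksum_typelib`, and `XXHASH` stands in for a hypothetical newer algorithm.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the lookup done by find_type() in the excerpt:
// returns the 1-based position of `name` in `names`, or 0 if not found.
// Case handling and the flags of the real function are ignored here.
int lookup_1based(const char* name, const char* const* names, int count) {
  for (int i = 0; i < count; ++i)
    if (std::strcmp(name, names[i]) == 0) return i + 1;
  return 0;
}

// Mirrors the excerpt: (uint8) lookup(...) - 1, stored in a uint8 field.
// For an unknown name this is 0 - 1 = -1, which converts to 255 in uint8_t.
std::uint8_t checksum_alg_from_name(const char* name) {
  assert(name != nullptr);  // the excerpt checks master_row[0] != NULL first
  // Assumed slave-side list of known algorithm names, in enum order.
  static const char* const kKnown[] = {"NONE", "CRC32"};
  int pos = lookup_1based(name, kKnown, 2);
  return static_cast<std::uint8_t>(pos - 1);
}
```

With these assumed names, `NONE` maps to 0 and `CRC32` to 1, but `XXHASH` yields 255, which in this model equals neither of the two values accepted by the DBUG_ASSERT.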

Suggested fix:
Change the code that checks the result of the following statement:

    SET @master_binlog_checksum = @@global.binlog_checksum

Set checksum_alg_before_fd to OFF if an error is reported, and to UNKNOWN if there was no error but find_type() in the excerpt above returns 0.

Either report an error and stop the slave or, if possible, report a warning and remove the checksum from the event before processing it.
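A sketch of the first option, under assumptions: `MasterChecksum`, `classify()` and `check_or_stop()` are hypothetical names rather than server code, and the recognized names `NONE` and `CRC32` are assumed. An error from the query means a master without checksum support (OFF); no error combined with an unrecognized name means UNKNOWN, which makes the slave stop with an error instead of misreading events. Stopping is also the behavior described in the 5.6.6 changelog entry.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical slave-side view of the master's checksum setting.
// UNKNOWN is the extra state proposed in the report.
enum class MasterChecksum { OFF, CRC32, UNKNOWN };

// Classifies the outcome of reading @master_binlog_checksum.
// query_failed: the query reported an error (master without checksum support).
// value: the algorithm name returned when there was no error.
MasterChecksum classify(bool query_failed, const char* value) {
  if (query_failed) return MasterChecksum::OFF;
  assert(value != nullptr);
  if (std::strcmp(value, "NONE") == 0) return MasterChecksum::OFF;
  if (std::strcmp(value, "CRC32") == 0) return MasterChecksum::CRC32;
  return MasterChecksum::UNKNOWN;  // no error, but the name is not recognized
}

// Hypothetical policy hook: returns an empty string when replication may
// proceed, or an error message when the slave should stop. `value` is only
// read for UNKNOWN, which classify() never returns for a null value.
std::string check_or_stop(MasterChecksum alg, const char* value) {
  if (alg != MasterChecksum::UNKNOWN) return "";
  return std::string("master uses unknown binlog_checksum algorithm '") +
         value + "'; stopping slave";
}
```

The warning-and-strip alternative is not sketched; it would additionally need the length of the unknown checksum.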
[9 May 2011 9:57] Miguel Solorzano
Thank you for the bug report.
[7 Mar 2012 12:02] Jon Stephens
Fixed in 5.6. Documented as follows in the 5.6.6 changelog:

      Setting binlog_checksum on the master to a value that was unknown
      on the slave caused replication to fail. Now in such cases, the
      checksum is disabled on the slave and replication stops with an
      appropriate error message.