MySQL Bugs: #47200: RBR: Absence of PK on slave leads to slave stop.

Bug #47200	RBR: Absence of PK on slave leads to slave stop.
Submitted:	8 Sep 2009 22:38	Modified:	24 Mar 2011 15:50
Reporter:	Luis Soares	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Row Based Replication ( RBR )	Severity:	S3 (Non-critical)
Version:	6.0 codebase	OS:	Any
Assigned to:	Luis Soares	CPU Architecture:	Any
Tags:	RBR, regression

Description:
When replicating data for a table that has a PK on the master but not on the slave, replication breaks. This showed up when using innodb tables.

Consider the following table definition on the master:

CREATE TABLE t (c1 int, c2 char(1), c3 char(1), c4 char(1), primary key(c1,c3)) engine= InnoDB;

And on the slave:

CREATE TABLE t (c1 int, c2 char(1), c3 char(1), c4 char(1)) engine= InnoDB;

If the following sequence of commands is issued on the master:

INSERT INTO t VALUES (1, '1', '1', '1' );
UPDATE t SET c4 = '7';

Then the slave will stop with:

Last_Errno	1032
Last_Error	Could not execute Update_rows event on table test.t; Can't find record in 't', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 448

How to repeat:
-- source include/master-slave.inc
-- source include/have_binlog_format_row.inc
-- connection master
-- source include/have_innodb.inc
-- connection slave
-- source include/have_innodb.inc

-- connection master
SET SQL_LOG_BIN=0;
CREATE TABLE t (c1 int, c2 char(1), c3 char(1), c4 char(1), primary key(c1,c3)) engine= InnoDB;
SET SQL_LOG_BIN=1;

-- connection slave
SET SQL_LOG_BIN=0;
CREATE TABLE t (c1 int, c2 char(1), c3 char(1), c4 char(1)) engine= InnoDB;
SET SQL_LOG_BIN=1;

-- connection master
INSERT INTO t VALUES (1, '1', '1', '1' );

-- sync_slave_with_master

-- connection master
UPDATE t SET c4 = '7';

-- sync_slave_with_master
-- connection master

DROP TABLE t;

-- sync_slave_with_master

exit;

Suggested fix:
n/a

Analysis
========

  - find_row

  When processing the before image (BI), from the row event, to
  fetch the correct record from the storage engine (SE), the
  slave first checks if there is a PK in its table. If there is,
  then the row is searched and fetched from the SE using the
  PK. find_row succeeds if the engine is able to find the record.

  However, when there is no PK, an index or table scan are
  performed. In such cases, there is the need to iterate over the
  records fetched from the SE and compare them against the one in
  the BI. If there is a match, the record is selected and the
  find_row succeeds.

  Record comparison is performed in record_compare function.

  - record_compare

  In this function, there are two optimizations, regarding the
  comparison of records:

    1. We can compare both records using cmp_record if all bits
       are set in the read_set of the event (meaning that the
       records should be both complete);

    2. Or we can compare using cmp_record if the engine provides
       partial column fetch - HA_PARTIAL_COLUMN_READ. This means
       that if the read_set on the event is honored by the SE
       while the fetching the row, both the BI and the row
       fetched should have the same fields.

  If none of the optimizations may be used (the read_set is not
  completely set or the SE does not provide partial row fetch),
  then columns from both rows are compared one by one against
  each other, provided that their bit is set in the read_set. If
  all columns signaled in the read_set match, then record_compare
  succeeds, otherwise it fails.

Problem Statement
=================
  
  There can be two cases that put at risk the usage of
  optimization #2 in record_compare:

   1. there can be the case, that the SE bypasses the read_set
      provided in the row event, returning the full row (despite
      the fact that it provides HA_PARTIAL_COLUMN_READ). This
      causes comparison to fail (partial BI will be compared
      against full row from SE);

   2. If the SE does honor the read_set, there can be the case
      that when preparing and unpacking the row event, default
      values are assigned to missing columns on the BI row,
      therefore BI will be filled with extra data. This can cause
      comparison to fail (BI row, filled also with default
      values, will be compared against SE row containing only
      requested columns).

  In the case presented in the howto repeat section, it seems
  that the engine is returning the full row while the table scan
  is performed on the slave (in find_row).

  Checking the SE implementation, I find the following remark on
  storage/innobase/handler/ha_innodb.cc:build_template :

    /* We always retrieve the whole clustered index record if we
     use exclusive row level locks, for example, if the read is
     done in an UPDATE statement. */

  Which seems to cause the SE to return the entire row for the
  test case presented, ultimately causing the test to fail.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/82907

2813 Luis Soares	2009-09-10
      BUG#47200: RBR: Absence of PK on slave leads to slave stop.
      
      If the same table is declared with a PK on the master but without
      the PK on the slave, row based replication may break. This is
      noticed for InnoDB engine, which provides HA_PARTIAL_COLUMN_READ,
      but it may also fail with other engines providing the same
      flag. The failure happens when searching the row on the slave to
      be updated.
      
      When the slave thread performs a scan (table or index), the
      search procedure is basically fetching rows from slave engine and
      comparing them against the row in the event before image. If the
      engine provides HA_PARTIAL_COLUMN_READ, then a memory comparison
      is performed for both rows, otherwise, a column by column (for
      only those marked in the read_set) comparison is done. There can
      be two problems with this:
      
      First, engine does not honor HA_PARTIAL_COLUMN_READ for some
      cases. When fetching the row from the slave engine, and although
      it provides HA_PARTIAL_COLUMN_READ, the table->read_set (which
      marks only the columns as in the master PK) is ignored. Thence,
      the engine returns the full row instead of only the columns
      needed for comparison. In this case, the comparison will fail.
      
      Second, when the RBR event row is unpacked it is filled with
      default values for the missing columns. So, it can be the case
      the event before image is filled with default values, on the
      slave at unpack time. Consequently, when searching the record,
      and even if the engine honors HA_PARTIAL_COLUMN_READ, the slave
      will fail comparison - it will do a memory comparison of row with
      PK + default values against a row with only the PK.
      
      This patch fixes this issue by removing the part of the
      comparison that takes into account the HA_PARTIAL_COLUMN_READ. In
      such cases it performs a column by column comparison (for those
      marked in the read_set).

Verified as described. Not repeatable with version 5.1

See also BUG#47303

Using the test case in the howto repeat section, I checked
against the following trees:

mysql-5.1-bugteam            => Test SUCCEEDS
mysql-next-mr                => Test SUCCEEDS
mysql-6.0-codebase-bugfixing => Test FAILS

So I confirm Sveta's comment ([14 Sep 12:17] Sveta Smirnova) and
extend the verification to next-mr tree.

Fixed in 5.6.2 by WL#5092, qv. for docs info. Closed.