MySQL Bugs: #52473: NDB inserts row when updating slave's empty table (with PK)

Bug #52473	NDB inserts row when updating slave's empty table (with PK)
Submitted:	30 Mar 2010 12:51	Modified:	24 Apr 2010 11:14
Reporter:	Luis Soares	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Cluster: Replication	Severity:	S3 (Non-critical)
Version:	mysql-5.1-telco-6.3	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	telco-6.3

Description:
If using NDB storage engine and a table with a PK sometimes when
updating the slave, spurious rows may be observed. Take the
following example:

MASTER> CREATE TABLE t1 (a INT, b INT, PRIMARY KEY(b)) ENGINE=NDB;
sync_slave_with_master
MASTER> INSERT INTO t1 VALUES (1,1);
sync_slave_with_master
SLAVE> DELETE FROM t1;
MASTER> UPDATE t1 SET a=2;
MASTER> SELECT * FROM t1;
(2,1)
SLAVE> SELECT * FROM t1;
(2,1) <====== Why is there a row inserted when we updated an
              empty table at the slave?

If verifying against InnoDB I get the following (expected) result:

Last_SQL_Errno  1032
Last_SQL_Error  Could not execute Update_rows event on table test.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log master-bin.000001, end_log_pos 563

Furthermore, if still using InnoDB and setting slave to
IDEMPOTENT mode, I get no rows at the slave and no error 
(also expected), which is clearly different from what 
happens when using NDB:

CREATE TABLE t1 (a INT, b INT, PRIMARY KEY(b)) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1,1);
SET @@global.slave_exec_mode= IDEMPOTENT;
DELETE FROM t1;
SELECT * FROM t1;
a       b
UPDATE t1 SET a=2 WHERE b=1;
SELECT * FROM t1;
a       b
2       1
SELECT * FROM t1;
a       b

I am aware that NDB does an optimization for not having to send
extra bytes when not needed (in the BI when PK is available) by
shipping updates as INSERTs and the slave behaving as in
idempotent mode. However, there are some corner cases, like the
one presented above, that get hurt.

Some extra details... I found this while checking the 
rpl_ndb_idempotent test case, where it has something like:

(...)

#
# Test that we can handle update of a row that does not exist on the slave
# will trigger usage of AO_IgnoreError on slave side so that the INSERT
# still succeeds even if the replication of the UPDATE generates an error.
#
--connection master
CREATE TABLE t1 (c1 CHAR(15) NOT NULL, c2 CHAR(15) NOT NULL, c3 INT NOT NULL, PRIMARY KEY (c3)) ENGINE = NDB ;
INSERT INTO t1 VALUES ("row1","remove on slave",1);

--sync_slave_with_master
--connection slave
DELETE FROM t1;

--connection master
BEGIN;
UPDATE t1 SET c2="does not exist" WHERE c3=1;
INSERT INTO t1 VALUES ("row2","new on slave",2);
COMMIT;

In this case, the error that happens at the slave is not because
the row does not exist, but because the INSERT correspondent to
the UPDATE is actually trying to squeeze a NULL value in a field
that has a NOT NULL constraint (c1) (the AI on the
Write_rows_event does not contain value for c1).

If we change the update clause from:
  UPDATE t1 SET c2="does not exist" WHERE c3=1;
to
  UPDATE t1 SET c1="row1", c2="does not exist" WHERE c3=1;
the test case will fail with result difference (one unexpected
row at the slave).

How to repeat:
1. bzr clone -r revid:jorgen.austvik@sun.com-20100330064123-pvk4u2s0h32m99by $BZRREPO/mysql-5.1-telco-6.3 
2. cd mysql-5.1-telco-6.3
3. bash BUILD/compile-amd64-debug-max
4. cd mysql-test
5. edit suite/rpl_ndb/t/rpl_ndb_bug_idempotent.test

--source include/have_ndb.inc
--source include/have_binlog_format_mixed_or_row.inc
--source include/ndb_master-slave.inc

CREATE TABLE t1 (a INT, b INT, PRIMARY KEY(b)) ENGINE=NDB;
INSERT INTO t1 VALUES (1,1);
-- sync_slave_with_master
DELETE FROM t1;
SELECT * FROM t1;
-- connection master
UPDATE t1 SET a=2 WHERE b=1;
SELECT * FROM t1;
-- sync_slave_with_master
SELECT * FROM t1;

-- exit

6. perl mysql-test-run.pl --mysqld=--binlog-format=row rpl_ndb_bug_idempotent
===> NOTICE THE ROW AT THE SLAVE

Suggested fix:
.

Thank you for the report.

Verified as described with mysql-5.1-telco-7.1 tree.

See also: BUG#48215.

Probably the testcase is confused or the recorded result is confused.

I would expect that :
  When ndb-log-update-as-write = 0
    An Update will be logged as UPDATE_ROW
    An Insert will be logged as WRITE_ROW
  For the error on the update (row doesn't exist) to be skipped, special error handling will be required on the slave (as described in the comment excerpt listed here).  I suspect that this is what the testcase was intended to initially test.

  When ndb-log-update-as-write = 1
    An Update will be logged as WRITE_ROW
    An Insert will be logged as WRITE_ROW

  On the slave, the Ndb table handler maps WRITE_ROW onto an NdbApi 'write operation'.  This operation is a kind of 'insert on duplicate update' operation.  When the write operation is used, it will 'silently' turn the Update into an Insert.  In this bug, it seems that the insert then fails as it is missing an essential column.  If it has the column info then it succeeds.

  So it seems like the testcase needs to separate out testing of IDEMPOTENT/STRICT mode between scenarios with log-slave-updates on and off.

Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101008132832-pbzewvmi9f365ak4) (version source revid:alexander.nozdrin@oracle.com-20101008132832-pbzewvmi9f365ak4) (pib:21)

Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)