Bug #47287 RBR: replication diff on basic case with txn- and non-txn tables in a statement
Submitted: 12 Sep 2009 21:06 Modified: 19 Dec 2009 11:22
Reporter: Elena Stepanova Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.1.38, 5,1,39 OS:Any
Assigned to: Alfranio Correia CPU Architecture:Any
Tags: regression
Triage: Triaged: D2 (Serious) / R1 (None/Negligible) / E1 (None/Negligible)

[12 Sep 2009 21:06] Elena Stepanova
Description:
There was a recent regression (between 5.1.37 and 5.1.38) described in bug#46864: if a transaction contains a multi-row DML statement which involves both transactional and non-transactional engines, the statement fails, and the transaction is followed by a COMMIT, master and slave can end up with different table contents: the statement is correctly rolled back on master, but is applied on slave.

Similarly, if the binlog is used for recovery rather than for replication, the restored data also differs from the original contents.

The bug#46864 is about to be closed in 5.1.39, but it only fixes the problem with SBR, while it still exists for RBR (and MBR if the statement is such that it makes server switch to RBR).

The test case provided in 'how to repeat' section is the same as described in bug#46864, I only modified it so that it fails for both RBR and MBR, eliminated direct modification of auto-increment PK values (which is often said to be a bad thing to do), and made the key values in update more obvious. 

How to repeat:
# t/rpl_failure_rbr_mbr.test

source include/have_innodb.inc;
source include/master-slave.inc;

--disable_query_log
--disable_result_log

connection master;

--disable_warnings
drop database if exists tst;
--enable_warnings
create database tst;
use tst;

create table tb1(
  i1 int NOT NULL auto_increment, primary key (i1),
  i2 int unique key,
  f1 char (15)
) engine=innodb;

insert into tb1 ( i2, f1 ) values
(2,'init'),
(3,'init'),
(4,'init');

create table log (
  i1 int NOT NULL auto_increment, primary key (i1),
  entry_dsc char(100)
) engine=myisam;

delimiter //;
Create trigger tb1_upd after update on tb1 for each row
begin
  insert into log (entry_dsc)
  values (concat('Update row ', old.i1,' ', old.f1, '->', new.f1));
end//
delimiter ;//

set autocommit = 0;

--disable_abort_on_error
UPDATE tb1
SET i2 = i2-1, f1 = 'UPDATE' WHERE i2 in (2,4);
--enable_abort_on_error

commit;

--enable_query_log
--enable_result_log

--echo # On master:
select * from tb1;

sync_slave_with_master;

connection slave;
use tst;

--echo # On slave:
select * from tb1;

# end of t/rpl_failure_rbr_mbr.test

#----------------------------

Run the test above with binlog_format=row or mixed, e.g :

perl ./mysql-test-run.pl  --mysqld=--binlog_format=mixed rpl_failure_mbr_rbr

The test will produce the following output:

+# On master:
+select * from tb1;
+i1     i2      f1
+1      2       init
+2      3       init
+3      4       init
+use tst;
+# On slave:
+select * from tb1;
+i1     i2      f1
+1      1       UPDATE
+2      3       init
+3      4       init

Suggested fix:
There is an ongoing discussion which infers that the problem might never be fixed in 5.1. If such decision is taken, I suggest declaring in the documentation that a mix of txn- and non-txn engines in one statement is considered unsafe with any binlog format. 
As for now, in the scope of bug#28976 we are going to issue the following text in 5.1.39 changelog: 

(quote)

in statement-based logging mode ...

        3. A statement affecting both transactional and 
        non-transactional tables within a transaction is always
        considered unsafe. 

(end of quote)

Thus, we indirectly imply that in mixed- or row-based logging mode such statements are NOT considered unsafe.

Now the manual has a warning that such a mix is not advised in replication context; however, as said in the description, the problem affects not only replication, but also recovery flow.
[17 Sep 2009 15:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/83622

3111 Alfranio Correia	2009-09-17
      BUG#47287 RBR: replication diff on basic case with txn- and non-txn tables in a statement
      
      Let
        - T be a transactional table and N non-transactional table.
        - B be begin, C commit and R rollback.
        - M be a mixed statement, i.e. a statement that updates both T and N.
        - M* be a mixed statement that fails while updating either T or N.
      
      This patch restore the behavior presented in 5.1.37 for rows either produced in
      the RBR or MIXED modes, when a M* statement that happened early in a transaction
      had their changes written to the binary log outside the boundaries of the
      transaction and wrapped in a BEGIN/ROLLBACK. This was done to keep the slave
      consistent with with the master as the rollback would keep the changes on N and
      undo them on T. In particular, we do what folllows:
      
        . B M* T C would log - B M* R B T C.
      
      Note that, we are not preserving history from the master as we are introducing a
      rollback that never happened. However, this seems to be more acceptable than
      making the slave diverge. We do not fix the following case:
      
        . B T M* C would log B T M* C.
      
      The slave will diverge as the changes on T tables that originated from the M
      statement are rolled back on the master but not on the slave. Unfortunately, we
      cannot simply rollback the transaction as this would undo any uncommitted
      changes on T tables.
      
      SBR is not considered in this patch because a failing statement is written to
      the binary along with the error code and a slave executes and then rolls back
      the statement when it has an associated error code, thus undoing the effects
      on T. In RBR and MBR, a full-fledge fix will be pushed after the WL 2687.
[22 Sep 2009 6:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84036

3117 Alfranio Correia	2009-09-22
      BUG#47287 RBR: replication diff on basic case with txn- and non-txn tables in a statement
      
      Let
        - T be a transactional table and N non-transactional table.
        - B be begin, C commit and R rollback.
        - M be a mixed statement, i.e. a statement that updates both T and N.
        - M* be a mixed statement that fails while updating either T or N.
      
      This patch restore the behavior presented in 5.1.37 for rows either produced in
      the RBR or MIXED modes, when a M* statement that happened early in a transaction
      had their changes written to the binary log outside the boundaries of the
      transaction and wrapped in a BEGIN/ROLLBACK. This was done to keep the slave
      consistent with with the master as the rollback would keep the changes on N and
      undo them on T. In particular, we do what follows:
      
        . B M* T C would log - B M* R B T C.
      
      Note that, we are not preserving history from the master as we are introducing a
      rollback that never happened. However, this seems to be more acceptable than
      making the slave diverge. We do not fix the following case:
      
        . B T M* C would log B T M* C.
      
      The slave will diverge as the changes on T tables that originated from the M
      statement are rolled back on the master but not on the slave. Unfortunately, we
      cannot simply rollback the transaction as this would undo any uncommitted
      changes on T tables.
      
      SBR is not considered in this patch because a failing statement is written to
      the binary along with the error code and a slave executes and then rolls back
      the statement when it has an associated error code, thus undoing the effects
      on T. In RBR and MBR, a full-fledged fix will be pushed after the WL 2687.
[23 Sep 2009 2:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84230

3117 Alfranio Correia	2009-09-23
      BUG#47287 RBR: replication diff on basic case with txn- and non-txn tables in a statement
      
      Let
        - T be a transactional table and N non-transactional table.
        - B be begin, C commit and R rollback.
        - M be a mixed statement, i.e. a statement that updates both T and N.
        - M* be a mixed statement that fails while updating either T or N.
      
      This patch restore the behavior presented in 5.1.37 for rows either produced in
      the RBR or MIXED modes, when a M* statement that happened early in a transaction
      had their changes written to the binary log outside the boundaries of the
      transaction and wrapped in a BEGIN/ROLLBACK. This was done to keep the slave
      consistent with with the master as the rollback would keep the changes on N and
      undo them on T. In particular, we do what follows:
      
        . B M* T C would log - B M* R B T C.
      
      Note that, we are not preserving history from the master as we are introducing a
      rollback that never happened. However, this seems to be more acceptable than
      making the slave diverge. We do not fix the following case:
      
        . B T M* C would log B T M* C.
      
      The slave will diverge as the changes on T tables that originated from the M
      statement are rolled back on the master but not on the slave. Unfortunately, we
      cannot simply rollback the transaction as this would undo any uncommitted
      changes on T tables.
      
      SBR is not considered in this patch because a failing statement is written to
      the binary along with the error code and a slave executes and then rolls back
      the statement when it has an associated error code, thus undoing the effects
      on T. In RBR and MBR, a full-fledged fix will be pushed after the WL 2687.
[23 Sep 2009 2:49] Alfranio Correia
See also BUG#47175.
[28 Sep 2009 12:56] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84850

3139 Alfranio Correia	2009-09-28
      BUG#47287 RBR: replication diff on basic case with txn- and non-txn tables in a statement
      
      Let
        - T be a transactional table and N non-transactional table.
        - B be begin, C commit and R rollback.
        - M be a mixed statement, i.e. a statement that updates both T and N.
        - M* be a mixed statement that fails while updating either T or N.
      
      This patch restore the behavior presented in 5.1.37 for rows either produced in
      the RBR or MIXED modes, when a M* statement that happened early in a transaction
      had their changes written to the binary log outside the boundaries of the
      transaction and wrapped in a BEGIN/ROLLBACK. This was done to keep the slave
      consistent with with the master as the rollback would keep the changes on N and
      undo them on T. In particular, we do what follows:
      
        . B M* T C would log - B M* R B T C.
      
      Note that, we are not preserving history from the master as we are introducing a
      rollback that never happened. However, this seems to be more acceptable than
      making the slave diverge. We do not fix the following case:
      
        . B T M* C would log B T M* C.
      
      The slave will diverge as the changes on T tables that originated from the M
      statement are rolled back on the master but not on the slave. Unfortunately, we
      cannot simply rollback the transaction as this would undo any uncommitted
      changes on T tables.
      
      SBR is not considered in this patch because a failing statement is written to
      the binary along with the error code and a slave executes and then rolls back
      the statement when it has an associated error code, thus undoing the effects
      on T. In RBR and MBR, a full-fledged fix will be pushed after the WL 2687.
[6 Oct 2009 0:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/85809

3139 Alfranio Correia	2009-10-06
      BUG#47287 RBR: replication diff on basic case with txn- and non-txn tables in a statement
      
      Let
        - T be a transactional table and N non-transactional table.
        - B be begin, C commit and R rollback.
        - M be a mixed statement, i.e. a statement that updates both T and N.
        - M* be a mixed statement that fails while updating either T or N.
      
      This patch restore the behavior presented in 5.1.37 for rows either produced in
      the RBR or MIXED modes, when a M* statement that happened early in a transaction
      had their changes written to the binary log outside the boundaries of the
      transaction and wrapped in a BEGIN/ROLLBACK. This was done to keep the slave
      consistent with with the master as the rollback would keep the changes on N and
      undo them on T. In particular, we do what follows:
      
        . B M* T C would log - B M* R B T C.
      
      Note that, we are not preserving history from the master as we are introducing a
      rollback that never happened. However, this seems to be more acceptable than
      making the slave diverge. We do not fix the following case:
      
        . B T M* C would log B T M* C.
      
      The slave will diverge as the changes on T tables that originated from the M
      statement are rolled back on the master but not on the slave. Unfortunately, we
      cannot simply rollback the transaction as this would undo any uncommitted
      changes on T tables.
      
      SBR is not considered in this patch because a failing statement is written to
      the binary along with the error code and a slave executes and then rolls back
      the statement when it has an associated error code, thus undoing the effects
      on T. In RBR and MBR, a full-fledged fix will be pushed after the WL 2687.
[7 Oct 2009 17:59] Alfranio Correia
Pushed to mysql-5.1-bugteam and mysql-pe.
[21 Oct 2009 19:21] Magnus BlÄudd
Patch for this bug causes compiler warnings since label end: is no longer used. Please fix.

-    goto end;

sql/log.cc: 1552
	label 'end' defined but not used
libmysqld/log.cc: 1552
	label 'end' defined but not used
[22 Oct 2009 6:33] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091022063126-l0qzirh9xyhp0bpc) (version source revid:alik@sun.com-20091019135554-s1pvptt6i750lfhv) (merge vers: 6.0.14-alpha) (pib:13)
[22 Oct 2009 7:05] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091022060553-znkmxm0g0gm6ckvw) (version source revid:alik@sun.com-20091015131113-wldlkdtoe9oebb2z) (merge vers: 5.5.0-beta) (pib:13)
[22 Oct 2009 10:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/87757

3189 Alfranio Correia	2009-10-22
      Post-fix for BUG#47287.
      
      The label "end" was causing compiler warnings as it was no longer used.
      To fix the problem we removed it.
[4 Nov 2009 9:24] Bugs System
Pushed into 5.1.41 (revid:joro@sun.com-20091104092152-qz96bzlf2o1japwc) (version source revid:kristofer.pettersson@sun.com-20091103162305-08l4gkeuif2ozsoj) (merge vers: 5.1.41) (pib:13)
[10 Nov 2009 12:01] Jon Stephens
Documented bugfix in the 5.1.41 changelog as follows:

        When using the row-based format to replicate a transaction
        involving both transactional and non-transactional engines,
        which contained a DML statement affecting multiple rows, the
        statement failed; if this transaction was followed by a COMMIT,
        the master and the slave could diverge, because the statement
        was correctly rolled back on the master, but was applied on the
        slave.

NOTE: Additional documentation work needs to be done with regard to this and related bugs/issues, but we need the changelog entry for the impending 5.1.41 release. Paul or I will take care of these additional requirements ASAP.
[11 Nov 2009 6:48] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091110093407-rw5g8dys2baqkt67) (version source revid:alik@sun.com-20091109080109-7dxapd5y5pxlu08w) (merge vers: 6.0.14-alpha) (pib:13)
[11 Nov 2009 6:56] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091109115615-nuohp02h8mdrz8m2) (version source revid:alik@sun.com-20091105092041-sp6eyod7sdlfuj3b) (merge vers: 5.5.0-beta) (pib:13)
[11 Nov 2009 14:58] Jon Stephens
Added changelog entry for 5.5.0 and 6.0.14. See my previous comment.
[18 Dec 2009 10:26] Bugs System
Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20091218102229-64tk47xonu3dv6r6) (version source revid:jonas@mysql.com-20091218095730-26gwjidfsdw45dto) (merge vers: 5.1.41-ndb-7.1.0) (pib:15)
[18 Dec 2009 10:42] Bugs System
Pushed into 5.1.41-ndb-6.2.19 (revid:jonas@mysql.com-20091218100224-vtzr0fahhsuhjsmt) (version source revid:jonas@mysql.com-20091217101452-qwzyaig50w74xmye) (merge vers: 5.1.41-ndb-6.2.19) (pib:15)
[18 Dec 2009 10:57] Bugs System
Pushed into 5.1.41-ndb-6.3.31 (revid:jonas@mysql.com-20091218100616-75d9tek96o6ob6k0) (version source revid:jonas@mysql.com-20091217154335-290no45qdins5bwo) (merge vers: 5.1.41-ndb-6.3.31) (pib:15)
[18 Dec 2009 11:12] Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20091218101303-ga32mrnr15jsa606) (version source revid:jonas@mysql.com-20091218064304-ezreonykd9f4kelk) (merge vers: 5.1.41-ndb-7.0.11) (pib:15)
[19 Dec 2009 11:22] Jon Stephens
No additional changelog entries needed. Setting back to Closed state.