MySQL Bugs: #43794: Slave SQL thread aborts with RBR on non-transactional tables

Bug #43794	Slave SQL thread aborts with RBR on non-transactional tables
Submitted:	23 Mar 2009 1:05	Modified:	7 May 2009 10:13
Reporter:	Elena Stepanova	Email Updates:
Status:	Duplicate	Impact on me:	None
Category:	MySQL Server: Row Based Replication ( RBR )	Severity:	S2 (Serious)
Version:	5.1.33, 6.0.10	OS:	Any
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
Two connections modifying a non-transactional table can make row-based replication break with the error:
[ERROR] Slave SQL: Could not execute Delete_rows event on table tst.t; Can't find record in 't', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND;

Apparently, this is a result of the new behavior implemented as a bugfix for bug#40116 (not writing non-transactional statements ahead of the transaction).

The simplest use case does not involve any transactional tables at all, but a similar scenario (although without actual slave abort) had already been qualified as not a bug in bug#42816, on the reason that transactional context for non-transactional statements could have been avoided.  

However, the problem also affects the same data structure as had been initially described in bug#40116: a trigger on a transactional table writing into a non-transactional table. The scenario was considered then as common. The workaround suggested in bug#42816 -- moving non-transactional statements outside transaction -- is not quite suitable there.

Both test cases are described below.

How to repeat:
perl ./mysql-test-run.pl --mysqld=--binlog_format=row --do-test=rpl_failure_1

# t/rpl_failure_1.test
# Scenario 1 (non-transactional table only)

source include/master-slave.inc;

connection master;
--disable_warnings
drop database if exists tst;
--enable_warnings
create database tst;
use tst;
create table tst.t ( i int not null auto_increment, primary key (i), f1 int ) en
gine = MyISAM;
set autocommit = 0;
insert into tst.t (f1) values (1);

connection master1;
delete from tst.t;

connection slave;
select sleep(1);
show slave status;

# end of t/rpl_failure_1.test

perl ./mysql-test-run.pl --mysqld=--binlog_format=row --mysqld=--innodb --do-test=rpl_failure_2

# t/rpl_failure_2.test
# Scenario 2 (same structure as in bug#40116)

source include/master-slave.inc;

connection master;
--disable_warnings
drop database if exists tst;
--enable_warnings
create database tst;
use tst;
create table t ( f int ) engine = innodb;
create table log ( r int ) engine = myisam;
create trigger tr after insert on t for each row insert into log values ( new.f
);
set autocommit = 0;
insert into t values ( 1 );

connection master1;
delete from tst.log;

connection slave;
select sleep(1);
show slave status;

# end of t/rpl_failure_2.test

The first scenario places the non-transactional insert into a transactional context since autocommit=0. The workaround is to place an explicit COMMIT after the statement when autocommit=0, or use autocommit=1 and explicit transactions.

For the second scenario, the workaround is to not mix non-transactional and transactional engines in one statement. Some more information can be found at

    http://mysqlmusings.blogspot.com/2009/02/mixing-engines-in-transactions.html

With the current implementation of replication, we cannot write the non-transactional changes directly to the binary log. However, this is work to be done in WL#2687, which is not any any plan currently.