Bug #319 if while a non-transactional slave is replicating a transaction, possiblproblem
Submitted: 23 Apr 2003 16:32 Modified: 6 May 16:24
Reporter: Guilhem Bichot
Status: Closed
Category:Server: Replication Severity:S3 (Non-critical)
Version:4.0, 4.1, 5.1 OS:Any (all)
Assigned to: Andrei Elkin Target Version:6.1+
Triage: Triaged: D2 (Serious) / R4 (High) / E5 (Major)

[23 Apr 2003 16:32] Guilhem Bichot
Description:
Sorry for the synopsis, but the max length is too short for replication bugs ;)

If we have a transaction on the master:

begin;
#position 1000 in relay log
delete from innodbtable1;
#position 1020 in relay log
insert into innodbtable1 values(10);
commit;

Until commit is executed by the slave, relay-log.info will contain 1000.
If someone does STOP SLAVE between the 2 queries : later replication
will resume at position 1000 because this is what is written in relay-log.info.
- If the slave runs with InnoDB tables, rollback will occur, and replication
should later resume at position 1000. relay-log.info is good.
- If the slave runs with --skip-innodb, innodbtable1 is probably MyISAM on the slave, and
no rollback, and replication should resume at 1020, not 1000 like it does. BUG.

Monty's proposed solution :
- when STOP SLAVE is received, if on the slave you have seen a BEGIN and you have updated
a non transactional table, wait until the COMMIT is executed before stopping.
Use if (test_all_bits(thd->options,OPTION_BEGIN | OPTION_STATUS_NO_TRANS_UPDATE)) for
this.
- so before starting executing a transaction on the slave, be sure that you
have the COMMIT in the relay log (otherwise you could wait forever).

Guilhem's comment : what if the slave stops because of an error (error in reading the
relay log, or in applying the queries), not because of STOP SLAVE ?.
Then in the transaction some statements will be rolled back, others won't
(those updating a MyISAM table), and replication is badly broken. But it is
probably not a super idea to update a MyISAM table inside a transaction.

How to repeat:

Suggested fix:
Will work on it later. Not very urgent.
[27 Apr 2003 12:38] Michael Widenius
This is now documented in the manual.  We will fix this in a future MySQL release.
[30 Sep 2008 10:39] Konstantin Osipov
Time has come to verify that RBR works.
[16 Jan 19:11] Susanne Ebrecht
Not sure if I really understood all right. Tested RBR on 5.1 bzr tree. Fact is stop/start
position is the same after stop/start slave.

Situation is exactly how Guilhem described it also by using RBR and 5.1.
[2 Feb 13:52] Susanne Ebrecht
Verified like described:

create table innodbtable1(i integer)engine=innodb;

begin;
delete from innodbtable1;

Now stop slave:

090202 13:47:06 [Note] Slave I/O thread killed while reading event
090202 13:47:06 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000001',
position 408

insert into innodbtable1 values(10);
commit;

Now start slave again:
090202 13:47:44 [Note] Slave SQL thread initialized, starting replication in log
'mysql-bin.000001' at position 408, relay log './ernie-relay-bin.000002' position: 553
090202 13:47:44 [Note] Slave I/O thread: connected to master
'miracee@127.0.0.1:5151',replication started in log 'mysql-bin.000001' at position 408

So it started at same position as it stoped ...
[23 Feb 12:37] Lars Thalmann
REFINED PROBLEM DESCRIPTION
===========================

ASSUMPTIONS

- Master has transactional tables (e.g. InnoDB)
- Slave has non-transactional tables (e.g. MyISAM)

PROBLEMS

1. When slave is stopped *by a user*, the transactional changes
   are not rolled back on the slave.  This causes the changes to
   be re-applied on the slave at later startup causing errors.

2. When slave is stopped *due to an error*, the transactional
   changes are not rolled back on the slave.  This causes the
   changes to be re-applied on the slave at later startup causing
   errors.

ANALYSIS
========

1. Can be solved by a delayed stop as Monty suggested.

2. Can not be solved in the general case without redesigning
   binlog to be interleaved and having more engine-influenced
   control over the transactions.
[26 Feb 10:48] Susanne Ebrecht
Because we have two problems here I opened a separat bug report for the problem "When
slave is stopped *by a user*"

New bug report is bug #43217
[26 Feb 16:36] Susanne Ebrecht
This bug report is only for problem 2) when slave stop because of error.

Problem 1 is outsourced into bug #43217
[6 Mar 22:38] Andrei Elkin
The bug contains request to stop processing a mixed transaction in statement base
replication in the middle. If there is a non-transaction table the
whole group will complete and only afterward STOP slave will become
effective.
[26 Mar 9:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/70478

2836 Andrei Elkin	2009-03-26
      Bug#38205 Row-based Replication (RBR) causes inconsistencies: HA_ERR_FOUND_DUP
      Bug#319  if while a non-transactional slave is replicating a transaction possible
problem 
      
      It is impossible to roll back a mixed engines transaction when one of the engine is
      non-transaction. In replication that fact is crucial because the slave can not
safely
      re-apply a transction that was interrupted with STOP SLAVE.
      
      Fixed with making STOP SLAVE not be effective immediately in the case the current
      group of replication events has modified a non-transaction table. In order for slave
to leave
      either the group needs finishing or the user issues KILL QUERY|CONNECTION
slave_thread_id.
[6 Apr 14:41] Andrei Elkin
bug#38205 is being fixed with the same patch.
[9 Apr 15:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71784

2837 Andrei Elkin	2009-04-09
      Bug #38205 Row-based Replication (RBR) causes inconsistencies: HA_ERR_FOUND_DUPP_KEY
      Bug#319  if while a non-transactional slave is replicating a transaction possible
problem
      
      only testing related: addressing reviewers' comments.
[21 Apr 11:52] Andrei Elkin
pushed to 5.1+ - bugteam
[5 May 21:37] Bugs System
Pushed into 5.1.35 (revid:davi.arnaut@sun.com-20090505190206-9xmh7dlc6kom8exp) (version
source revid:davi.arnaut@sun.com-20090505190206-9xmh7dlc6kom8exp) (merge vers: 5.1.35)
(pib:6)
[6 May 11:45] Jon Stephens
Documented bugfix in the 5.1.35 changelog as follows:

        The transactional behavior of STOP SLAVE has changed. Formerly, 
        it took effect immediately; now, it waits until the current 
        replication event group (if any) has finished executing, or until 
        the user issues a KILL QUERY or KILL CONNECTION statement.

        This was done in order to solve the problem encountered when 
        replication was stopped while a non-transactional slave was 
        replicating a transaction on the master. (It was impossible to 
        roll back a mixed-engines transaction when one of the engines was 
        non-transactional, which meant that the slave could not safely 
        re-apply any transaction that had been interrupted by STOP 
        SLAVE.)

Set status = NDI: waiting on merge to 6.0 tree for version info.
[6 May 16:08] Bugs System
Pushed into 6.0.12-alpha (revid:svoj@sun.com-20090506125450-yokcmvqf2g7jhujq) (version
source revid:aelkin@mysql.com-20090421094803-hho7a34gdsby7382) (merge vers: 6.0.11-alpha)
(pib:6)
[6 May 16:24] Jon Stephens
Bugfix also noted in 6.0.12 changelog; closed.
[15 Jun 10:24] Bugs System
Pushed into 5.1.35-ndb-6.3.26 (revid:jonas@mysql.com-20090615074202-0r5r2jmi83tww6sf)
(version source revid:jonas@mysql.com-20090615070837-9pccutgc7repvb4d) (merge vers:
5.1.35-ndb-6.3.26) (pib:6)
[15 Jun 11:04] Bugs System
Pushed into 5.1.35-ndb-7.0.7 (revid:jonas@mysql.com-20090615074335-9hcltksp5cu5fucn)
(version source revid:jonas@mysql.com-20090615072714-rmfkvrbbipd9r32c) (merge vers:
5.1.35-ndb-7.0.7) (pib:6)
[15 Jun 11:44] Bugs System
Pushed into 5.1.35-ndb-6.2.19 (revid:jonas@mysql.com-20090615061520-sq7ds4yw299ggugm)
(version source revid:jonas@mysql.com-20090615054654-ebgpz7elwu1xj36j) (merge vers:
5.1.35-ndb-6.2.19) (pib:6)
[23 Jul 12:24] Bugs System
Pushed into 5.4.4-alpha (revid:alik@sun.com-20090723102221-ps4uaphwbxzj8p0q) (version
source revid:aelkin@mysql.com-20090708180952-73h6cpmg3r5wtxwa) (merge vers: 5.4.4-alpha)
(pib:11)
[12 Aug 23:56] Paul DuBois
Noted in 5.4.2 changelog because next 5.4 version will be 5.4.2 and not 5.4.4.
[12 Aug 23:58] Paul DuBois
Noted in 5.4.2 changelog because next 5.4 version will be 5.4.2 and not 5.4.4.
[15 Aug 0:48] Paul DuBois
Ignore previous comment about 5.4.2.
[7 Oct 3:32] Paul DuBois
The 5.4 fix now has been pushed into 5.4.2.