Bug #319 if while a non-transactional slave is replicating a transaction, possiblproblem
Submitted: 23 Apr 2003 14:32 Modified: 6 May 2009 14:24
Reporter: Guilhem Bichot Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:4.0, 4.1, 5.1 OS:Any (all)
Assigned to: Andrei Elkin
Triage: Triaged: D2 (Serious) / R4 (High) / E5 (Major)

[23 Apr 2003 14:32] Guilhem Bichot
Description:
Sorry for the synopsis, but the max length is too short for replication bugs ;)

If we have a transaction on the master:

begin;
#position 1000 in relay log
delete from innodbtable1;
#position 1020 in relay log
insert into innodbtable1 values(10);
commit;

Until commit is executed by the slave, relay-log.info will contain 1000.
If someone does STOP SLAVE between the 2 queries : later replication
will resume at position 1000 because this is what is written in relay-log.info.
- If the slave runs with InnoDB tables, rollback will occur, and replication
should later resume at position 1000. relay-log.info is good.
- If the slave runs with --skip-innodb, innodbtable1 is probably MyISAM on the slave, and no rollback, and replication should resume at 1020, not 1000 like it does. BUG.

Monty's proposed solution :
- when STOP SLAVE is received, if on the slave you have seen a BEGIN and you have updated a non transactional table, wait until the COMMIT is executed before stopping.
Use if (test_all_bits(thd->options,OPTION_BEGIN | OPTION_STATUS_NO_TRANS_UPDATE)) for this.
- so before starting executing a transaction on the slave, be sure that you
have the COMMIT in the relay log (otherwise you could wait forever).

Guilhem's comment : what if the slave stops because of an error (error in reading the relay log, or in applying the queries), not because of STOP SLAVE ?.
Then in the transaction some statements will be rolled back, others won't
(those updating a MyISAM table), and replication is badly broken. But it is
probably not a super idea to update a MyISAM table inside a transaction.

How to repeat:

Suggested fix:
Will work on it later. Not very urgent.
[27 Apr 2003 10:38] Michael Widenius
This is now documented in the manual.  We will fix this in a future MySQL release.
[30 Sep 2008 8:39] Konstantin Osipov
Time has come to verify that RBR works.
[16 Jan 2009 18:11] Susanne Ebrecht
Not sure if I really understood all right. Tested RBR on 5.1 bzr tree. Fact is stop/start position is the same after stop/start slave.

Situation is exactly how Guilhem described it also by using RBR and 5.1.
[2 Feb 2009 12:52] Susanne Ebrecht
Verified like described:

create table innodbtable1(i integer)engine=innodb;

begin;
delete from innodbtable1;

Now stop slave:

090202 13:47:06 [Note] Slave I/O thread killed while reading event
090202 13:47:06 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000001', position 408

insert into innodbtable1 values(10);
commit;

Now start slave again:
090202 13:47:44 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000001' at position 408, relay log './ernie-relay-bin.000002' position: 553
090202 13:47:44 [Note] Slave I/O thread: connected to master 'miracee@127.0.0.1:5151',replication started in log 'mysql-bin.000001' at position 408

So it started at same position as it stoped ...
[23 Feb 2009 11:37] Lars Thalmann
REFINED PROBLEM DESCRIPTION
===========================

ASSUMPTIONS

- Master has transactional tables (e.g. InnoDB)
- Slave has non-transactional tables (e.g. MyISAM)

PROBLEMS

1. When slave is stopped *by a user*, the transactional changes
   are not rolled back on the slave.  This causes the changes to
   be re-applied on the slave at later startup causing errors.

2. When slave is stopped *due to an error*, the transactional
   changes are not rolled back on the slave.  This causes the
   changes to be re-applied on the slave at later startup causing
   errors.

ANALYSIS
========

1. Can be solved by a delayed stop as Monty suggested.

2. Can not be solved in the general case without redesigning
   binlog to be interleaved and having more engine-influenced
   control over the transactions.
[26 Feb 2009 9:48] Susanne Ebrecht
Because we have two problems here I opened a separat bug report for the problem "When slave is stopped *by a user*"

New bug report is bug #43217
[26 Feb 2009 15:36] Susanne Ebrecht
This bug report is only for problem 2) when slave stop because of error.

Problem 1 is outsourced into bug #43217
[6 Mar 2009 21:38] Andrei Elkin
The bug contains request to stop processing a mixed transaction in statement base
replication in the middle. If there is a non-transaction table the
whole group will complete and only afterward STOP slave will become
effective.
[26 Mar 2009 8:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/70478

2836 Andrei Elkin	2009-03-26
      Bug#38205 Row-based Replication (RBR) causes inconsistencies: HA_ERR_FOUND_DUP
      Bug#319  if while a non-transactional slave is replicating a transaction possible problem 
      
      It is impossible to roll back a mixed engines transaction when one of the engine is
      non-transaction. In replication that fact is crucial because the slave can not safely
      re-apply a transction that was interrupted with STOP SLAVE.
      
      Fixed with making STOP SLAVE not be effective immediately in the case the current
      group of replication events has modified a non-transaction table. In order for slave to leave
      either the group needs finishing or the user issues KILL QUERY|CONNECTION slave_thread_id.
[6 Apr 2009 12:41] Andrei Elkin
bug#38205 is being fixed with the same patch.
[9 Apr 2009 13:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71784

2837 Andrei Elkin	2009-04-09
      Bug #38205 Row-based Replication (RBR) causes inconsistencies: HA_ERR_FOUND_DUPP_KEY
      Bug#319  if while a non-transactional slave is replicating a transaction possible problem
      
      only testing related: addressing reviewers' comments.
[21 Apr 2009 9:52] Andrei Elkin
pushed to 5.1+ - bugteam
[5 May 2009 19:37] Bugs System
Pushed into 5.1.35 (revid:davi.arnaut@sun.com-20090505190206-9xmh7dlc6kom8exp) (version source revid:davi.arnaut@sun.com-20090505190206-9xmh7dlc6kom8exp) (merge vers: 5.1.35) (pib:6)
[6 May 2009 9:45] Jon Stephens
Documented bugfix in the 5.1.35 changelog as follows:

        The transactional behavior of STOP SLAVE has changed. Formerly, 
        it took effect immediately; now, it waits until the current 
        replication event group (if any) has finished executing, or until 
        the user issues a KILL QUERY or KILL CONNECTION statement.

        This was done in order to solve the problem encountered when 
        replication was stopped while a non-transactional slave was 
        replicating a transaction on the master. (It was impossible to 
        roll back a mixed-engines transaction when one of the engines was 
        non-transactional, which meant that the slave could not safely 
        re-apply any transaction that had been interrupted by STOP 
        SLAVE.)

Set status = NDI: waiting on merge to 6.0 tree for version info.
[6 May 2009 14:08] Bugs System
Pushed into 6.0.12-alpha (revid:svoj@sun.com-20090506125450-yokcmvqf2g7jhujq) (version source revid:aelkin@mysql.com-20090421094803-hho7a34gdsby7382) (merge vers: 6.0.11-alpha) (pib:6)
[6 May 2009 14:24] Jon Stephens
Bugfix also noted in 6.0.12 changelog; closed.
[15 Jun 2009 8:24] Bugs System
Pushed into 5.1.35-ndb-6.3.26 (revid:jonas@mysql.com-20090615074202-0r5r2jmi83tww6sf) (version source revid:jonas@mysql.com-20090615070837-9pccutgc7repvb4d) (merge vers: 5.1.35-ndb-6.3.26) (pib:6)
[15 Jun 2009 9:04] Bugs System
Pushed into 5.1.35-ndb-7.0.7 (revid:jonas@mysql.com-20090615074335-9hcltksp5cu5fucn) (version source revid:jonas@mysql.com-20090615072714-rmfkvrbbipd9r32c) (merge vers: 5.1.35-ndb-7.0.7) (pib:6)
[15 Jun 2009 9:44] Bugs System
Pushed into 5.1.35-ndb-6.2.19 (revid:jonas@mysql.com-20090615061520-sq7ds4yw299ggugm) (version source revid:jonas@mysql.com-20090615054654-ebgpz7elwu1xj36j) (merge vers: 5.1.35-ndb-6.2.19) (pib:6)
[23 Jul 2009 10:24] Bugs System
Pushed into 5.4.4-alpha (revid:alik@sun.com-20090723102221-ps4uaphwbxzj8p0q) (version source revid:aelkin@mysql.com-20090708180952-73h6cpmg3r5wtxwa) (merge vers: 5.4.4-alpha) (pib:11)
[12 Aug 2009 21:56] Paul Dubois
Noted in 5.4.2 changelog because next 5.4 version will be 5.4.2 and not 5.4.4.
[12 Aug 2009 21:58] Paul Dubois
Noted in 5.4.2 changelog because next 5.4 version will be 5.4.2 and not 5.4.4.
[14 Aug 2009 22:48] Paul Dubois
Ignore previous comment about 5.4.2.
[7 Oct 2009 1:32] Paul Dubois
The 5.4 fix now has been pushed into 5.4.2.