Bug #27808 Infinite looping in circular replication
Submitted: 13 Apr 2007 13:02
Modified: 22 Oct 2008 6:44
Reporter: Lars Thalmann
Status: Duplicate
Category: MySQL Server: Replication
Severity: S3 (Non-critical)
Version: 5.1
OS: Any
Assigned to: Assigned Account
CPU Architecture: Any
Triage: D2 (Serious)

[13 Apr 2007 13:02] Lars Thalmann
Description:
There are two cases in which events can loop forever.

1. A server fails in circular replication
   and the user fails over the replication.

2. A cluster is used, an event is created at one
   cluster server, and the same cluster later receives
   the event again through another server.
   
Example for case 1
------------------
Consider the following scenario:

- Replication in circle of three servers: A->B->C->A.
- Server B fails
- User lets C replicate from A instead: A->C->A.

If, at the time B fails, an event generated by B
has arrived at A but has not yet been received
back by B, then this event will loop forever in
the circle A->C->A.
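
The scenario above can be sketched with a toy model. This is only an illustration, not server code: the function name `hops_until_dropped` and the `max_hops` bound are made up for the example, and the only rule modelled is that a server drops an event whose originating server id is its own (the default behaviour without --replicate-same-server-id).

```python
def hops_until_dropped(ring, origin, start, max_hops=20):
    """Forward one event around the ring.

    Returns the number of hops after which the event is dropped,
    or None if it is still travelling after max_hops hops.
    """
    position = start
    for hop in range(1, max_hops + 1):
        position = ring[position]  # the event is sent to the next server
        if position == origin:     # the originator recognises its own event
            return hop
    return None


full_ring = {"A": "B", "B": "C", "C": "A"}
reduced_ring = {"A": "C", "C": "A"}  # after B fails and C replicates from A

print(hops_until_dropped(full_ring, origin="B", start="A"))     # 1
print(hops_until_dropped(reduced_ring, origin="B", start="A"))  # None
```

In the full ring the event is dropped as soon as it reaches B again; in the reduced ring no server has B's id, so nothing ever terminates it.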

Example for case 2
------------------
Consider the following scenario:

- One cluster with MySQL servers A,B.
- One cluster with MySQL servers C,D.
- Replication A->C, D->B.

Any row changed at A will be replicated A->C and D->B, and at B
it will be applied for the second time in the same cluster.
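
A rough sketch of this double apply, under simplifying assumptions that are not taken from the server: every member of a cluster sees every change made in that cluster, a slave applies whatever it receives, and the names `CLUSTER_OF`, `CHANNELS` and `times_applied` are invented for the illustration.

```python
CLUSTER_OF = {"A": 1, "B": 1, "C": 2, "D": 2}
CHANNELS = [("A", "C"), ("D", "B")]  # (master, slave) pairs


def times_applied(origin, hops):
    """Count how often one change is applied in each cluster after `hops` channel hops."""
    applied = {1: 0, 2: 0}
    cluster = CLUSTER_OF[origin]
    applied[cluster] += 1  # the original change, visible cluster-wide
    for _ in range(hops):
        # The channel whose master lives in the cluster that just changed
        # carries the event to the other cluster.
        master, slave = next(ch for ch in CHANNELS if CLUSTER_OF[ch[0]] == cluster)
        cluster = CLUSTER_OF[slave]
        applied[cluster] += 1  # the slave applies the event to its own cluster
    return applied


print(times_applied("A", hops=2))  # {1: 2, 2: 1}
```

After two hops (A->C, then D->B) the change has been applied twice in the first cluster, and without any filter it would keep travelling.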

(This bug is weakly related to BUG#17095.)

How to repeat:
See scenarios above.

Suggested fix:
Introduce the possibility of filtering events that originate
from multiple masters on a slave:

  CHANGE MASTER SERVER_ID_FILTER=<list of server ids>;

Example:

  CHANGE MASTER SERVER_ID_FILTER=1,2,3;

The intention is that the slave filters out all events whose
originating server id is 1, 2, or 3.
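
The intended filtering rule can be written down as a small predicate. This is a sketch of the proposed semantics only; `should_keep` is a hypothetical name, and the slave's own id (9) is an assumption chosen for the example.

```python
def should_keep(event_server_id, own_server_id, filtered_ids):
    """Return True if the slave should accept an event from its master."""
    if event_server_id == own_server_id:
        # Default behaviour without --replicate-same-server-id:
        # a server skips events it generated itself.
        return False
    # Proposed addition: also skip events from any listed server id.
    return event_server_id not in filtered_ids


# Slave with server id 9 and SERVER_ID_FILTER=1,2,3
incoming = [1, 4, 2, 5]
kept = [sid for sid in incoming if should_keep(sid, 9, {1, 2, 3})]
print(kept)  # [4, 5]
```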
[16 Apr 2007 21:37] Lars Thalmann
This is how one would issue the statement:

Case 1:
-------
When the server B fails in A->B->C->A, one would:

1. Wait for C to process its entire relay log, so that C has
   received as much information from B as possible.

2. On server C, execute CHANGE MASTER TO SERVER_ID_FILTER=B,C
   (where B and C stand for the server ids of those servers).

3. On server C, execute CHANGE MASTER TO MASTER_HOST=A.
   Now we have a circle again, but smaller.
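
The effect of these steps can be checked with a toy model of the reduced circle. It is only a sketch: `hops_until_dropped`, the `ignore` mapping and the `max_hops` bound are names made up for the illustration, and servers are identified by letter instead of numeric server id.

```python
def hops_until_dropped(ring, ignore, origin, start, max_hops=20):
    """Return the hop at which the event is dropped, or None if it never is."""
    position = start
    for hop in range(1, max_hops + 1):
        position = ring[position]  # the event is sent to the next server
        # A server drops its own events and events from filtered servers.
        if origin == position or origin in ignore.get(position, set()):
            return hop
    return None


reduced_ring = {"A": "C", "C": "A"}  # after B fails and C replicates from A

# Without a filter, an event that originated at B circulates indefinitely.
print(hops_until_dropped(reduced_ring, {}, origin="B", start="A"))  # None

# With SERVER_ID_FILTER=B,C set on C, the event is dropped when it reaches C.
print(hops_until_dropped(reduced_ring, {"C": {"B", "C"}}, origin="B", start="A"))  # 1
```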

Case 2:
-------
- Replication A->C is set up by doing the following on C:
  1. CHANGE MASTER TO MASTER_HOST=A, SERVER_ID_FILTER=C,D
  2. START SLAVE

- Replication D->B is set up by doing the following on B:
  1. CHANGE MASTER TO MASTER_HOST=D, SERVER_ID_FILTER=A,B
  2. START SLAVE
[4 Sep 2007 12:27] Lars Thalmann
See also BUG#25998.
[5 May 2008 12:52] Andrei Elkin
The patch is on Bug #25998 page.
[16 Jul 2008 20:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49889

2717 Andrei Elkin	2008-07-16
      Bug #25998 problems about circle replication
      Bug #27808 Infinite looping in circular replication
      
      When one of the servers is withdrawn from a circular multi-master replication group,
      events generated by the removed server could become unstoppable (bug#25998).
      That is because the originator had been the terminator of its own event flow.
      
      Another source of unstoppable events is cluster replication (bug#27808).
      In that case an event generated by a member of a cluster was
      replicated to another member, accepted, and executed.
      By that time the effects of the event had already been propagated
      across the cluster via the cluster communications.
      To avoid double-applying, a replication event generated
      by a co-member of the cluster should not be accepted.
      
      Both variations of the unstoppable replication event can be fixed by
      introducing a new option for CHANGE MASTER:
      
      IGNORE_SERVER_IDS= (sid_1, sid_2, ... )
      
      Setting the option to the empty list resets it.
      
      Fixed by implementing the feature.
      
      Properties of the feature:
      
        a. reporting an error if the id of an ignored server is that of the slave itself and
        the slave was started with --replicate-same-server-id;
        b. overriding the existing IGNORE_SERVER_IDS list by a subsequent
        CHANGE MASTER ... IGNORE_SERVER_IDS= (list); an empty list argument clears
        the current ignored list;
        c. preserving the existing list when CHANGE MASTER is issued without IGNORE_SERVER_IDS;
        d. preserving the ignored server ids after RESET SLAVE;
        e. extending SHOW SLAVE STATUS with a new line listing ignored servers;
        f. a new line in master.info with the list of ignored servers;
        g. Unlike --replicate-same-server-id handling, the SQL thread is not
        concerned with the ignored server ids, because it is assumed that
        the relay log consists only of events that cannot become unstoppable.
        To guarantee that, e.g. in the case of circular replication with a failing
        server, the DBA must use the new option when changing master.
        h. Rotate and Format Description (FD) events that originate from the current master
        are still relay-logged even if that master is in the ignored list; this does not create
        any termination issue.
        i. The list of ignored servers is kept sorted so that the filtering
        algorithm runs as fast as possible.
      
      Two new lines are added to the SHOW SLAVE STATUS output: the list of ignored servers and
      the current master server id (yet another feature for the user!).
      
      Use cases for this feature can be found on the bug report page.
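
Properties b, c and i above can be illustrated with a small model. This is a sketch and not the server implementation: the class `IgnoreList` and its methods are hypothetical, and binary search over the sorted list is only one plausible reading of "fastest processing".

```python
from bisect import bisect_left


class IgnoreList:
    """Toy model of the ignored-server-id list kept by a slave."""

    def __init__(self):
        self._ids = []

    def change_master(self, ignore_server_ids=None):
        # None models CHANGE MASTER without IGNORE_SERVER_IDS:
        # the existing list is preserved (property c).
        if ignore_server_ids is not None:
            # A new list overrides the old one; an empty list clears it
            # (property b). The list is kept sorted (property i).
            self._ids = sorted(ignore_server_ids)

    def is_ignored(self, server_id):
        # Binary search in the sorted list.
        i = bisect_left(self._ids, server_id)
        return i < len(self._ids) and self._ids[i] == server_id


slave = IgnoreList()
slave.change_master(ignore_server_ids=[3, 1, 2])
print(slave.is_ignored(2))  # True

slave.change_master()       # option omitted: list preserved
print(slave.is_ignored(2))  # True

slave.change_master(ignore_server_ids=[])  # empty list: reset
print(slave.is_ignored(2))  # False
```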
[17 Jul 2008 19:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/49968

2673 Andrei Elkin	2008-07-17
      Bug #25998 problems about circle replication
      Bug #27808 Infinite looping in circular replication
      
[18 Jul 2008 7:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50006

2673 Andrei Elkin	2008-07-17
      Bug #25998 problems about circle replication
      Bug #27808 Infinite looping in circular replication
      
[22 Oct 2008 6:42] Lars Thalmann
Re-opening this bug.  A bug should only be set to "duplicate"
if there is a reference to the bug it is a duplicate of.
[22 Oct 2008 6:44] Lars Thalmann
Duplicate of BUG#25998.
[30 Jan 2009 13:27] Bugs System
Pushed into 6.0.10-alpha (revid:luis.soares@sun.com-20090129165607-wiskabxm948yx463) (version source revid:luis.soares@sun.com-20090129163120-e2ntks4wgpqde6zt) (merge vers: 6.0.10-alpha) (pib:6)