Bug #42125 Replication infinite loop for queries
Submitted: 15 Jan 2009 7:04 Modified: 15 Feb 2009 7:47
Reporter: Shlomi Noach (OCA)
Status: No Feedback
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:All OS:Any
Assigned to: CPU Architecture:Any
Tags: infinite loop, log-slave-updates, replication

[15 Jan 2009 7:04] Shlomi Noach
In MySQL replication, a message is discarded only by the server which originated it.
In a master-master replication where the masters use log-slave-updates, a query can therefore be re-executed indefinitely by the participating machines if the originating machine is not in the loop.

How to repeat:
Set up the following master-slave configuration:
(1) A -> B
A is master, B is slave. Add node C:
(2) A -> B  -> C
All nodes use log-slave-updates.
Now set up master-master replication between B and C:
(3) A    B <-> C
(By issuing CHANGE MASTER TO MASTER_HOST='C'... on node B)

Any query logged on A during phase (2) will now run infinitely between B and C.
During phase (2) an UPDATE is performed on A. It is replicated to and executed on B. Since log-slave-updates is used, B writes it to its own binary log, so it is replicated to and executed on C, which logs it as well. It stays in C's binary log.

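When we tell B to replicate from C, B will read this UPDATE, re-execute it, and write it to its binary log, from which it is replicated to and re-executed by C, and so on. Neither B nor C discards it, because it carries A's server id and A is no longer part of the loop.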
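
The loop can be illustrated with a small simulation. This is only a sketch of the behaviour described above, not MySQL code: the Event tuple, the server ids (A=1, B=2, C=3) and the function names are made up for illustration, and max_hops exists only so the example terminates.

```python
from collections import namedtuple

# Hypothetical model of a binary log event: only the originating server id
# and a label for the statement are tracked.
Event = namedtuple("Event", ["origin_id", "statement"])


def should_apply(own_server_id, event):
    """Rule as described in this report: a server skips only its own events."""
    return event.origin_id != own_server_id


def circulate(event, readers, max_hops):
    """Pass an event around a replication ring.

    `readers` lists the server ids in the order in which they read the event.
    Returns how many times the event is executed before some server discards
    it, or max_hops if nobody ever does.
    """
    executions = 0
    for hop in range(max_hops):
        server_id = readers[hop % len(readers)]
        if not should_apply(server_id, event):
            break  # the originating server sees its own event and drops it
        executions += 1  # executed and, with log-slave-updates, logged again
    return executions


# Event from A (id 1) sitting in C's binary log; B reads it first, then C.
stuck = circulate(Event(1, "UPDATE ..."), readers=[2, 3], max_hops=10)

# Event from B (id 2): C executes it once, then B discards it.
normal = circulate(Event(2, "UPDATE ..."), readers=[3, 2], max_hops=10)
```

Here `stuck` reaches the hop limit because no server in the B <-> C ring has id 1, while `normal` stops after a single execution.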

Suggested fix:
Today a message is consumed only by the originating machine.
Each node should keep a small map: the last known message (or query position in the binary log) - *per server id*.

Thus, a message should be discarded if it came from server id X and its position is equal to or lower than the last known position for server X.

If the message has a higher position, it is executed, logged, and the counter for that server is updated.
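
A minimal sketch of this idea, to make the suggestion concrete. The class and method names are hypothetical, and it assumes each originating server has a single, ever-increasing position; real binary log coordinates also include a file name, which this sketch ignores.

```python
class OriginPositionFilter:
    """Hypothetical per-node filter keyed by originating server id."""

    def __init__(self):
        # origin server id -> highest position executed so far
        self.last_position = {}

    def accept(self, origin_id, position):
        """Return True if the message should be executed and logged.

        A message is discarded when its position is equal to or lower than
        the last known position for its originating server.
        """
        last = self.last_position.get(origin_id)
        if last is not None and position <= last:
            return False
        # Higher position (or first message from this server): execute it
        # and update the counter for that server.
        self.last_position[origin_id] = position
        return True


node_b = OriginPositionFilter()
first_pass = node_b.accept(origin_id=1, position=100)   # arrives from A
second_pass = node_b.accept(origin_id=1, position=100)  # comes back via C
```

With such a map on B, the UPDATE from the scenario above would be accepted when it first arrives from A and rejected when it comes back through C, even though A is not in the loop.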
[15 Jan 2009 7:47] Sveta Smirnova
Thank you for the report.

Please indicate the exact version of MySQL server you use and provide the configuration files for all 3 servers.
[16 Feb 2009 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".