Bug #89195 Total ordering of transactions is not respected in Group Replication.
Submitted: 11 Jan 19:15 Modified: 15 Jan 12:40
Reporter: Jean-François Gagné
Status: Verified
Category: MySQL Server: Group Replication
Severity: S2 (Serious)
Version: 5.7.20, 8.0.3
OS: Any
CPU Architecture: Any

[11 Jan 19:15] Jean-François Gagné
Description:
Hi,

in the manual ([1] and [2]) we can read:

"the majority of the group have to agree on the order of a given transaction in the global sequence"

and:

"all servers apply the same set of changes in the same order"

[1]: https://dev.mysql.com/doc/refman/5.7/en/group-replication-background.html

[2]: https://dev.mysql.com/doc/refman/5.7/en/group-replication-summary.html

However, in Bug#89194, I reported a situation where transactions are not executed in the same order on different members of the group:

- On the member running the while loop, “B” is INSERTED before “b”,

- and on the member running the ALTER, “b” is INSERTED before “B”.

And this happens even though the “b” INSERT is certified before the “B” one.

I do not know if this is a documentation problem or a problem in the implementation of Group Replication.  The solution is either to fix the documentation or to hold COMMIT on the local node until the local applier reaches a marker for the current transaction in the relay logs of Group Replication (that marker might need to include a complete RBR view of the transaction; see the replication breakage discussed below).

Also, because the wait that I am suggesting might degrade performance, maybe the current behavior (no global order in transaction commit) is suitable in some cases.  Then, a configuration parameter could disable the wait that I am suggesting above, but waiting should be the default behavior.

Note that if the wait that I am suggesting had been implemented, fixing the breakage in the group from Bug#89194 would be much easier, as there would not be any data inconsistency (fixing it would only require skipping a transaction).  In the case of the node that issued the “B” INSERT, the session blocked in COMMIT would roll back and, if the transaction needs to be replayed, it would be replayed by the Group Replication applier (this is why I wrote above that the marker should include a complete RBR view of the transaction).

Many thanks for looking into that,

JFG

How to repeat:
See Bug#89194.

Suggested fix:
I do not know if this is a documentation problem or a problem in the implementation of Group Replication.  The solution is either to fix the documentation or to hold COMMIT on the local node until the local applier reaches a marker for the current transaction in the relay logs of Group Replication.

Also, because the wait that I am suggesting might degrade performance, maybe the current behavior (no global order in transaction commit) is suitable in some cases.  Then, a configuration parameter could disable the wait that I am suggesting above, but disabling the wait should not be the default behavior.
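A minimal sketch of the suggested fix, assuming the certified sequence is numbered and the local applier reports when it reaches each transaction's marker. CommitGate, applier_reached and wait_for_apply are hypothetical names: wait_for_apply plays the role of the proposed opt-out parameter, which does not exist in MySQL. Rolling back a session whose transaction fails certification is not modelled.

```python
import threading


class CommitGate:
    """Hypothetical model: a local COMMIT returns only once the local applier
    has reached the marker of that transaction in the certified sequence."""

    def __init__(self, wait_for_apply=True):
        # wait_for_apply=False stands in for the proposed (hypothetical) opt-out.
        self.wait_for_apply = wait_for_apply
        self._applied_up_to = 0  # highest certified position the applier reached
        self._cond = threading.Condition()

    def applier_reached(self, position):
        # Called by the applier thread when it reaches the marker at `position`.
        with self._cond:
            self._applied_up_to = max(self._applied_up_to, position)
            self._cond.notify_all()

    def commit(self, position, timeout=None):
        # Called by the client session after its transaction was certified at
        # `position`; returns True when COMMIT may return to the client.
        if not self.wait_for_apply:
            return True  # current behavior: release right after certification
        with self._cond:
            return self._cond.wait_for(
                lambda: self._applied_up_to >= position, timeout=timeout)


results = []
gate = CommitGate()
session = threading.Thread(target=lambda: results.append(gate.commit(position=2)))
session.start()
gate.applier_reached(1)  # a remote transaction certified earlier is applied first
gate.applier_reached(2)  # the applier reaches the marker of the local transaction
session.join()
print(results)  # [True]
```

Waiting on a condition variable keeps the session blocked without polling; with wait_for_apply=False the gate falls back to the current behavior of releasing COMMIT right after certification.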
[15 Jan 12:40] Umesh Shastry
Hello Jean,

Thank you for the report and detailed steps.

Thanks,
Umesh
[15 Jan 12:40] Umesh Shastry
Taken from Bug #89194

Attachment: 89194_5.7.20.results (application/octet-stream, text), 23.38 KiB.

[15 Jan 18:12] Nuno Carvalho
Posted by developer:
 
Hi Jean-François,

Thank you for your detailed analysis of Group Replication.

Group Replication ensures that all servers receive and certify the same
set of transactions in the same order. From that point on, in
multi-primary mode, transactions may be applied in an order different
from the certification order if, and only if, that does not break
consistency[1].

Once a local transaction is certified, its commit may be released
immediately, whereas remote transactions still need to be applied.
This may lead to transactions being *externalized* in a slightly
different order.
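To make this effect concrete, here is a toy model (not Group Replication code) of how one certified sequence can be externalized in different orders on two members. Assumptions: two hypothetical members m1 and m2, the certified sequence is delivered to both at the same time, a local transaction becomes visible at its certification position, and a remote transaction only becomes visible after a fixed, hypothetical apply_delay standing in for a busy applier.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    origin: str  # member that executed the transaction


def externalization_order(certified, member, apply_delay):
    """Order in which `member` makes the certified transactions visible.

    Simplified model: a local transaction is released at its certification
    position; a remote one only after the applier adds `apply_delay`.
    """
    events = []
    for position, txn in enumerate(certified):
        ready_at = position if txn.origin == member else position + apply_delay
        events.append((ready_at, position, txn.name))  # ties broken by position
    events.sort()
    return [name for _, _, name in events]


# Same certified order on every member: "b" first, then "B".
certified = [Txn("b", origin="m2"), Txn("B", origin="m1")]
print(externalization_order(certified, "m1", apply_delay=5))  # ['B', 'b']
print(externalization_order(certified, "m2", apply_delay=5))  # ['b', 'B']
```

With apply_delay=0 both members externalize in certification order; any apply lag on the member that did not originate "b" is enough to swap the visible order there.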

In single-primary mode, there is a small chance that concurrent and
non-contending local transactions are committed and externalized in
a different order than the one set by PAXOS. This is not problematic,
since such execution histories are still consistent and valid.
Secondaries commit in the total order defined by PAXOS, because they
run with slave_preserve_commit_order enabled.
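The commit-order preservation mentioned above can be illustrated with a small model: workers apply transactions concurrently and may finish in any order, but each one waits for its turn before committing. This is a simplified sketch of the idea behind slave_preserve_commit_order, not MySQL's implementation; seqno is assumed to be the position in the PAXOS-defined total order, and CommitOrderer is a hypothetical name.

```python
import random
import threading
import time


class CommitOrderer:
    """Release commits strictly in seqno order, whatever order workers finish in."""

    def __init__(self):
        self._next = 0  # seqno allowed to commit next
        self._cond = threading.Condition()
        self.committed = []

    def commit(self, seqno, name):
        with self._cond:
            self._cond.wait_for(lambda: self._next == seqno)  # wait for our turn
            self.committed.append(name)
            self._next += 1
            self._cond.notify_all()  # wake the worker holding the next seqno


def worker(orderer, seqno, name):
    time.sleep(random.uniform(0, 0.01))  # simulated apply work of varying length
    orderer.commit(seqno, name)


orderer = CommitOrderer()
names = ["t0", "t1", "t2", "t3"]
threads = [threading.Thread(target=worker, args=(orderer, i, n))
           for i, n in enumerate(names)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(orderer.committed)  # ['t0', 't1', 't2', 't3']
```

Applying stays parallel; only the commit step is serialized, which is why the secondaries end up with the PAXOS order regardless of which worker finishes first.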

Although this does not break consistency, it may lead to a slightly
different, but valid, externalization order for a set of concurrent
transactions committing together on the primary and eventually applied
to the secondaries.

We will update the documentation with these low-level details. Thanks for your interest in this subject.

[1] Unless there is a bug, and you did find one:
   BUG#89194: Wrong certification lead to data inconsistency and GR
              breakage
which is a duplicate of
   BUG#86078: Bad Write Set tracking with UNIQUE KEY on a DELETE followed
   by an INSERT
In your example, certification fails to detect a conflict, and that
breaks consistency.
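For readers unfamiliar with certification, the sketch below shows write-set based conflict detection in simplified form, and why an incomplete write set lets a conflict slip through. Assumptions: snapshots and certification positions are plain integers (Group Replication actually tracks GTID sets), keys are tuples standing in for hashed primary/unique key values, and the missing unique-key entry in the second run is a hypothetical illustration, not the exact write-set content affected by BUG#86078.

```python
from dataclasses import dataclass, field


@dataclass
class Certifier:
    """Simplified write-set certifier (integer snapshots, not GTID sets)."""
    last_write: dict = field(default_factory=dict)  # key -> seqno of last certified write
    seqno: int = 0

    def certify(self, write_set, snapshot):
        # Conflict if any key was written by a transaction certified after
        # the snapshot this transaction executed on.
        if any(self.last_write.get(k, 0) > snapshot for k in write_set):
            return False
        self.seqno += 1
        for k in write_set:
            self.last_write[k] = self.seqno
        return True


# Two concurrent transactions (both on snapshot 0) insert the same unique value "x".
full = Certifier()
print(full.certify({("pk", 2), ("uk", "x")}, snapshot=0))   # True
print(full.certify({("pk", 3), ("uk", "x")}, snapshot=0))   # False: conflict on ("uk", "x")

buggy = Certifier()
print(buggy.certify({("pk", 2)}, snapshot=0))               # True: unique-key entry missing
print(buggy.certify({("pk", 3), ("uk", "x")}, snapshot=0))  # True: conflict not detected
```

In the second run both transactions pass certification, so each member commits both and the group ends up in a state that a correct certifier would have prevented.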

Best regards,
Nuno Carvalho