Bug #89195 Total ordering of transactions is not respected in Group Replication.
Submitted: 11 Jan 2018 19:15
Modified: 25 Jul 14:45
Reporter: Jean-François Gagné
Status: Closed
Category: MySQL Server: Group Replication
Severity: S2 (Serious)
Version: 5.7.20, 8.0.3
OS: Any
Assigned to:
CPU Architecture: Any

[11 Jan 2018 19:15] Jean-François Gagné
Description:
Hi,

in the manual ([1] and [2]) we can read:

"the majority of the group have to agree on the order of a given transaction in the global sequence"

and:

"all servers apply the same set of changes in the same order"

[1]: https://dev.mysql.com/doc/refman/5.7/en/group-replication-background.html

[2]: https://dev.mysql.com/doc/refman/5.7/en/group-replication-summary.html

However, in Bug#89194, I reported a situation where transactions are not executed in the same order on different members of the group:

- On the member running the while, “B” is INSERTED before “b”,

- and on the member running the ALTER, “b” is INSERTED before “B”.

And the above happens even if the “b” INSERT is certified before the “B” INSERT.

I do not know if this is a documentation problem or a problem in the implementation of Group Replication.  The solution is either to fix the documentation or to hold COMMIT on the local node until the local applier reaches a marker for the current transaction in the relay logs of Group Replication (maybe that marker should include a complete RBR view of the transaction, see replication breakage below).

Also, because the wait that I am suggesting might degrade performance, maybe the current behavior (no global order in transaction commit) is suitable in some cases.  Then, a configuration parameter could disable the wait that I am suggesting above, but waiting should be the default behavior.

Note that if the wait that I am suggesting were implemented, fixing the breakage in the group from Bug#89194 would be much easier, as there would not be any data inconsistency (only a transaction to skip).  On the node that issued the “B” INSERT, the session blocked in COMMIT would roll back, and if the transaction needs to be replayed, it would be replayed by the Group Replication applier (this is why I wrote above that the marker should include a complete RBR view of the transaction).
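The wait proposed above can be illustrated with a minimal Python sketch. It is a toy model, not Group Replication code: the `Member` class, the certification sequence numbers, and the `wait_for_applier` switch (standing in for the suggested configuration parameter) are all hypothetical names introduced for illustration. “b” is assumed to be certified first (sequence 1, remote) and “B” second (sequence 2, local).

```python
import threading


class Member:
    """Toy model of one group member (hypothetical, not the MySQL implementation)."""

    def __init__(self, wait_for_applier=True):
        # Stand-in for the configuration parameter suggested in the report.
        self.wait_for_applier = wait_for_applier
        self.externalized = []      # order in which changes become visible locally
        self._applied = set()       # certification sequence numbers already applied
        self._cond = threading.Condition()

    def apply_remote(self, seq, name):
        """The local applier applies a remote transaction."""
        with self._cond:
            self.externalized.append(name)
            self._applied.add(seq)
            self._cond.notify_all()

    def commit_local(self, seq, name):
        """A client session commits a local, already certified transaction."""
        with self._cond:
            if self.wait_for_applier:
                # Hold COMMIT until every transaction certified earlier is applied.
                self._cond.wait_for(
                    lambda: all(s in self._applied for s in range(1, seq))
                )
            self.externalized.append(name)
            self._applied.add(seq)
            self._cond.notify_all()


def run_scenario(wait_for_applier):
    member = Member(wait_for_applier)
    session = threading.Thread(target=member.commit_local, args=(2, "B"))
    session.start()
    if not wait_for_applier:
        session.join()              # model a lagging applier: local commit finishes first
    member.apply_remote(1, "b")
    session.join()
    return member.externalized


print(run_scenario(True))   # local order follows the certification order
print(run_scenario(False))  # local "B" becomes visible before remote "b"
```

With the wait enabled, the local order matches the certification order; with it disabled, the sketch reproduces the inversion described in this report.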

Many thanks for looking into that,

JFG

How to repeat:
See Bug#89194.

Suggested fix:
I do not know if this is a documentation problem or a problem in the implementation of Group Replication.  The solution is either to fix the documentation or to hold COMMIT on the local node until the local applier reaches a marker for the current transaction in the relay logs of Group Replication.

Also, because the wait that I am suggesting might degrade performance, maybe the current behavior (no global order in transaction commit) is suitable in some cases.  Then, a configuration parameter could disable the wait that I am suggesting above, but disabling the wait should not be the default behavior.
[15 Jan 2018 12:40] Umesh Shastry
Hello Jean,

Thank you for the report and detailed steps.

Thanks,
Umesh
[15 Jan 2018 12:40] Umesh Shastry
Taken from Bug #89194

Attachment: 89194_5.7.20.results (application/octet-stream, text), 23.38 KiB.

[15 Jan 2018 18:12] Nuno Carvalho
Posted by developer:
 
Hi Jean-François,

Thank you for your detailed analysis of Group Replication.

Group Replication ensures that all servers receive and certify the same
set of transactions in the same order. From that point on, in
multi-primary mode, transactions may be applied in an order that differs
from the certification order if and only if that does not break consistency[1].

From that moment onwards, a local transaction commit may be released as
soon as the transaction is certified. Remote transactions need to be
applied. This may lead to transactions being *externalized* in a slightly
different order.

In single-primary mode, there is a small chance that concurrent and
non-contending local transactions are committed and externalized in
a different order than that set by PAXOS. This is not problematic,
since such execution histories are still consistent and valid.
Secondaries will commit in the same order, given that they observe
the total order defined by PAXOS because they run with
slave_preserve_commit_order set.

Although this does not break consistency, it may lead to a slightly
different, but valid, externalization order for a set of concurrent
transactions committing together on the primary and eventually applied
to the secondaries.
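
The multi-primary behavior described above can be sketched as a small Python model. It is illustrative only: the transaction dictionaries, the member names "A" and "B", and the rule "local transactions become visible before remote ones" are simplifying assumptions, not the actual applier logic.

```python
def externalize(certified_order, local_origin):
    """Return the order in which one member externalizes transactions.

    Assumption of this toy model: a member releases its own transactions
    right after certification, while remote transactions wait for the
    applier, so local transactions become visible first.
    """
    local = [t for t in certified_order if t["origin"] == local_origin]
    remote = [t for t in certified_order if t["origin"] != local_origin]
    return local + remote


def final_state(order):
    """Apply the write sets in the given order and return the resulting rows."""
    state = {}
    for txn in order:
        state.update(txn["writes"])
    return state


# Two certified transactions that touch different rows (no conflict).
certified = [
    {"name": "T1", "origin": "A", "writes": {"row1": "x"}},
    {"name": "T2", "origin": "B", "writes": {"row2": "y"}},
]

order_a = externalize(certified, "A")
order_b = externalize(certified, "B")

print([t["name"] for t in order_a])  # member A sees its own T1 first
print([t["name"] for t in order_b])  # member B sees its own T2 first
print(final_state(order_a) == final_state(order_b))
```

Because the two write sets do not overlap, both externalization orders end in the same data, which is the sense in which the differing histories remain consistent.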

We will update the documentation with these low-level details. Thanks for your interest in this subject.

[1] Unless there is a bug, and you did find one:
   BUG#89194: Wrong certification lead to data inconsistency and GR
              breakage
which is a duplicate of
   BUG#86078: Bad Write Set tracking with UNIQUE KEY on a DELETE followed
   by an INSERT
In your example, certification is failing to detect a conflict and that
breaks the consistency.

Best regards,
Nuno Carvalho
[25 Jul 14:45] Margaret Fisher
Posted by developer:
 
Thanks for raising this! Sorry it didn't get handled earlier. I've added the following explanation to
https://dev.mysql.com/doc/refman/5.7/en/group-replication-summary.html
instead of the sentence you quoted about applying the transactions in the same order:

For applying and externalizing the certified transactions, Group Replication permits servers to deviate from the agreed order of the transactions if this does not break consistency and validity. Group Replication is an eventual consistency system, meaning that as soon as the incoming traffic slows down or stops, all group members have the same data content. While traffic is flowing, transactions can be externalized in a slightly different order, or externalized on some members before the others. For example, in multi-primary mode, a local transaction might be externalized immediately following certification, although a remote transaction that is earlier in the global order has not yet been applied. This is permitted when the certification process has established that there is no conflict between the transactions. In single-primary mode, on the primary server, there is a small chance that concurrent, non-conflicting local transactions might be committed and externalized in a different order from the global order agreed by Group Replication. On the secondaries, which do not accept writes from clients, transactions are always committed and externalized in the agreed order.
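
The last point, that secondaries always commit in the agreed order, can be pictured with a minimal Python sketch of commit-order preservation, the behavior the earlier comment attributes to slave_preserve_commit_order. This is a toy model under the assumption that each transaction carries its position in the agreed order as an integer starting at 1; the function name is hypothetical and this is not how the server implements it.

```python
def commit_in_agreed_order(completion_order):
    """Release commits strictly by sequence number.

    `completion_order` lists the sequence numbers in the order in which
    parallel applier workers happen to finish executing them. A finished
    transaction is held back until all earlier ones have committed.
    """
    ready = set()       # executed but not yet committed
    next_seq = 1        # next position in the agreed order allowed to commit
    committed = []
    for seq in completion_order:
        ready.add(seq)
        while next_seq in ready:
            ready.remove(next_seq)
            committed.append(next_seq)
            next_seq += 1
    return committed


# Workers finish out of order, yet commits follow the agreed order.
print(commit_in_agreed_order([2, 3, 1, 5, 4]))
```

Transactions 2 and 3 finish early but stay pending until transaction 1 commits, so the externalized order on the secondary equals the agreed order.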