Bug #88120 Backport BUG#26277771 to 5.7
Submitted: 17 Oct 2017 11:22 Modified: 22 Feb 2018 17:38
Reporter: Nuno Carvalho Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.17 OS:Any
Assigned to: CPU Architecture:Any

[17 Oct 2017 11:22] Nuno Carvalho
Description:
The writesets generated for keys were not using collation into
consideration, and because of that wrong last_committed and
sequence_number were getting added to binary log and thus
transactions were getting applied in wrong order by parallel applier
on slave.

This issue will be fixed on BUG#26277771: BAD WRITE SET TRACKING
WITH UNIQUE KEY ON A DELETE FOLLOWED BY AN INSERT on 8.0, which will
take care of:
 1) Does the writeset depend on the row physical layout?
    Things like padding, binary collation and so on.
    If so what challenges does this present to upgrades?

 2) The buffer used to hold the writeset is a string buffer, which
    is only considering the characters until the first '\0',
    ignoring the complete key value.
    We need to change the code to use a byte buffer.

 3) It looks like that on the pushed patch for this bug, which was
    later reverted, there was double string collation conversation.
    This needs to be validated.

Since this issue also affects Group Replication, we need to backport
it to 5.7, on which we need to deal with:
 4) Upgrade path, while a writeset format change between 5.7 and 8.0
    is not a problem, since when 8.0 members join a 5.7 group, the
    8.0 members are not allowed to do writes.
    Between minor versions on 5.7 it is a problem, we need a
    upgrade path to make this fix possible without or with minimal
    user intervention.

This bug will backport BUG#26277771: BAD WRITE SET TRACKING
WITH UNIQUE KEY ON A DELETE FOLLOWED BY AN INSERT to 5.7.

How to repeat:
Please see BUG#26277771: BAD WRITE SET TRACKING
WITH UNIQUE KEY ON A DELETE FOLLOWED BY AN INSERT

Suggested fix:
We could say that a 5.7 version with the fix would set read only
when mixed with versions without the patch, but that would be
intrusive and not streamline to the users. We or users would need to
check when read only must be to unset.
That would for sure collide with the non trivial primary elections
algorithms that we already have.

One alternative that we have to make this transparent for the users
is, on 5.7, always send the both versions of the hash, this would
mean that a row change will generate 2 instead of 1 writeset, and 4
instead of 2, when PK are involved.

With this in-place we can mix patched and unpatched versions:

CREATE TABLE t1 (c1 VARCHAR(20) CHARACTER SET utf8 COLLATE utf8_bin
NOT NULL PRIMARY KEY);

5.7 without patch
----------------
INSERT INTO t1 VALUES ('a');
  3333046699319245624

INSERT INTO t1 VALUES ('A');
  -4704104534098685251

5.7 with patch
--------------
INSERT INTO t1 VALUES ('a');
  3333046699319245624
  -7457025547288491032

INSERT INTO t1 VALUES ('A');
  -4704104534098685251
  8865803312774472942

This will mean that a transaction will be certified against two
writesets instead of just one.

The hash algorithms ensure that a given key always originate the
same hash. But they do not ensure that a given hash is only
originated from a given key, meaning that two keys may originate the
same hash.
This may have impact in this approach, since we may have cross
writesets collisions, example:

 INSERT (1) -> hash 1: X
            -> hash 2: Y

 INSERT (9) -> hash 1: Y
               hash 2: Z

Though this is not problematic:
 a) on writeset parallel applier, if we have writeset collisions
    those transactions will be applied on different logical groups.
    Data loss: none.
    Data deviation: none.
    User impact: none.
    Performance impact: negligible, assuming these collisions
                        do not happen so often.

b) on Group Replication multi-master certification, we may have
    false conflicts which will cause not needed rollbacks.
    Data loss: none.
    Data deviation: none.
    User impact: negligible, assuming these collisions
                 do not happen so often.
    Performance impact: negligible, assuming these collisions
                        do not happen so often.

c) On Group Replication single-primary failover, while the new
    primary is still applying the old primary data, we may have
    false conflicts which will cause not needed rollbacks.
    Data loss: none.
    Data deviation: none.
    User impact: negligible, assuming these collisions
                 do not happen so often.
    Performance impact: negligible, assuming these collisions
                        do not happen so often.

Summing all, this should be best approach, from the user
perspective; and despite we need to maintain the two hash formats on
5.7, is also the simpler code approach since we do not engage on
more read only states.
[22 Feb 2018 17:38] Margaret Fisher
Posted by developer:
 
Changelog entry added for MySQL 5.7.21:
        When write sets are used for parallelization by a replication
        slave, the case and accent sensitivity of the database are now
        taken into account when generating the write set information.
        Write set information is generated when the
        <literal role="sysvar">transaction_write_set_extraction</literal>
        system variable is enabled. Previously, duplicate keys could be
        incorrectly identified as different, causing transactions to
        have incorrect dependencies and so potentially be executed in
        the wrong order.
[23 Feb 2018 9:01] Margaret Fisher
Posted by developer:
 
Corrected release to 5.7.22 in changelog.
[11 Apr 2018 16:24] Erlend Dahl
Bug#89141 Error in Group Replication caused by bad Write Set tracking.

Bug#89194 Wrong certification lead to data inconsistency and GR breakage.

were marked as duplicates.
[12 Apr 2018 7:50] Jean-François Gagné
Referencing the public version of Bug#26277771: Bug#86078.