Description:
The writesets generated for keys did not take collation into
consideration. Because of that, wrong last_committed and
sequence_number values were written to the binary log, and
transactions were applied in the wrong order by the parallel applier
on the slave.
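As a rough illustration of the problem described above, the sketch
below hashes a key with and without applying a collation. It is not
the server code: key_hash() uses BLAKE2 as a stand-in for the real
writeset hash, and str.casefold() is only an approximation of a
case-insensitive collation. Both are assumptions made for this
example.

```python
import hashlib


def key_hash(data: bytes) -> int:
    """Stand-in 64-bit hash (assumption: not the hash the server uses)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def raw_writeset(value: str) -> int:
    # Collation-unaware: hashes the stored bytes exactly as given.
    return key_hash(value.encode("utf-8"))


def collated_writeset(value: str) -> int:
    # Collation-aware: normalizes first. casefold() approximates a
    # case-insensitive collation (assumption for illustration only).
    return key_hash(value.casefold().encode("utf-8"))


# Under a case-insensitive collation 'a' and 'A' are the same unique key,
# so the two rows must be seen as conflicting.
raw_conflict = raw_writeset("a") == raw_writeset("A")
collated_conflict = collated_writeset("a") == collated_writeset("A")

print("conflict seen without collation:", raw_conflict)
print("conflict seen with collation:   ", collated_conflict)
```

When the conflict is missed, the two transactions look independent
and can be scheduled in parallel, which is how the wrong apply order
arises.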
This issue will be fixed on BUG#26277771: BAD WRITE SET TRACKING
WITH UNIQUE KEY ON A DELETE FOLLOWED BY AN INSERT on 8.0, which will
take care of:
1) Does the writeset depend on the row physical layout?
Things like padding, binary collation and so on.
If so what challenges does this present to upgrades?
2) The buffer used to hold the writeset is a string buffer, which
only considers the characters up to the first '\0' and
ignores the rest of the key value.
We need to change the code to use a byte buffer.
3) It looks like the patch that was pushed for this bug, and
later reverted, performed a double string collation conversion.
This needs to be validated.
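Point 2) can be illustrated with a minimal sketch. The helper
c_string_view() is hypothetical and only mimics how a NUL-terminated
string buffer sees a key; key_hash() is again a BLAKE2 stand-in, not
the server's hash function.

```python
import hashlib


def key_hash(data: bytes) -> int:
    """Stand-in 64-bit hash (assumption: not the hash the server uses)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def c_string_view(buf: bytes) -> bytes:
    # Hypothetical helper: a NUL-terminated string buffer stops at the
    # first '\0', so everything after it is never hashed.
    return buf.split(b"\0", 1)[0]


# Two different key values that share the prefix before the first '\0'.
key_a = b"ab\0cd"
key_b = b"ab\0xy"

string_buffer_collide = key_hash(c_string_view(key_a)) == key_hash(c_string_view(key_b))
byte_buffer_collide = key_hash(key_a) == key_hash(key_b)

print("same hash with string buffer:", string_buffer_collide)
print("same hash with byte buffer:  ", byte_buffer_collide)
```

With the string buffer both keys reduce to b"ab", so distinct rows
share one writeset; hashing the full byte buffer keeps them apart.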
Since this issue also affects Group Replication, we need to backport
it to 5.7, on which we need to deal with:
4) Upgrade path. A writeset format change between 5.7 and 8.0
is not a problem, since when 8.0 members join a 5.7 group the
8.0 members are not allowed to do writes.
Between minor versions of 5.7 it is a problem: we need an
upgrade path that makes this fix possible with no or minimal
user intervention.
This bug will backport BUG#26277771: BAD WRITE SET TRACKING
WITH UNIQUE KEY ON A DELETE FOLLOWED BY AN INSERT to 5.7.
How to repeat:
Please see BUG#26277771: BAD WRITE SET TRACKING
WITH UNIQUE KEY ON A DELETE FOLLOWED BY AN INSERT
Suggested fix:
We could say that a 5.7 version with the fix would set read only
when mixed with versions without the patch, but that would be
intrusive and not streamlined for the users. We or the users would
need to check when read only must be unset.
That would surely collide with the non-trivial primary election
algorithms that we already have.
One alternative that makes this transparent for the users
is to always send both versions of the hash on 5.7. This would
mean that a row change generates 2 writesets instead of 1, and 4
instead of 2 when PK are involved.
With this in-place we can mix patched and unpatched versions:
CREATE TABLE t1 (c1 VARCHAR(20) CHARACTER SET utf8 COLLATE utf8_bin
NOT NULL PRIMARY KEY);
5.7 without patch
----------------
INSERT INTO t1 VALUES ('a');
3333046699319245624
INSERT INTO t1 VALUES ('A');
-4704104534098685251
5.7 with patch
--------------
INSERT INTO t1 VALUES ('a');
3333046699319245624
-7457025547288491032
INSERT INTO t1 VALUES ('A');
-4704104534098685251
8865803312774472942
This will mean that a transaction will be certified against two
writesets instead of just one.
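A minimal sketch of why sending both hashes lets patched and
unpatched members coexist. Everything here is an assumption made for
illustration: the "old"/"new" tags merely stand for the two hash
formats, key_hash() is a BLAKE2 stand-in, and conflicts() models
certification as a plain set intersection.

```python
import hashlib


def key_hash(fmt: str, key: str) -> int:
    """Stand-in hash; `fmt` distinguishes the two formats (assumption)."""
    data = f"{fmt}:{key}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def writeset(key: str, patched: bool) -> set:
    # Every 5.7 member emits the old format.
    hashes = {key_hash("old", key)}
    if patched:
        # A patched member additionally emits the new format.
        hashes.add(key_hash("new", key))
    return hashes


def conflicts(ws1: set, ws2: set) -> bool:
    # Two transactions conflict when their writesets share a hash.
    return not ws1.isdisjoint(ws2)


patched_a = writeset("a", patched=True)
unpatched_a = writeset("a", patched=False)
unpatched_other = writeset("A", patched=False)

print(len(patched_a), len(unpatched_a))          # 2 hashes vs 1 hash
print(conflicts(patched_a, unpatched_a))         # same key, still detected
print(conflicts(patched_a, unpatched_other))     # different key
```

Because the old-format hash is always present, a change to the same
key is still detected when one side runs without the patch.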
The hash algorithms ensure that a given key always originates the
same hash. But they do not ensure that a given hash originates only
from a given key, meaning that two keys may originate the
same hash.
This may have an impact on this approach, since we may have
cross-writeset collisions, for example:
INSERT (1) -> hash 1: X
           -> hash 2: Y
INSERT (9) -> hash 1: Y
           -> hash 2: Z
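The example above can be sketched with plain sets. The symbols X, Y
and Z are the placeholder hash values from the example, and the set
intersection is an assumed, simplified model of conflict detection.

```python
# Placeholder hash values taken from the example above.
X, Y, Z = "X", "Y", "Z"

insert_1 = {X, Y}   # INSERT (1): hash 1 = X, hash 2 = Y
insert_9 = {Y, Z}   # INSERT (9): hash 1 = Y, hash 2 = Z

# The keys differ, yet hash 2 of the first row equals hash 1 of the second.
shared = insert_1 & insert_9
false_conflict = bool(shared)

print("shared hashes:", shared)
print("reported as conflicting:", false_conflict)
```

The rows are unrelated, so the reported conflict is a false positive:
it costs an extra dependency or rollback, not correctness.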
Though this is not problematic:
a) On the writeset parallel applier, if we have writeset collisions
those transactions will be applied in different logical groups.
Data loss: none.
Data deviation: none.
User impact: none.
Performance impact: negligible, assuming these collisions
do not happen so often.
b) On Group Replication multi-master certification, we may have
false conflicts which will cause unnecessary rollbacks.
Data loss: none.
Data deviation: none.
User impact: negligible, assuming these collisions
do not happen so often.
Performance impact: negligible, assuming these collisions
do not happen so often.
c) On Group Replication single-primary failover, while the new
primary is still applying the old primary data, we may have
false conflicts which will cause unnecessary rollbacks.
Data loss: none.
Data deviation: none.
User impact: negligible, assuming these collisions
do not happen so often.
Performance impact: negligible, assuming these collisions
do not happen so often.
Summing up, this should be the best approach from the user
perspective; and although we need to maintain the two hash formats on
5.7, it is also the simpler approach code-wise, since we do not
introduce more read only states.