Description:
The writesets generated for keys did not take collation into
consideration; because of that, wrong last_committed and
sequence_number values were written to the binary log, and
transactions were applied in the wrong order by the parallel
applier on the slave.
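The root cause can be sketched with a toy Python model (SHA-256 stands in for MySQL's actual writeset hash function, and simple case folding stands in for a case-insensitive collation's weight string; both are illustrative simplifications):

```python
import hashlib

def hash64(data: bytes) -> int:
    # Toy 64-bit hash; a stand-in for MySQL's actual writeset hash.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def key_hash(value: str, case_insensitive: bool) -> int:
    # Collation-aware key: fold case for case-insensitive collations
    # (a simplification of using the collation's weight string).
    if case_insensitive:
        value = value.lower()
    return hash64(value.encode("utf8"))

# Under a case-insensitive collation, 'a' and 'A' are the SAME key, so
# a DELETE of 'a' followed by an INSERT of 'A' in another transaction
# must be marked as dependent in the binary log.
assert key_hash("a", case_insensitive=True) == key_hash("A", case_insensitive=True)

# Hashing the raw bytes (the buggy behaviour) misses that dependency,
# allowing the parallel applier to run the transactions out of order.
assert hash64(b"a") != hash64(b"A")
```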
This issue will be fixed by BUG#26277771: BAD WRITE SET TRACKING
WITH UNIQUE KEY ON A DELETE FOLLOWED BY AN INSERT on 8.0, which will
take care of:
1) Does the writeset depend on the row's physical layout?
Things like padding, binary collation and so on.
If so, what challenges does this present to upgrades?
2) The buffer used to hold the writeset is a string buffer, which
only considers the characters up to the first '\0', ignoring the
rest of the key value.
We need to change the code to use a byte buffer.
3) It looks like the patch pushed for this bug, which was later
reverted, performed a double string collation conversion.
This needs to be validated.
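Point 2 can be demonstrated with a small sketch (Python, with SHA-256 standing in for the real hash function):

```python
import hashlib

def hash64(data: bytes) -> int:
    # Toy 64-bit hash standing in for MySQL's writeset hash.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

# Two distinct binary key values sharing a prefix before an embedded NUL.
key_a = b"ab\x00cd"
key_b = b"ab\x00zz"

# String-buffer behaviour: only bytes up to the first '\0' are hashed,
# so the two distinct keys collapse into the same writeset hash.
assert hash64(key_a.split(b"\x00")[0]) == hash64(key_b.split(b"\x00")[0])

# Byte-buffer behaviour: the full key value is hashed and the keys stay
# distinct, which is why the code must switch to a byte buffer.
assert hash64(key_a) != hash64(key_b)
```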
Since this issue also affects Group Replication, we need to backport
it to 5.7, where we need to deal with:
4) The upgrade path. A writeset format change between 5.7 and 8.0
is not a problem, since when 8.0 members join a 5.7 group the
8.0 members are not allowed to do writes.
Between minor versions on 5.7 it is a problem: we need an
upgrade path that makes this fix possible with no or minimal
user intervention.
This bug will backport BUG#26277771: BAD WRITE SET TRACKING
WITH UNIQUE KEY ON A DELETE FOLLOWED BY AN INSERT to 5.7.
How to repeat:
Please see BUG#26277771: BAD WRITE SET TRACKING
WITH UNIQUE KEY ON A DELETE FOLLOWED BY AN INSERT
Suggested fix:
We could make a 5.7 version with the fix set itself read only when
mixed with versions without the patch, but that would be intrusive
and not streamlined for users. We or the users would need to check
when read only could be unset.
That would certainly collide with the non-trivial primary election
algorithms that we already have.
One alternative we have to make this transparent to users is, on
5.7, to always send both versions of the hash. This would mean that
a row change will generate 2 writesets instead of 1, and 4 instead
of 2 when primary keys are involved.
With this in place we can mix patched and unpatched versions:
CREATE TABLE t1 (c1 VARCHAR(20) CHARACTER SET utf8 COLLATE utf8_bin
NOT NULL PRIMARY KEY);
5.7 without patch
----------------
INSERT INTO t1 VALUES ('a');
3333046699319245624
INSERT INTO t1 VALUES ('A');
-4704104534098685251
5.7 with patch
--------------
INSERT INTO t1 VALUES ('a');
3333046699319245624
-7457025547288491032
INSERT INTO t1 VALUES ('A');
-4704104534098685251
8865803312774472942
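The dual-hash idea can be sketched roughly as follows (Python; legacy_key and new_key are hypothetical encodings invented for illustration, and SHA-256 stands in for the real hash, so the values differ from the ones listed above):

```python
import hashlib

def hash64(data: bytes) -> int:
    # Toy 64-bit hash; MySQL uses its own hash function internally.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def legacy_key(value: str) -> bytes:
    # Hypothetical pre-patch key encoding: raw characters only.
    return value.encode("utf8")

def new_key(value: str, collation: str = "utf8_bin") -> bytes:
    # Hypothetical post-patch encoding: full key bytes plus collation
    # information, independent of the row's physical layout.
    return collation.encode("utf8") + b"\x00" + value.encode("utf8")

def writesets(value: str) -> list:
    # A patched 5.7 member sends BOTH hash versions for every key, so
    # it stays compatible with members that only know the first one.
    return [hash64(legacy_key(value)), hash64(new_key(value))]

# Each row change now yields two writesets instead of one.
assert len(writesets("a")) == 2
assert writesets("a") != writesets("A")
```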
This means that a transaction will be certified against two
writesets instead of just one.
The hash algorithm ensures that a given key always originates the
same hash. But it does not ensure that a given hash originates
only from a given key, meaning that two keys may originate the
same hash.
This may have an impact on this approach, since we may have
cross-writeset collisions, for example:
INSERT (1) -> hash 1: X
           -> hash 2: Y
INSERT (9) -> hash 1: Y
           -> hash 2: Z
Though this is not problematic:
a) on the writeset parallel applier, if we have writeset collisions
those transactions will be applied in different logical groups.
Data loss: none.
Data deviation: none.
User impact: none.
Performance impact: negligible, assuming these collisions
do not happen so often.
b) on Group Replication multi-master certification, we may have
false conflicts which will cause unneeded rollbacks.
Data loss: none.
Data deviation: none.
User impact: negligible, assuming these collisions
do not happen so often.
Performance impact: negligible, assuming these collisions
do not happen so often.
c) on Group Replication single-primary failover, while the new
primary is still applying the old primary's data, we may have
false conflicts which will cause unneeded rollbacks.
Data loss: none.
Data deviation: none.
User impact: negligible, assuming these collisions
do not happen so often.
Performance impact: negligible, assuming these collisions
do not happen so often.
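A much simplified certification sketch (Python; real Group Replication certification also tracks GTID snapshots and garbage-collects its database, none of which is modelled here) illustrates why, under this scheme, a collision only causes a spurious rollback while real conflicts are still caught, since the legacy hash is always present:

```python
def certify(tx_hashes: set, cert_db: set) -> bool:
    """Return True if the transaction passes certification."""
    if tx_hashes & cert_db:
        return False        # any overlapping hash -> conflict -> rollback
    cert_db |= tx_hashes    # record this transaction's writesets
    return True

cert_db = set()
t1 = {3333046699319245624, -7457025547288491032}  # INSERT 'a', patched member
t2 = {-4704104534098685251}                       # INSERT 'A', unpatched member
assert certify(t1, cert_db)       # empty database -> passes
assert certify(t2, cert_db)       # disjoint hashes -> passes
t3 = {3333046699319245624}        # another write to 'a' from an old member
assert not certify(t3, cert_db)   # legacy hash overlaps -> conflict caught
```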
Summing up, this should be the best approach from the user
perspective; and although we need to maintain the two hash formats
on 5.7, it is also the simpler code approach, since we do not
introduce more read-only states.