Bug #119674 Doc: Group Replication “write set” definition is unclear/misleading (group-replication-summary)
Submitted: 14 Jan 5:25 Modified: 14 Jan 8:45
Reporter: Balkrishna Basisthnarayan
Status: Open
Category: MySQL Server: Group Replication
Severity: S3 (Non-critical)
Version: 9.2
OS: Any
Assigned to:
CPU Architecture: Any
Tags: certification, conflict-detection, documentation, group-replication, writeset

[14 Jan 5:25] Balkrishna Basisthnarayan
Description:
On the MySQL 9.2 Reference Manual page “Group Replication Summary” (20.1.1.2 Group Replication), the description of “write set” is ambiguous and potentially misleading.

The page states that the server broadcasts “write values” and the corresponding “write set (the unique identifiers of the rows that were updated)”. The phrase “unique identifiers” can be interpreted as an internal row ID or other identifier, and does not clearly explain that write sets are derived from primary key (or PK-equivalent unique key) values and are used for certification/conflict detection.

This can confuse readers, especially because Group Replication requires primary keys (or equivalent unique keys) for certification/conflict detection, which implies the write set is tied to those keys.
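
To illustrate why the write set has to be tied to key values, here is a minimal sketch of first-committer-wins conflict detection. It is not the server's implementation: the `Certifier` class, the integer version counter, and the tuple-shaped entries (standing in for hashes) are all assumptions made for brevity.

```python
class Certifier:
    """Toy first-committer-wins certifier keyed on write-set entries (hypothetical)."""

    def __init__(self):
        self.version = 0        # number of transactions certified so far
        self.last_writer = {}   # write-set entry -> version that last wrote it

    def certify(self, snapshot_version, write_set):
        """Accept the transaction only if no entry changed after its snapshot."""
        for entry in write_set:
            if self.last_writer.get(entry, 0) > snapshot_version:
                return False    # a concurrent transaction already touched this key
        self.version += 1
        for entry in write_set:
            self.last_writer[entry] = self.version
        return True


certifier = Certifier()
# Entries are (table, index, key value) tuples standing in for hashes.
first = certifier.certify(0, {("db.t", "PRIMARY", 1)})
second = certifier.certify(0, {("db.t", "PRIMARY", 1)})  # same key, same snapshot
third = certifier.certify(0, {("db.t", "PRIMARY", 2)})   # different key
print(first, second, third)
```

In this sketch two transactions collide only when their write sets share an entry, so a row without a key-derived identifier could never be checked for conflicts.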

How to repeat:
Open: https://dev.mysql.com/doc/refman/9.2/en/group-replication-summary.html

Navigate to section 20.1.1.2 Group Replication.

Find the sentence describing the broadcast of “write values” and “the corresponding write set (the unique identifiers of the rows that were updated)”.

Observe that “unique identifiers” is not specific and can be misunderstood (e.g., as internal row IDs rather than PK/unique-key-derived identifiers used for certification).

Suggested fix:
Replace the phrase:
“the corresponding write set (the unique identifiers of the rows that were updated)”

With:
“the corresponding write set (a compact representation derived from the primary key values, or primary-key-equivalent unique key values, of the rows modified; used for certification/conflict detection)”

Optional shorter alternative (if you want it less detailed):

“the corresponding write set (identifiers derived from primary key or unique key values of the rows modified, used for certification/conflict detection)”
[14 Jan 8:37] Frederic Descamps
A write set is even more than that.
It contains the hash of the PK of each changed row and, in some cases, hashes of foreign keys or other dependencies that need to be captured (e.g. non-NULL UKs).

For example, given this table:
+-------+-----------+------+-----+---------+-------+
| Field | Type      | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id    | binary(1) | NO   | PRI | NULL    |       |
| name  | binary(2) | YES  | UNI | NULL    |       |
| name2 | binary(1) | YES  |     | NULL    |       |
+-------+-----------+------+-----+---------+-------+

An insert like this one produces two hashes:

mysql> insert into t3 values (1,2,3);

pke: PRIMARY | test |t3 | 1 | 1    hash: 79134815725924853
pke: name    | test |t3 | 2        hash: 11034644986657565827

An update again records the PK and the non-NULL unique key, this time for both row images (after and before):

mysql> update t3 set name=3 where id=1;

pke: PRIMARY | test | t3 | 1 | 1    hash: 79134815725924853
pke: name    | test | t3 | 3        hash: 18082071075512932388
pke: PRIMARY | test | t3 | 1 | 1    hash: 79134815725924853
pke: name    | test | t3 | 2        hash: 11034644986657565827
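
The behaviour shown above can be approximated with a short sketch. This is only an illustration under stated assumptions: `pke_hash` and `write_set` are hypothetical helpers, the entry layout is a guess at the shape of the `pke:` lines, and BLAKE2b is a stand-in for the server's hash function, so the values will not match the hashes printed above.

```python
import hashlib


def pke_hash(index, schema, table, values):
    """Hash one key entry. BLAKE2b is a stand-in, not the server's hash function."""
    entry = "|".join([index, schema, table, *map(str, values)])
    digest = hashlib.blake2b(entry.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def write_set(schema, table, keys, *row_images):
    """Collect one hash per key per row image, skipping keys that contain NULL.

    keys maps an index name to the list of columns it covers.
    """
    hashes = set()
    for row in row_images:
        for index, columns in keys.items():
            values = [row[c] for c in columns]
            if any(v is None for v in values):
                continue  # only non-NULL unique keys are captured
            hashes.add(pke_hash(index, schema, table, values))
    return hashes


keys = {"PRIMARY": ["id"], "name": ["name"]}
before = {"id": 1, "name": 2, "name2": 3}
after = {"id": 1, "name": 3, "name2": 3}

insert_ws = write_set("test", "t3", keys, before)
update_ws = write_set("test", "t3", keys, after, before)
print(len(insert_ws), len(update_ws))
```

Because the sketch stores hashes in a set, the PK hash that appears in both row images of the update is kept once, leaving three distinct entries for the four lines listed above.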
[14 Jan 8:45] Balkrishna Basisthnarayan
Agreed. The current sentence can mislead because of the assumptions it implies. Could we replace it with the wording below? Just a suggestion.

Replace:

“the corresponding write set (the unique identifiers of the rows that were updated)”

With:

“the corresponding write set (a set of one or more hashes derived from key values of the modified rows—such as the primary key and, when applicable, other relevant non-NULL unique keys and dependencies—used for certification/conflict detection)”