Bug #106770 Document MySQL replication protocol change 8.0.27 -> 8.0.28
Submitted: 18 Mar 2022 8:46 Modified: 24 Mar 2022 9:59
Reporter: Simon Mudd (OCA) Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: Documentation Severity:S4 (Feature request)
Version:8.0.28 OS:Any
Assigned to: Jon Stephens CPU Architecture:Any
Tags: 8.0.27, 8.0.28, casting, RBR, replication, virtual column

[18 Mar 2022 8:46] Simon Mudd
Description:
It looks like 8.0.27 to 8.0.28 made a change in the replication behaviour when replicating virtual columns between a master and replica when using RBR.

I understand that if there is no primary key on a table and minimal RBR is being used then a DELETE and perhaps an UPDATE statement will lead to the following change in behaviour:

8.0.27 sends all rows with the virtual column "data" being sent in the binlog stream, this is used then by the applier to apply the corresponding change.

8.0.28 changes this behaviour by sending the virtual column "data" as NULL, I believe the intent is that the applier will generate this from the virtual column definition on the replica.

Currently the replication flow and what's seen is not documented. This is most unfortunate as the only way to find out how replication works is to read the code. yet the code may change and so it's hard to know what SHOULD happen or if the code changes behaviour which might trigger bugs or incompatible behaviour.

I think therefore that the replication protocol should be documented functionally in such a way that the current code should keep working but when there are doubts about behaviour or if it changes the documentation of the protocol should reflect this.

Many people these days read binlogs to process changes moving data out of MySQL perhaps into other MySQL systems or into systems which are not MySQL at all. Having this documentation makes it much easier for people to follow changes and understand or be able to test if their application behaves correctly when reading  a binlog stream.

How to repeat:
check replication stream of such a table with a virtual column, something like:

create database if not exists test;
use test;
drop table if exists t;
create table t (
a int,
d mediumblob,
i binary(16) generated always as (unhex(md5(cast(`d` as char(3072) charset utf16)))) VIRTUAL,
key (i)
) engine=innodb;

Insert some rows, then delete them.
Investigate the output of the binlog stream for 8.0.27 and 8.0.28

I believe the stream will be different as indicated above.

Suggested fix:
If my description above is correct please update the release notes for 8.0.28 describing the change in binlog generation related to this specific set of circumstances.

Ideally as described look to document functionally the binlog stream and how a client can interface with it and understand the data stream the master can send.

The change in behaviour should not actually break anything but it does change the structure of the binlog stream which reads may not be aware of. (mysql server can handle this)
[18 Mar 2022 9:48] MySQL Verification Team
Testcase from https://bugs.mysql.com/bug.php?id=106437
can be used.

You'll see due to invalid CAST in 8.0.28 column @3 has NULL in the binary log.
[18 Mar 2022 13:35] Jon Stephens
I've been following this issue, assigning bug report to myself.
[18 Mar 2022 19:37] Jon Stephens
Possibly of interest for developer docs, but does not seem relevant to end user documentation.
[24 Mar 2022 9:59] Simon Mudd
> Possibly of interest for developer docs, but does not seem relevant to end user documentation.

Indeed, from the normal user perspective this should be invisible and transparent. However, these days with CDC type imports often happening from binlogs describing the binglog flows and content and behaviour becomes quite important.

A lot of software is often not updated and may not support the latest changes in the binlog streams and I think that may partly be because until you read the source code you can't tell what's changed or when it changed so writing code to read binlogs for processing outside of the traditional master -> replica setup may be more common.  I have seen such issues with  compressed binlogs not being supported even though it's a feature released some time ago.  For any developer of such CDC type software in the end they ask the DBAs to turn off the feature as they simply can not handle it which completely defeats the purpose and reason it was enabled on the servers.

So while this specific request is for add a comment to the release notes (technical developer facing) generally having more specific documentation on the binlog format and how it should be consumed would be very interesting.