Bug #105545 Can not serialize records with functional key
Submitted: 12 Nov 2021 11:39 Modified: 12 Nov 2021 13:45
Reporter: world hello Email Updates:
Status: Unsupported Impact on me:
None 
Category:MySQL Utilities: Binlog Events Severity:S3 (Non-critical)
Version:>=8.0.13 OS:Any
Assigned to: CPU Architecture:Any
Tags: binlog events, functional key

[12 Nov 2021 11:39] world hello
Description:
  I use COM_BINLOG_DUMP to get ROW and FULL format binlog events from MySQL just like canal(https://github.com/alibaba/canal), and then parse it, but i meet a problem when a table with functional key in the special situation. 

  In the special situation, which a table with N column and M functional key( (N + M + 7) / 8 != (N + 7) / 8), we will fail to parse the row event according to the document(https://dev.mysql.com/doc/internals/en/rows-event.html)

Binlog::RowsEvent:
Payload
header:
  if post_header_len == 6 {
4                    table id
  } else {
6                    table id
  }
2                    flags
  if version == 2 {
2                    extra-data-length
string.var_len       extra-data
  }

body:
lenenc_int           number of columns
string.var_len       columns-present-bitmap1, length: (num of columns+7)/8
  if UPDATE_ROWS_EVENTv1 or v2 {
string.var_len       columns-present-bitmap2, length: (num of columns+7)/8
  }

rows:
string.var_len       nul-bitmap, length (bits set in 'columns-present-bitmap1'+7)/8
string.var_len       value of each field as defined in table-map
  if UPDATE_ROWS_EVENTv1 or v2 {
string.var_len       nul-bitmap, length (bits set in 'columns-present-bitmap2'+7)/8
string.var_len       value of each field as defined in table-map
  }
  ... repeat rows until event-end

  I try a lot of times to parse the row event, and I have found that something in the doc should be corrected or the MySQL source code should be modified.

How to repeat:
Create a table with 8 columns and 1 functional key which N = 8, M = 1 as mentioned above: 

CREATE TABLE functional_key_test.t2 (c1 INT primary key, c2 INT, c3 varchar(10), c4 timestamp, c5 bigint, c6 float, c7 datetime, c8 bool, Index func_index ((ABS(c1))));

INSERT INTO functional_key_test.t2 (c1) VALUES (1); 

We will get a TABLE_MAP_EVENT with column-count = 9 according to the doc(https://dev.mysql.com/doc/internals/en/table-map-event.html), it looks like the sum of number of real column and the number of virtual column.

And we will also get a WRITE_ROWS_EVENTv2 with the number of columns = 9 and length(nul-bitmap) = 16, when I parse the remaining event, I got a wrong result.

    //BINLOG '
    //WJmDYRN9ffeLUAAAABocCgAAAJMYAQAAAAEAE2Z1bmN0aW9uYWxfa2V5X3Rlc3QAAnQyAAkDAw8R
    //CAQSAQMFHgAABAD2AAEBAAIBIVJnino=
    //WJmDYRd9ffeLKwAAAEUcCgAAAJMYAQAAAAEACf/+9gEAAABhg5lYzfgqOw==
    //'/*!*/;
    //### INSERT INTO `functional_key_test`.`t2`
    //### SET
    //###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
    //###   @2=NULL /* INT meta=0 nullable=1 is_null=1 */
    //###   @3=NULL /* VARSTRING(30) meta=30 nullable=1 is_null=1 */
    //###   @4=1636014424 /* TIMESTAMP(0) meta=0 nullable=0 is_null=0 */
    //###   @5=NULL /* LONGINT meta=0 nullable=1 is_null=1 */
    //###   @6=NULL /* FLOAT meta=4 nullable=1 is_null=1 */
    //###   @7=NULL /* DATETIME(0) meta=0 nullable=1 is_null=1 */
    //###   @8=NULL /* TINYINT meta=0 nullable=1 is_null=1 */

Suggested fix:
The values of virtual columns are not stored in binlog at present, so I think the virtual column should not be counted in the TABLE_MAP_EVENT and WRITE_ROWS_EVENTv2, 
 if it has to be counted, the document should be modified. So i think there are 2 ways: 

1. Correct the document and modify the calculation of nul-bitmap, 

length (bits set in 'columns-present-bitmap1'+7)/8 should be changed to (number of real columns exclude virtual computed columns), the number of real columns can be calculate with the SQL, just like: 
select count(*) from information_schema.columns where table_schema = 'functional_key_test' and table_name = 't1';

2. Modify the logic of record binlog
[12 Nov 2021 13:45] MySQL Verification Team
Hi Mr. hello,

Thank you for your bug report.

However, we do not document fully internals of our server and components.

We see that you have found the only info that we have on the subject. We should like to point out that we do not explain nor document the internal functionality of our server to the full extent.

Beyond what is available, you will have to read the source code to understand how are row data saved in the binary log.

Some other companies have worked on parsing binary log data and they have developed their tools by reading our source code. This is one of the advantages of GPL license. You can analyse the source code as much as you need to.