Bug #108061 Contribution by Tencent: Generated column calculate error without base column
Submitted: 3 Aug 2022 11:33 Modified: 4 Aug 2022 2:01
Reporter: Xiaodong Huang (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.39 OS:Any
Assigned to: CPU Architecture:Any
Tags: generate columns

[3 Aug 2022 11:33] Xiaodong Huang
Description:
When binlog_row_image is not equal to FULL, the rows event may not have the dependent base columns of generated column. In Replica, it need recalculate the expression of generated column.  As a result, Replayed the event may lead to error due to missing the essential base columns.

How to repeat:
step1. config a replication with a source and a replica.

step2. In the source,  create a table and insert a row as following:

CREATE TABLE `t1` (
  `id` char(26) COLLATE utf8mb4_unicode_ci NOT NULL,
  `body` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `config_name` varchar(180) CHARACTER SET utf8mb4 GENERATED ALWAYS AS (json_unquote(json_extract(`body`,_utf8mb4'$.name'))) VIRTUAL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

INSERT INTO `t1`(`id`,`body`) VALUES ( '000000000000000000000000XX’, '{\"application\":\"xxxxx\",\"name\":\"xxxxx1\"}’);

step3. In the source, set the session binlog_row_image as ‘minimal’

set binlog_row_image=‘minimal’;

step4. In the source, execute the following update statement:

update t1 set body='{\"application\":\"xxx\",\"name\":\"xxxxx2\"}' where id= '000000000000000000000000XX';

In the replica, executing "show slave status” will see replication error: "Last_SQL_Errno: 3141, Last_SQL_Error: Could not execute Update_rows event on table test.t1; Invalid JSON text in argument 1 to function json_extract: "The document is empty." at position 0., Error_code: 3141;”.

Althertive,  the follow testcase also can be used to repeat this problem:

--source include/have_binlog_format_row.inc
--source include/master-slave.inc
--echo #
--echo # Prepare
--echo #

CREATE TABLE `t1` (
  `id` char(26) COLLATE utf8mb4_unicode_ci NOT NULL,
  `body` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `config_name` varchar(180) CHARACTER SET utf8mb4 GENERATED ALWAYS AS (json_unquote(json_extract(`body`,_utf8mb4'$.name'))) VIRTUAL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

INSERT INTO `t1`(`id`,`body`) VALUES (
	      '000000000000000000000000XX',
				'{\"application\":\"xxxxx\",\"name\":\"xxxxx1\"}');

select * from t1;

--echo #
--echo # Run
--echo #

--source include/rpl_connection_master.inc
set binlog_row_image='minimal';

update t1 set body='{\"application\":\"xxx\",\"name\":\"xxxxx2\"}' where id= '000000000000000000000000XX';
--source include/sync_slave_sql_with_master.inc
select * from t1;

--echo #
--echo # Cleanup
--echo #

--source include/rpl_connection_master.inc
DROP TABLE t1;
--source include/rpl_end.inc

Suggested fix:
Analyze: in the bugfix “ BUG#30034874: INDEX CORRUPTION WHEN SLAVE HAS VIRTUAL COLUMNS/INDEXES AND MASTER DOESN’T” , When replica unpack the before image or after image in update rows event , the generated column need be recalculated through the base columns.

When binlog_row_image is set as MINIMAL or NOBLOB especially MINIMAL and the replicated table has primary key, the before image only has the columns belonging to the primary key.If the colums does not include the base columns, it will occur error that the replica calculate the generated column.

For example,  the update query generate the update row event is as following in the above testcase

#220803 16:53:35 server id 1  end_log_pos 1456 CRC32 0xacba0b73 	Update_rows: table id 223 flags: STMT_END_F
### UPDATE `test`.`t1`
### WHERE
###   @1='000000000000000000000000XX' /* STRING(104) meta=65128 nullable=0 is_null=0 */
### SET
###   @2='{"application":"xxx","name":"xxxxx2"}' /* LONGBLOB/LONGTEXT meta=4 nullable=0 is_null=0 */
###   @3='xxxxx2' /* VARSTRING(720) meta=720 nullable=1 is_null=0 */

The where clause “WHERE @1=‘000000000000000000000000XX’” is "before image. Unpacking the before image will calculate the generated column @3(config_name) through the base column @2(body) but the before image does not include @2(body). Eventually, the SQL thread report a error.

Fix: If the before image or after image include any generated column, it must include the base column of the generated column. Therefore, the generated column can be calculated correctly.
[3 Aug 2022 11:55] MySQL Verification Team
Hello Xiaodong,

Thank you for the report and contribution.

regards,
Umesh
[4 Aug 2022 2:01] Xiaodong Huang
8.0 revert the  BUG#30034874: INDEX CORRUPTION WHEN SLAVE HAS VIRTUAL COLUMNS/INDEXES AND MASTER DOESN'T at commit: 5850706f4f618d244fe2081f6de041399ca38cdd. 5.7 should also revert this bugfix first.