MySQL Bugs: #113548: Cloning failing in Mysql Group replication

Bug #113548	Cloning failing in Mysql Group replication
Submitted:	4 Jan 2024 7:38	Modified:	9 Jan 2024 9:30
Reporter:	Pravata Dash
Status:	Duplicate
Category:	MySQL Server: Group Replication	Severity:	S2 (Serious)
Version:	8.0.34-26	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	MySQL Group Replication

Description:
We run a 3-node MySQL Group Replication cluster in Kubernetes. With the cluster fully set up, cloning from the primary node to rebuild or add a secondary node aborts on its own with the following error:

ERROR 1026 (HY000): Error writing file './#innodb_redo.#clone/#ib_redo0' (errno: 22 - Invalid argument)

This occurs randomly, both while the primary is accepting writes and while it is idle. Sometimes the cloning operation succeeds without any error. Are there specific factors that cause this error and make the clone fail?

MySQL version: 8.0.34-26

I have already checked disk space and read-only state: there was enough free space and the disk was writable. The clone fails randomly with the above error; after recreating the cluster it works fine, but the same issue sometimes recurs.

While researching online, I found some related issues involving O_DIRECT. In my case, the issue only happens with innodb_flush_method=O_DIRECT (which is what we use). Link: https://bugs.mysql.com/bug.php?id=97755

Could you please let us know if there is any potential issue associated with innodb_flush_method O_DIRECT for this error, or if it might be happening due to something else?

As shown below, the same command failed with the error, and retrying it after some time succeeded.

Failure:
mysql> CLONE INSTANCE FROM 'donor_clone_user'@'':3306 IDENTIFIED BY 'xxxxxxx';
ERROR 1026 (HY000): Error writing file './#innodb_redo.#clone/#ib_redo0' (errno: 22 - Invalid argument)

Retrying after a few seconds without any changes:

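Success (the clone itself completes; ERROR 3707 only reports that the server could not restart itself because mysqld is not managed by a supervisor process):
mysql> CLONE INSTANCE FROM 'donor_clone_user'@'':3306 IDENTIFIED BY 'xxxxxxx';
ERROR 3707 (HY000): Restart server failed (mysqld is not managed by supervisor process).
mysql> command terminated with exit code 137

Pod logs during failure: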

2023-12-19T07:54:31.000678Z 11 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Started
2023-12-19T07:54:31.072144Z 11 [Warning] [MY-013460] [InnoDB] Clone removing all user data for provisioning: Finished
2023-12-19T07:54:32.567528Z 11 [Warning] [MY-012638] [InnoDB] Retry attempts for writing partial data failed.
2023-12-19T07:54:32.567545Z 11 [ERROR] [MY-012639] [InnoDB] Write to file ./#innodb_redo.#clone/#ib_redo0 failed at offset 3584, 1048576 bytes should have been written, only 0 were written. Operating system error number 22. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
2023-12-19T07:54:32.567558Z 11 [ERROR] [MY-012640] [InnoDB] Error number 22 means 'Invalid argument'
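
Given the O_DIRECT suspicion, one thing that may be worth checking is write alignment: on Linux, O_DIRECT I/O generally has to be aligned to the logical block size of the underlying device, and a misaligned request can be rejected with errno 22 (EINVAL). The sketch below is only an illustration in Python. It parses the MY-012639 line above and tests the reported offset and length against 512-byte and 4096-byte blocks. Whether misalignment is really the cause of this failure is an assumption, not something established in this report; the helper names are made up for the example.

```python
import re

# MY-012639 line copied from the pod logs above (text after the errno omitted).
LOG_LINE = (
    "2023-12-19T07:54:32.567545Z 11 [ERROR] [MY-012639] [InnoDB] Write to file "
    "./#innodb_redo.#clone/#ib_redo0 failed at offset 3584, 1048576 bytes should "
    "have been written, only 0 were written. Operating system error number 22."
)

# Pattern for the InnoDB write-failure message as it appears in this report.
PATTERN = re.compile(
    r"Write to file (?P<path>\S+) failed at offset (?P<offset>\d+), "
    r"(?P<length>\d+) bytes should have been written, "
    r"only (?P<written>\d+) were written\. "
    r"Operating system error number (?P<errno>\d+)"
)


def parse_write_failure(line):
    """Return the fields of a MY-012639 message as a dict, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("offset", "length", "written", "errno"):
        info[key] = int(info[key])
    return info


def is_aligned(value, block_size):
    """True if value is a whole multiple of block_size."""
    return value % block_size == 0


if __name__ == "__main__":
    info = parse_write_failure(LOG_LINE)
    # 512 and 4096 are common logical block sizes; the real value for the
    # volume in this report is unknown (assumption for illustration).
    for block_size in (512, 4096):
        print(
            f"block={block_size}: "
            f"offset aligned={is_aligned(info['offset'], block_size)}, "
            f"length aligned={is_aligned(info['length'], block_size)}"
        )
```

The offset 3584 is a multiple of 512 but not of 4096, so the same write could be acceptable on a volume with 512-byte logical sectors and rejected on one with 4096-byte sectors. The logical sector size of a device can be read with `blockdev --getss <device>`.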

How to repeat:
When rebuilding a secondary node (pod) in MySQL Group Replication from either the primary or a secondary using the clone plugin, the clone randomly fails with the error shown in the Description section. Sometimes it works after a few retries.
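
Since a retry after a few seconds sometimes succeeds, provisioning scripts could wrap the clone step in a bounded retry as a stopgap. The following is a minimal sketch in Python, not a fix for the underlying problem. `run_clone` and `CloneWriteError` are hypothetical names: `run_clone` stands for whatever function issues the CLONE INSTANCE statement, and `CloneWriteError` for however that function reports ERROR 1026.

```python
import time


class CloneWriteError(Exception):
    """Hypothetical exception standing in for ERROR 1026 from CLONE INSTANCE."""


def clone_with_retry(run_clone, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call run_clone() up to `attempts` times, pausing between failed tries.

    run_clone is a caller-supplied callable (assumption: it raises
    CloneWriteError when the clone fails with the write error).
    Returns whatever run_clone returns; re-raises the last error if
    every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run_clone()
        except CloneWriteError as exc:
            last_error = exc
            # Do not sleep after the final attempt.
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error
```

Only the write error is retried here; any other exception propagates immediately so that unrelated failures are not masked.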

Hello Pravata Dash,

Thank you for the report and feedback.
IMHO, the related Bug #110569 was fixed in MySQL Server 8.0.35. I suggest you test with 8.0.35 and let us know if you still see this. Thank you.

regards,
Umesh

Thank you for confirming. We will test this in version 8.0.35 and inform you if the issue reoccurs.