Bug #102097 Cloning fails with low wait_timeout
Submitted: 31 Dec 2020 8:51 Modified: 25 Jan 2021 20:04
Reporter: Daniël van Eeden (OCA)
Status: Closed
Category:MySQL Server: Clone Plugin Severity:S3 (Non-critical)
Version:8.0.21, 8.0.22 OS:Any
Assigned to: CPU Architecture:Any

[31 Dec 2020 8:51] Daniël van Eeden
Description:
With `wait_timeout=300` set on the donor, cloning fails with the following log output:

2020-12-31T08:41:34.152728Z 39537 [Note] [MY-013417] [Server] The wait_timeout period was exceeded, the idle time since last command was too long.
2020-12-31T08:41:34.152850Z 39537 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: Before sending COM_RES_ERROR: network : error: 13417: Got timeout reading communication packets.'
2020-12-31T08:41:34.152918Z 39537 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: After sending COM_RES_ERROR: error: 1159: Got timeout reading communication packets.'
2020-12-31T08:41:34.152934Z 39537 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: Exiting clone protocol: error: 1159: Got timeout reading communication packets.'

How to repeat:
Set a low `wait_timeout` (for example 300 seconds) on the donor and run a long remote clone operation.

Suggested fix:
1. Override the `wait_timeout` for cloning, or add a separate setting for it, because cloning traffic and regular application/user traffic are very different.

2. Document requirements for `wait_timeout`.

3. Have the cloning process check the `wait_timeout` and report if it is too low, and/or make the cloning code stay within the timeout, for example by using smaller chunks.
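Suggested fix 3 could take the form of a pre-flight check. The sketch below is hypothetical: `check_wait_timeout` and the `max_idle_gap_s` input (the longest expected pause between Clone protocol commands on the donor) are assumptions for illustration, not anything the Clone plugin provides.

```python
from dataclasses import dataclass


@dataclass
class TimeoutCheck:
    ok: bool
    message: str


def check_wait_timeout(wait_timeout_s: int, max_idle_gap_s: int) -> TimeoutCheck:
    """Hypothetical check: warn if the donor's wait_timeout may expire
    between Clone protocol commands. max_idle_gap_s is an assumed estimate."""
    if wait_timeout_s <= max_idle_gap_s:
        return TimeoutCheck(
            ok=False,
            message=(
                f"wait_timeout={wait_timeout_s}s is not longer than the expected "
                f"idle gap of {max_idle_gap_s}s between clone commands; "
                "the donor may drop the connection (error 1159)."
            ),
        )
    return TimeoutCheck(ok=True, message="wait_timeout looks sufficient")


if __name__ == "__main__":
    # With the setting from this report and an assumed 10-minute idle gap:
    print(check_wait_timeout(300, 600).message)
```

Reporting the problem up front would turn a late, opaque timeout into an immediate, actionable error.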
[4 Jan 2021 13:38] MySQL Verification Team
Hello Daniël,

Thank you for the report and feedback.
Verified as described on 8.0.22 build.

regards,
Umesh
[25 Jan 2021 20:04] Daniel Price
Posted by developer:
 
Fixed as of the upcoming 8.0.24 release, and here's the proposed changelog entry from the documentation team:

A long running remote cloning operation failed due to a low wait_timeout
setting on the donor MySQL Server instance. Donor threads use the MySQL
Server wait_timeout setting when listening for Clone protocol commands. To
avoid timeout failures on donor instances with a low wait_timeout setting,
the Clone idle timeout is now set to the default wait_timeout setting,
which is 28800 seconds (8 hours). Clone network read and write timeout
values were also increased. 
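The behaviour change in the changelog entry can be modelled roughly as below. This is a simplified, hypothetical model (the function names and the strict "idle gap below timeout" comparison are assumptions), not MySQL source code; only the 28800-second default comes from the entry above.

```python
# Hypothetical model of the donor idle timeout before and after the fix.
DEFAULT_WAIT_TIMEOUT_S = 28800  # default wait_timeout (8 hours), per the changelog


def donor_idle_timeout(configured_wait_timeout_s: int, fixed: bool) -> int:
    """Seconds a donor thread waits for the next Clone protocol command."""
    if fixed:
        # 8.0.24 and later: Clone uses the default wait_timeout value.
        return DEFAULT_WAIT_TIMEOUT_S
    # 8.0.21/8.0.22: the server's configured wait_timeout applies.
    return configured_wait_timeout_s


def clone_survives(idle_gap_s: int, configured_wait_timeout_s: int, fixed: bool) -> bool:
    """True if an idle gap of idle_gap_s seconds stays below the donor's timeout."""
    return idle_gap_s < donor_idle_timeout(configured_wait_timeout_s, fixed)


if __name__ == "__main__":
    # wait_timeout=300 as in this report, with an assumed 600 s idle gap.
    print(clone_survives(600, 300, fixed=False))
    print(clone_survives(600, 300, fixed=True))
```

Under these assumptions, the same 600-second gap fails with the reporter's setting before the fix and succeeds after it, because the donor no longer inherits the low server-wide value.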

Thank you for the bug report.