Bug #96837 cloning - clarify a few things in the documentation
Submitted: 11 Sep 11:50
Modified: 12 Sep 13:53
Reporter: Simon Mudd (OCA)
Status: Verified
Category: MySQL Server: Documentation
Severity: S4 (Feature request)
Version: 8.0.17
OS: Any
Assigned to:
CPU Architecture: Any
Tags: cloning, documentation
Triage: Needs Triage: D5 (Feature request)

[11 Sep 11:50] Simon Mudd
Description:
Looking at cloning and the various bits of documentation, I find a few things a bit confusing.

How to repeat:
Do some cloning. Read the docs and maybe ask yourself some questions.

* clone_max_concurrency seems to be used together with clone_autotune_concurrency, so if autotune is OFF the "max" variable is actually the value that gets used. Somewhat confusing.
* Auto-tuning. Is there some documentation on the logic of how this works and how many CPUs etc. are used?
* Are these settings applied on the source, on the destination, or on both? The documentation is not clear, and I think they only matter on the source. Clarification would be good.
* The default value of clone_max_concurrency is 16. Why not default to the number of CPUs on the server? Go does this to make auto-concurrency tuning easier, and I see this as better than a static value which over time will become too small.
* The clone_buffer_size is tiny. Any sort of DB server should have a lot of memory. I assume this is because you want to limit the impact on the source box, which might be taking user queries. It's not clear to me right now how much difference increasing this buffer makes. A blog post showing the difference, or some other type of reference, might be useful.
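
For reference, a minimal sketch of how the settings in the list above can be inspected and overridden. The numeric values are purely illustrative, not recommendations, and whether these statements need to run on the source, the destination, or both is exactly the open question raised above.

```sql
-- List all clone plugin settings and their current values.
SHOW GLOBAL VARIABLES LIKE 'clone%';

-- With auto-tuning off, clone_max_concurrency is (as far as I can tell)
-- used as the fixed number of clone threads rather than as an upper bound.
SET GLOBAL clone_autotune_concurrency = OFF;
SET GLOBAL clone_max_concurrency = 24;   -- illustrative: a 24-core box

-- clone_buffer_size is given in bytes; 16 MiB here is an arbitrary test value.
SET GLOBAL clone_buffer_size = 16777216;
```

Timing the same clone with and without these overrides would be one way to measure how much difference the buffer size and thread count actually make.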

On a system I'm testing with 24 cores (note this is higher than your default of 16), cloning with auto-tuning and default settings uses only about 40% of the network bandwidth on the destination server (400 MB/s on a 10GbE network card) and 8 cores. So while the speed is pretty good, we're clearly not hitting hardware limits on the destination box. The source server is dedicated to this sort of task.

The source server logging shows Stage progress, which is good to see. The destination server does not provide this information. It might be useful to provide it via some sort of side channel so that the destination box can also log its progress.

While the clone is running, I do see a few threads on the source server waiting for the backup lock, e.g.

* drop view if exists .... (pseudo-gtid replication)
* TRUNCATE TABLE ....

I guess this is documented, but it's probably worth stating the impact of cloning, as having a dedicated server for this type of task may be ideal if resources permit. Also, running this on a primary master may have a higher impact than using something like MEB / xtrabackup, which work fine and don't block normal usage.
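
Threads waiting for the backup lock, as described above, can be listed with something like the following sketch. It assumes the metadata lock instrumentation in performance_schema is enabled (the 8.0 default, as far as I know) and is run on the source while the clone is in progress.

```sql
-- Show sessions holding or waiting for the backup lock.
-- LOCK_STATUS = 'PENDING' marks the waiters; 'GRANTED' marks the holders.
SELECT ml.LOCK_TYPE,
       ml.LOCK_STATUS,
       t.PROCESSLIST_ID,
       t.PROCESSLIST_INFO      -- the statement text, e.g. a TRUNCATE TABLE
  FROM performance_schema.metadata_locks AS ml
  JOIN performance_schema.threads AS t
    ON t.THREAD_ID = ml.OWNER_THREAD_ID
 WHERE ml.OBJECT_TYPE = 'BACKUP LOCK';
```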

* clone_valid_donor_list configuration seems a bit confusing. I think this is because of the way cloning works, by pushing data from the source to the destination. Yet if I run the CLONE INSTANCE command (excluding issues such as NAT etc.), you already know the server that's going to provide the data and push it back to you (the destination), so I'm not sure this setting makes much sense: you effectively set it to the host you are going to clone from and then run CLONE INSTANCE with the same hostname[:port]. If this is really needed, I would like to understand why. There's already user access control, and authorisation ensures the user has BACKUP_ADMIN, so needing this extra variable and having to set it each time seems confusing. It doesn't break anything, but it does mean extra (apparently useless) configuration.
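
To illustrate the duplication described above, this is roughly what has to be run on the destination for every clone. The host name, port, user and password are hypothetical placeholders.

```sql
-- Step 1: name the donor in the allow-list (hypothetical host and port).
SET GLOBAL clone_valid_donor_list = 'source.example.com:3306';

-- Step 2: name the very same host and port again in the statement itself.
CLONE INSTANCE FROM 'clone_user'@'source.example.com':3306
  IDENTIFIED BY 'clone_password';
```

The same host:port pair appears in both statements, which is why the extra variable looks redundant from the point of view of whoever is running the clone.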

Suggested fix:
Sorry for a wide range of varied comments on this topic but I think that this could be better documented and perhaps a few of these settings could be made slightly better.

Comments here are fine, fixing the docs is fine, and I'm sure this will make this new process much better and smoother.
[12 Sep 8:23] Umesh Shastry
Hello Simon,

Thank you for the report.

regards,
Umesh
[12 Sep 13:53] Simon Mudd
Some of the docs on the P_S tables are not very explicit about whether the source or the destination server provides the values, e.g. https://dev.mysql.com/doc/refman/8.0/en/clone-plugin-monitoring.html#clone-plugin-monitori...

For clone_status I would suggest adding the text: "of the target server. The table is empty on the source, or the target if no clone operations have taken place, or if the server has been restarted following the completion of the clone process."
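
A quick way to see which side populates the tables is to run the same queries on both servers and compare the results. This is only a sketch; the column selection is my own choice from the columns these tables expose.

```sql
-- Overall state of the most recent clone operation.
SELECT STATE, SOURCE, DESTINATION, BEGIN_TIME, END_TIME, ERROR_NO
  FROM performance_schema.clone_status;

-- Per-stage progress of the same operation.
SELECT STAGE, STATE, BEGIN_TIME, END_TIME
  FROM performance_schema.clone_progress;
```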