Bug #105617 improve clone plugin reporting/logging while cloning
Submitted: 17 Nov 2021 14:51 Modified: 17 Nov 2021 15:38
Reporter: Simon Mudd (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Clone Plugin Severity:S4 (Feature request)
Version:8.0.27 OS:Any
Assigned to: CPU Architecture:Any

[17 Nov 2021 14:51] Simon Mudd
Description:

Looking at my logs I see:

2021-11-17T12:15:07.579891Z 0 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Command COM_ATTACH.'
2021-11-17T12:15:12.574611Z 7 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Tune continue, Data: 459 MiB/sec, Target: 457 MiB/sec.'
2021-11-17T12:15:12.574700Z 7 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Tune Threads from: 2 to: 4 prev: 2 target: 4.'
2021-11-17T12:15:12.595921Z 0 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Task Connect.'
2021-11-17T12:15:12.595922Z 0 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Task Connect.'
2021-11-17T12:15:12.596193Z 0 [Note] [MY-013457] [InnoDB] Clone Apply Begin Task ID: 2
2021-11-17T12:15:12.596236Z 0 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Command COM_ATTACH.'
2021-11-17T12:15:12.596238Z 0 [Note] [MY-013457] [InnoDB] Clone Apply Begin Task ID: 3
2021-11-17T12:15:12.596267Z 0 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Command COM_ATTACH.'
2021-11-17T12:15:17.598246Z 7 [Note] [MY-013272] [Clone] Plugin Clone reported: 'Client: Tune stop, Data: 509 MiB/sec, Target: 573 MiB/sec..'

and the time is 2 1/2 hours later.
That is all fine of course but if something goes wrong the detail of "current progress" is somewhat limited. I've seen some issues which I'm working on resolving but I think that while performance_schema.clone_progress does provide useful information a periodic update to logging would also be helpful.

Information to report could be:
- number of errors
- number of files copied vs total number
- and other things.
Those metrics are not available anywhere and while not absolutely necessary if things work if they fail the lack of visibility of other "useful metrics" is unhelpful.

How to repeat:
Do a large clone and notice that confirmation of progress may not happen for "a long time".

There are other places in mysqld logging to indicate progress:
- crash recovery
- buffer pool size change

come to mind immediately but I'm sure there are others I've forgotten so adding a bit more logging here would be helpful.

Suggested fix:
- Provide a periodic update every 60? or 300? seconds (or configure the period as for a large system the copy period may be quite long. We just want to see a confirmation of progress.)
- There do not appear to be [m]any error counters. The data is being copied over the network which is never 100% reliable so errors are expected. Exposing them is good, as are other success metrics.  p_s.clone_progress provides minimal information of the copy process so is helpful but not detailed enough to diagnose issues.
[17 Nov 2021 15:23] Simon Mudd
Specifically since I've seen issues with the file copy stage perhaps an indication of the current file that's being copied could be made?

| 1 | FILE COPY | In Progress | 2021-11-17 13:15:02.536692 | NULL | 4 | 3675459754787 | 3469725768483 | 3469911123316 | 5741689 | 5741995 |

Speed has dropped here but which file might this be related to? No idea.

And documentation on the current workings of the clone plugin are somewhat vague. I guess I'd need to go through and find the files most recently changed?
[17 Nov 2021 15:38] Simon Mudd
also add some indication of "active threads". This might be important as the auto-tuning may change the number being used. The logging doesn't clearly indicate how many of these are active and I get the impression it just scales up until a point where code decides "that's enough", I'm not sure that the number of threads decreases though if they're idle it may not matter.

Either way report the number of busy/active threads as this info could be handy.