Bug #103577 cluster.status incorrectly shows as cloning at end of distributed recovery
Submitted: 4 May 2021 18:34 Modified: 17 May 2021 13:42
Reporter: Keith Lammers Email Updates:
Status: Verified Impact on me:
None 
Category:Shell AdminAPI InnoDB Cluster / ReplicaSet Severity:S3 (Non-critical)
Version:8.0.24 OS:Ubuntu (20.04.2)
Assigned to: CPU Architecture:x86

[4 May 2021 18:34] Keith Lammers
Description:
When a cluster instance is recovering, distributed recovery is performed, then at the end of distributed recovery, cluster.status incorrectly shows that cloning is in progress.

The status shows the clone start date as some time in the past (seems to be whenever the last actual clone was performed), and error.log and network traffic do not indicate that it's actually doing a clone.

After some time, the instance is successfully online.

How to repeat:
* Run something that's inserting a lot of data and keep it running during the following steps
* Shutdown and instance for a while to build up a good backlog of transactions
* Start the instance and monitor the recovery process with cluster.status
* Once it's finished applying the transactions via distributed recovery, cluster.status will incorrectly show that there's a clone in progress
[10 May 2021 9:30] MySQL Verification Team
Hi Keith,

"Once it's finished applying the transactions via distributed recovery, cluster.status will incorrectly show that there's a clone in progress"

For how long do you see this after recovery finishes? I'm having issues reproducing this.

Thanks
[10 May 2021 11:29] Keith Lammers
In two separate clusters here, it will show this state for up to 15 minutes if I reboot an instance while the cluster has medium to heavy load on it. The cpu iowait does increase on the instance that is recovering, so you might need to put a heavy load on your cluster while testing to get it to show this data in MySQL Shell long enough.

During this time the receive queue count keeps increasing (because there is still load on the primary instance) but the applier queue count remains at 0.

This is what I see for the rebooted instance with cluster.status({extended:2}). Notice the cloneStartTime is May 4th, even though I performed this test today on May 10th.

"recovery": {
	"cloneStartTime": "2021-05-04 10:40:20.141",
	"cloneState": "Completed",
	"currentStage": "RECOVERY",
	"currentStageState": "Completed",
	"recoveryChannel": {
		"applierQueuedTransactionSet": "",
		"applierQueuedTransactionSetSize": 0,
		"applierState": "OFF",
		"applierStatus": "OFF",
		"applierThreadState": "",
		"receiverStatus": "OFF",
		"receiverThreadState": "",
		"receiverTimeSinceLastMessage": "838:59:59.000000",
		"source": null
	}
},
"recoveryStatusText": "Cloning in progress",
"role": "HA",
"status": "RECOVERING",
"transactions": {
	"appliedCount": 0,
	"checkedCount": 0,
	"committedAllMembers": "",
	"conflictsDetectedCount": 0,
	"inApplierQueueCount": 0,
	"inQueueCount": 28443,
	"lastConflictFree": "",
	"proposedCount": 0,
	"rollbackCount": 0
}

And error.log does indicate that it's working on distributed recovery:

2021-05-10T11:10:01.404962Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Incremental recovery from a group donor'

When it does finally complete, the "inQueueCount" number is abruptly back to 0 after being at a high number while it was in this false "cloning" status.

If you need anything else, please don't hesitate to ask :)
[10 May 2021 11:37] MySQL Verification Team
Hi Keith,

I probably didn't load the system enough, will retest. So if I understand correctly after 15 minutes everything is back to normal.

Thanks
[10 May 2021 11:40] Keith Lammers
That's correct, yep! After some time (it was about 15 minutes when I did the test for my last comment) the "inQueueCount" will suddenly be 0, and then soon after the instance is online.
[17 May 2021 10:14] MySQL Verification Team
Managed to verify this but I'm not 100% sure I would call it a bug. I'll set the bug to verified and let the innodb cluster team decide how to go from here. This happens (in my case) only if there is a significant load on the cluster and I'm not really sure that all the recovery procedures really are finished before the system say they are.

thanks for the report
Bogdan
[17 May 2021 13:42] Keith Lammers
For sure, the recovery still completes, it definitely seems like it must be doing some cleanup at that point, it's just a bit misleading because it says it's cloning when it's not actually cloning.

Thanks Bogdan!
[28 Mar 10:14] James Bewley
What is the status of this, I am seeing the same thing in 8.0.31

Interestingly the "receiverTimeSinceLastMessage" is extactly the same "838:59:59.000000"

               "recovery": {
                    "cloneStartTime": "2024-02-29 13:44:16.615",
                    "cloneState": "Completed",
                    "currentStage": "RECOVERY",
                    "currentStageState": "Completed",
                    "recoveryChannel": {
                        "applierQueuedTransactionSet": "",
                        "applierQueuedTransactionSetSize": 0,
                        "applierState": "OFF",
                        "applierStatus": "OFF",
                        "applierThreadState": "",
                        "receiverStatus": "OFF",
                        "receiverThreadState": "",
                        "receiverTimeSinceLastMessage": "838:59:59.000000",
                        "source": null
                    }
                },
                "recoveryStatusText": "Cloning in progress",
                "role": "HA",
                "status": "RECOVERING",
                "transactions": {
                    "appliedCount": 0,
                    "checkedCount": 0,
                    "committedAllMembers": "",
                    "conflictsDetectedCount": 0,
                    "inApplierQueueCount": 0,
                    "inQueueCount": 203575,
                    "lastConflictFree": "",
                    "proposedCount": 0,
                    "rollbackCount": 0
                },
                "version": "8.0.31"
            }
        },