Bug #108411 cluster instance status reports clone information during distributed recovery
Submitted: 7 Sep 12:26 Modified: 8 Sep 13:15
Reporter: Jay Janssen Email Updates:
Status: Can't repeat Impact on me:
None 
Category: Shell AdminAPI InnoDB Cluster / ReplicaSet Severity: S3 (Non-critical)
Version: OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any

[7 Sep 12:26] Jay Janssen
Description:
I have a node that was provisioned yesterday.  I just restarted mysqld on it, so it is indeed in RECOVERING.  However, clusterset.status({extended:1}) is reporting metadata about the cloning that happened yesterday (specifically cloneStartTime and cloneState), which is confusing since it is currently performing a distributed recovery:

            "10.162.255.29:3306": {
                    "address": "10.162.255.29:3306",
                    "memberRole": "SECONDARY",
                    "mode": "R/O",
                    "recovery": {
                        "cloneStartTime": "2022-09-06 20:33:20.697",
                        "cloneState": "Completed",
                        "currentStage": "RECOVERY",
                        "currentStageState": "Completed"
                    },
                    "recoveryStatusText": "Cloning in progress",
                    "status": "RECOVERING",
                    "version": "8.0.30"
                }
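For what it's worth, the inconsistency is mechanically detectable: the status text claims cloning is in progress while the clone metadata itself says the clone completed the day before. A minimal Python sketch (the member dict is copied from the status output above; the helper name is mine, not part of the AdminAPI):

```python
# Member entry copied from the clusterset.status({extended:1}) output above.
member = {
    "address": "10.162.255.29:3306",
    "recovery": {
        "cloneStartTime": "2022-09-06 20:33:20.697",
        "cloneState": "Completed",
    },
    "recoveryStatusText": "Cloning in progress",
    "status": "RECOVERING",
}

def stale_clone_info(m):
    """Flag a member whose recoveryStatusText claims cloning is in
    progress while the clone metadata says the clone already completed."""
    rec = m.get("recovery", {})
    return (m.get("status") == "RECOVERING"
            and m.get("recoveryStatusText") == "Cloning in progress"
            and rec.get("cloneState") == "Completed")

print(stale_clone_info(member))  # prints True for the output in this report
```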

Here is the log from this instance, which is clearly doing a distributed recovery:

2022-09-07T12:03:08.395624Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.30'  socket: '/data/mysql/mysql.sock'  port: 3306  MySQL Community Server - GPL.
2022-09-07T12:03:08.395676Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/run/mysqld/mysqlx.sock
2022-09-07T12:03:08.407597Z 12 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2022-09-07T12:03:15.865718Z 0 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Shutting down an outgoing connection. This happens because something might be wrong on a bi-directional connection to node 10.162.255.199:3306. Please check the connection status to this member'
2022-09-07T12:03:16.547188Z 2 [System] [MY-011511] [Repl] Plugin group_replication reported: 'This server is working as secondary member with primary member address 10.162.254.30:3306.'
2022-09-07T12:03:17.552059Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Incremental recovery from a group donor'
2022-09-07T12:03:17.552747Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to 10.162.255.29:3306, 10.162.254.30:3306, 10.162.255.199:3306 on view 16624961713509837:9.'

<<this is where I collected the status output above>>

2022-09-07T12:07:20.552214Z 32 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='10.162.254.30', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2022-09-07T12:07:20.558068Z 61 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2022-09-07T12:07:20.586763Z 61 [System] [MY-010562] [Repl] Slave I/O thread for channel 'group_replication_recovery': connected to master 'mysql_innodb_cluster_2448256127@10.162.254.30:3306',replication started in log 'FIRST' at position 4
2022-09-07T12:08:30.025661Z 62 [Warning] [MY-010584] [Repl] Slave SQL for channel 'group_replication_recovery': Coordinator thread of multi-threaded slave is being stopped in the middle of assigning a group of events; deferring to exit until the group completion ... , Error_code: MY-000000
2022-09-07T12:08:30.323219Z 32 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='10.162.254.30', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.

How to repeat:
Check clusterset.status({extended:1}) during a distributed recovery after mysqld is cleanly restarted.  It may also be necessary that the instance has not been restarted since its original clone.
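One way to cross-check which mechanism is actually active on the server itself, rather than trusting the shell's status text, is to consult performance_schema: clone_status retains the last clone operation (so a "Completed" row does not mean a clone is running now), while the group_replication_recovery channel only runs during incremental distributed recovery. A hedged Python sketch of that decision logic (the function and its inputs are illustrative; only the table and column names referenced in the comments come from performance_schema):

```python
def active_recovery_mechanism(clone_state, recovery_channel_state):
    """Decide which recovery mechanism is currently running.

    clone_state: STATE from performance_schema.clone_status
        ("In Progress", "Completed", "Failed", or None if no row).
    recovery_channel_state: SERVICE_STATE from
        performance_schema.replication_connection_status for the
        'group_replication_recovery' channel ("ON", "OFF", or None).
    """
    if clone_state == "In Progress":
        return "clone"
    if recovery_channel_state == "ON":
        return "distributed recovery (incremental)"
    # clone_status keeps the last finished clone around, so a
    # "Completed" row alone does not indicate cloning is happening now.
    return "none"

# The node in this report: last clone completed yesterday, while the
# group_replication_recovery channel is connected to a donor.
print(active_recovery_mechanism("Completed", "ON"))
```

For the situation in this report the sketch returns "distributed recovery (incremental)", which is what the logs show, not the "Cloning in progress" text the shell reports.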

Interestingly, this behavior does not occur when I kill -9 the mysqld instance; I get this instead:

                "10.162.255.29:3306": {
                    "address": "10.162.255.29:3306",
                    "memberRole": "SECONDARY",
                    "mode": "R/O",
                    "recovery": {
                        "state": "ON"
                    },
                    "recoveryStatusText": "Distributed recovery in progress",
                    "status": "RECOVERING",
                    "version": "8.0.30"
                }
[7 Sep 12:35] Jay Janssen
Update on my kill -9 example: I didn't wait long enough.  That instance is indeed showing the clone metadata now that the distributed recovery is further along:

            "10.162.255.29:3306": {
                "address": "10.162.255.29:3306",
                "memberRole": "SECONDARY",
                "mode": "R/O",
                "readReplicas": {},
                "recovery": {
                    "cloneStartTime": "2022-09-06 20:33:20.697",
                    "cloneState": "Completed",
                    "currentStage": "RECOVERY",
                    "currentStageState": "Completed"
                },
                "recoveryStatusText": "Cloning in progress",
                "role": "HA",
                "status": "RECOVERING",
                "version": "8.0.30"
            }

Note that this shows in cluster.status() as well.  It would be far more helpful if it showed the time the distributed recovery started.
[8 Sep 9:44] MySQL Verification Team
Hi

How did you restart the problematic MySQL? Did you restart the whole server or only the mysqld service? If it was just the service, how did you restart it?

Thanks
[8 Sep 11:15] Jay Janssen
Just the service, restarted with systemctl IIRC
[8 Sep 13:15] MySQL Verification Team
Hi,

I cannot reproduce this. Regular setup, 3 nodes, restart one, everything works OK regardless of the cluster's load at the time. I am not sure what you did that triggered the problem. If you manage to reproduce it, please let us know.

Thanks