Description:
When attempting to setup asynchronous replication failure for a replica of a group replication cluster (e.g. https://dev.mysql.com/blog-archive/asynchronous-replication-connection-failover-automatic-... ), after starting the replication channel, the replica would get stuck with the following replication state:
Slave_IO_Running: Yes
Slave_IO_State: Checking source version
Slave_SQL_Running: Yes
Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
In the error_log, entries such as that following were logged:
2024-05-08T12:04:44.123617Z 886890 [System] [MY-013997] [Repl] The connection has timed out after 3 retries connecting to 'replicant@${source_name}:3306' for channel '', and therefore the MySQL server is going to attempt an asynchronous replication connection failover, to 'replicant@${source_name}:3306'
2024-05-08T12:04:44.123693Z 886890 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the connection metadata repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START REPLICA; see the 'START REPLICA Syntax' in the MySQL Manual for more information.
2024-05-08T12:04:44.126336Z 886890 [Warning] [MY-013684] [Repl] The IO thread failed to detect if the source belongs to the group majority on the source (host:${source_name} port:3306 network_namespace:) for channel ''.
After trying a few different ways to troubleshoot, I took a packet capture, and found that the replica was running the query from https://github.com/mysql/mysql-server/blob/6dcee9fa4b19e67dea407787eba88e360dd679d9/sql/rp... and receiving the error:
Error message: SELECT command denied to user 'replicant'@'${replica_ip}' for table 'replication_group_members'
It would have been much easier to understand why this wasn't working if the error received had been logged in the error_log.
How to repeat:
1. Setup a group replication cluster
2. Configure async replication on a replica using GTID and auto-position as you normally would (e.g. https://dev.mysql.com/doc/refman/8.0/en/replication-howto-repuser.html) and validate that it is working
3. Enable replication failover by:
3.1 Executing
asynchronous_connection_failover_add_managed('', 'GroupReplication', 'db355a82-0d23-11ef-a447-0279eea11f2d', '${source_name}', 3306, '', 80, 60)
3.2 Enabling failover and restarting replication by executing
stop slave; CHANGE MASTER TO SOURCE_CONNECT_RETRY=1, SOURCE_RETRY_COUNT=3, SOURCE_CONNECTION_AUTO_FAILOVER=1; start slave
4. Observe that replication isn't working, and `show replica status` shows the Slave_IO_State as `Checking source version`.
Suggested fix:
Add mysql_error(mysql) to the error logged here:
https://github.com/mysql/mysql-server/blob/6dcee9fa4b19e67dea407787eba88e360dd679d9/sql/rp...
An alternative may be to implicitly grant select access to performance_schema.replication_group_members for users with the `replication slave` privilege.
The documentation (e.g. https://dev.mysql.com/doc/refman/8.0/en/replication-asynchronous-connection-failover-sourc... ) could also indicated that, besides the typical grants required for async replication (e.g. `replication slave` as in https://dev.mysql.com/doc/refman/8.0/en/replication-howto-repuser.html ), select access to performance_schema.replication_group_members is required to make it work.