Bug #96025 Missing replication recovery user after brief network outage
Submitted: 28 Jun 2019 11:47 Modified: 4 Jul 2019 8:43
Reporter: Stefano Rocca Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S1 (Critical)
Version:8.0.16 OS:Ubuntu (18.04.2)
Assigned to: CPU Architecture:x86

[28 Jun 2019 11:47] Stefano Rocca
Description:
Hi,
we have setup a 3 node (named sl03, sl04 and sl07) InnoDB Cluster one month ago and everything has worked as expected until yestarday evening: after a brief network outage one of the member, sl03 (the primary member), has not been able to reconnect to the cluster. During this outage all the members were seen as unreachable from each other.
In the error log of sl03 we noticed this:

2019-06-27T17:00:09.092624Z 1185164 [ERROR] [MY-010584] [Repl] Slave I/O for channel 'group_replication_recovery': error connecting to master 'mysql_innodb_cluster_r1742162625@sl04:3306' - retry-time: 60  retries: 1, Error_code: MY-001045

so we checked the mysql.user table of each node:

sl04

mysql> select User,Host,password_last_changed from mysql.user where User like 'mysql_inn%' order by password_last_changed;
+----------------------------------+-----------+-----------------------+
| User                             | Host      | password_last_changed |
+----------------------------------+-----------+-----------------------+
| mysql_innodb_cluster_r1744353246 | %         | 2019-05-31 17:20:11   |
| mysql_innodb_cluster_r1744353246 | localhost | 2019-05-31 17:20:11   |
| mysql_innodb_cluster_r1744356490 | %         | 2019-05-31 17:20:43   |
| mysql_innodb_cluster_r1744356490 | localhost | 2019-05-31 17:20:43   |
+----------------------------------+-----------+-----------------------+

sl03

mysql> select User,Host,password_last_changed from mysql.user where User like 'mysql_inn%' order by password_last_changed;
+----------------------------------+-----------+-----------------------+
| User                             | Host      | password_last_changed |
+----------------------------------+-----------+-----------------------+
| mysql_innodb_cluster_r1742162625 | %         | 2019-05-31 11:15:04   |
| mysql_innodb_cluster_r1742162625 | localhost | 2019-05-31 11:15:05   |
| mysql_innodb_cluster_r1742697308 | %         | 2019-05-31 12:44:11   |
| mysql_innodb_cluster_r1742697308 | localhost | 2019-05-31 12:44:11   |
| mysql_innodb_cluster_r1742710154 | %         | 2019-05-31 12:46:20   |
| mysql_innodb_cluster_r1742710154 | localhost | 2019-05-31 12:46:20   |
| mysql_innodb_cluster_r1742735096 | %         | 2019-05-31 12:50:29   |
| mysql_innodb_cluster_r1742735096 | localhost | 2019-05-31 12:50:29   |
| mysql_innodb_cluster_r1744353246 | %         | 2019-05-31 17:20:11   |
| mysql_innodb_cluster_r1744353246 | localhost | 2019-05-31 17:20:11   |
| mysql_innodb_cluster_r1744356490 | %         | 2019-05-31 17:20:43   |
| mysql_innodb_cluster_r1744356490 | localhost | 2019-05-31 17:20:43   |
+----------------------------------+-----------+-----------------------+

sl07

mysql> select User,Host,password_last_changed from mysql.user where User like 'mysql_inn%' order by password_last_changed;
+----------------------------------+-----------+-----------------------+
| User                             | Host      | password_last_changed |
+----------------------------------+-----------+-----------------------+
| mysql_innodb_cluster_r1744353246 | %         | 2019-05-31 17:20:11   |
| mysql_innodb_cluster_r1744353246 | localhost | 2019-05-31 17:20:11   |
| mysql_innodb_cluster_r1744356490 | %         | 2019-05-31 17:20:43   |
| mysql_innodb_cluster_r1744356490 | localhost | 2019-05-31 17:20:43   |
+----------------------------------+-----------+-----------------------+

as you can see in node sl07 and sl04 the user mysql_innodb_cluster_r1742162625 is not present.
To get the password of this user we relied on the data present in mysql.slave_master_info (with channel name "group_replication_recovery").

After adding the missing user in sl04 and sl07 and granting the replication privilege we could rejoin sl03 to the cluster successfully.

Why the user was missing from sl04 and sl07?
Why there are many users named "mysql_innodb_..." in sl03?
Who created these users?
Should I remove the users that are unique to sl03?
Please note that these users can connect from any host to mysql: isn't it a security concern? Can I limit the access to the memebers of the cluster?

Thanks in advance.

Stefano Rocca

How to repeat:
I cannot reproduce it because I don't know how it happened (who creates this users?)

Suggested fix:
While creating the cluster, ask for the creation of a user to use for group recovery.
[4 Jul 2019 8:43] MySQL Verification Team
Hello Stefano Rocca,

Thank you for the report.
I attempted to reproduce this issue but no luck. If you can provide more information, feel free to add it to this bug and change the status back to 'Open'.  

Coming to your questions, quoting from official manual "As part of using Group Replication, InnoDB cluster creates internal users which enable replication between the servers in the cluster. These users are internal to the cluster, and the user name of the generated users follows a naming scheme of mysql_innodb_cluster_r[10_numbers]. " Please refer for more details - https://dev.mysql.com/doc/refman/8.0/en/mysql-innodb-cluster-production-deployment.html
Thank you for your interest in MySQL.

Thanks,
Umesh
[4 Jun 2020 8:22] Erun e
Hi,

I am also getting the same issue. This Happend while rebooting the server.