Description:
Hi,
we have setup a 3 node (named sl03, sl04 and sl07) InnoDB Cluster one month ago and everything has worked as expected until yestarday evening: after a brief network outage one of the member, sl03 (the primary member), has not been able to reconnect to the cluster. During this outage all the members were seen as unreachable from each other.
In the error log of sl03 we noticed this:
2019-06-27T17:00:09.092624Z 1185164 [ERROR] [MY-010584] [Repl] Slave I/O for channel 'group_replication_recovery': error connecting to master 'mysql_innodb_cluster_r1742162625@sl04:3306' - retry-time: 60 retries: 1, Error_code: MY-001045
so we checked the mysql.user table of each node:
sl04
mysql> select User,Host,password_last_changed from mysql.user where User like 'mysql_inn%' order by password_last_changed;
+----------------------------------+-----------+-----------------------+
| User | Host | password_last_changed |
+----------------------------------+-----------+-----------------------+
| mysql_innodb_cluster_r1744353246 | % | 2019-05-31 17:20:11 |
| mysql_innodb_cluster_r1744353246 | localhost | 2019-05-31 17:20:11 |
| mysql_innodb_cluster_r1744356490 | % | 2019-05-31 17:20:43 |
| mysql_innodb_cluster_r1744356490 | localhost | 2019-05-31 17:20:43 |
+----------------------------------+-----------+-----------------------+
sl03
mysql> select User,Host,password_last_changed from mysql.user where User like 'mysql_inn%' order by password_last_changed;
+----------------------------------+-----------+-----------------------+
| User | Host | password_last_changed |
+----------------------------------+-----------+-----------------------+
| mysql_innodb_cluster_r1742162625 | % | 2019-05-31 11:15:04 |
| mysql_innodb_cluster_r1742162625 | localhost | 2019-05-31 11:15:05 |
| mysql_innodb_cluster_r1742697308 | % | 2019-05-31 12:44:11 |
| mysql_innodb_cluster_r1742697308 | localhost | 2019-05-31 12:44:11 |
| mysql_innodb_cluster_r1742710154 | % | 2019-05-31 12:46:20 |
| mysql_innodb_cluster_r1742710154 | localhost | 2019-05-31 12:46:20 |
| mysql_innodb_cluster_r1742735096 | % | 2019-05-31 12:50:29 |
| mysql_innodb_cluster_r1742735096 | localhost | 2019-05-31 12:50:29 |
| mysql_innodb_cluster_r1744353246 | % | 2019-05-31 17:20:11 |
| mysql_innodb_cluster_r1744353246 | localhost | 2019-05-31 17:20:11 |
| mysql_innodb_cluster_r1744356490 | % | 2019-05-31 17:20:43 |
| mysql_innodb_cluster_r1744356490 | localhost | 2019-05-31 17:20:43 |
+----------------------------------+-----------+-----------------------+
sl07
mysql> select User,Host,password_last_changed from mysql.user where User like 'mysql_inn%' order by password_last_changed;
+----------------------------------+-----------+-----------------------+
| User | Host | password_last_changed |
+----------------------------------+-----------+-----------------------+
| mysql_innodb_cluster_r1744353246 | % | 2019-05-31 17:20:11 |
| mysql_innodb_cluster_r1744353246 | localhost | 2019-05-31 17:20:11 |
| mysql_innodb_cluster_r1744356490 | % | 2019-05-31 17:20:43 |
| mysql_innodb_cluster_r1744356490 | localhost | 2019-05-31 17:20:43 |
+----------------------------------+-----------+-----------------------+
as you can see in node sl07 and sl04 the user mysql_innodb_cluster_r1742162625 is not present.
To get the password of this user we relied on the data present in mysql.slave_master_info (with channel name "group_replication_recovery").
After adding the missing user in sl04 and sl07 and granting the replication privilege we could rejoin sl03 to the cluster successfully.
Why the user was missing from sl04 and sl07?
Why there are many users named "mysql_innodb_..." in sl03?
Who created these users?
Should I remove the users that are unique to sl03?
Please note that these users can connect from any host to mysql: isn't it a security concern? Can I limit the access to the memebers of the cluster?
Thanks in advance.
Stefano Rocca
How to repeat:
I cannot reproduce it because I don't know how it happened (who creates this users?)
Suggested fix:
While creating the cluster, ask for the creation of a user to use for group recovery.