Bug #105776 Simplify or automate the GR recovery user management
Submitted: 2 Dec 2021 16:12 Modified: 6 Mar 2022 19:30
Reporter: Simon Mudd (OCA) Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version:8.0.26 OS:Any
Assigned to: CPU Architecture:Any
Tags: cleanup, gr, group replication, lots of accounts, recovery user, use single account

[2 Dec 2021 16:12] Simon Mudd
I notice when doing this:

root@myhost [mydb]> select user,host from mysql.user where user like 'mysql_innodb_cluster%';
| user                           | host |
| mysql_innodb_cluster_187006011 | %    |
| mysql_innodb_cluster_187006026 | %    |
| mysql_innodb_cluster_187013036 | %    |
| mysql_innodb_cluster_188007132 | %    |
| mysql_innodb_cluster_188048129 | %    |
| mysql_innodb_cluster_190199028 | %    |
| mysql_innodb_cluster_190214035 | %    |
| mysql_innodb_cluster_198020216 | %    |
| mysql_innodb_cluster_198173036 | %    |
| mysql_innodb_cluster_211037210 | %    |
| mysql_innodb_cluster_214023225 | %    |
| mysql_innodb_cluster_214028074 | %    |
| mysql_innodb_cluster_214081021 | %    |
| mysql_innodb_cluster_214100005 | %    |
| mysql_innodb_cluster_217053015 | %    |
| mysql_innodb_cluster_217059111 | %    |
| mysql_innodb_cluster_217067047 | %    |
| mysql_innodb_cluster_217189154 | %    |
| mysql_innodb_cluster_217201067 | %    |
| mysql_innodb_cluster_217201233 | %    |
| mysql_innodb_cluster_217208050 | %    |
| mysql_innodb_cluster_218016126 | %    |
| mysql_innodb_cluster_218034156 | %    |
| mysql_innodb_cluster_218035057 | %    |
| mysql_innodb_cluster_218035089 | %    |
| mysql_innodb_cluster_218037023 | %    |
| mysql_innodb_cluster_218042164 | %    |
| mysql_innodb_cluster_220092064 | %    |
28 rows in set (0.00 sec)

root@myhost [mydb]> select @@version;
| @@version |
| 8.0.26    |
1 row in set (0.00 sec)

See: https://dev.mysql.com/doc/mysql-shell/8.0/en/innodb-cluster-user-accounts.html

So as members go away and new members are added, this ends up with a large list of accounts, which is not convenient.  Older accounts are not cleaned up.

Also, why do we need individual accounts? This should be managed by the cluster.

This is a problem because auditors like to ensure that all accounts are accounted for and unneeded accounts are removed.  It creates more work for DBAs than should be necessary.

How to repeat:
- Create a GR cluster using the shell.
- Add new members, remove old members and do this a few times.
- Notice how the list of mysql_innodb_cluster_XXXX users grows, and many of these users are to all intents and purposes useless.
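The repeat steps above can be sketched with the MySQL Shell AdminAPI; the host names and cluster name here are placeholders:

```js
// Sketch only: run in mysqlsh (JS mode) against test instances.
var cluster = dba.createCluster("testCluster");   // on node1
cluster.addInstance("node2:3306");    // creates a mysql_innodb_cluster_<server_id> account
cluster.addInstance("node3:3306");
cluster.removeInstance("node3:3306"); // should also drop node3's recovery account
cluster.addInstance("node3:3306");    // a fresh account is created
// Repeat a few times, then re-run the mysql.user query above
// and look for accounts that no current member uses.
```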

Suggested fix:
- consider removing the account if a member is removed from the cluster.
- consider using a single username/password combination
- why does this need to be done manually? Could the group not share such credentials when a new member joins, thus avoiding the problem, and auto-generate the first password for that shared user?
- even if the credentials were only held in memory and the cluster were shut down, on restart the user/password could be auto-regenerated by the first starting member, and other members could pick this information up when rejoining the cluster.

Bottom line: while this may appear to be a non-issue, it should be simple to fix, and the user/password management should not require manual maintenance or cleanup.
[2 Dec 2021 16:29] MySQL Verification Team
Hello Simon,

Thank you for the reasonable feature request!

[2 Dec 2021 16:39] Simon Mudd
Related: if you disable the accounts (thinking they are not used), nothing obvious will happen until a group member has to recover; then it will fail. This could happen a long time later and would not be obvious until the recovery fails, so it can be problematic.
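For example, locking one of the accounts listed above succeeds silently, and nothing breaks until distributed recovery next needs it:

```sql
-- Appears harmless at the time it is run...
ALTER USER 'mysql_innodb_cluster_187006011'@'%' ACCOUNT LOCK;
-- ...but the member whose recovery channel uses this account will fail
-- to recover the next time it rejoins, possibly much later.
```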
[2 Dec 2021 18:25] Kenny Gryp
Changing this back to a bug; this is not a feature request.

The expected behavior is to have the mysql user removed when the member is removed from the cluster. 

We tried to repeat the problem and the users do get deleted in our tests.

Can you share the exact steps taken to reproduce this?
[2 Dec 2021 18:30] Kenny Gryp
In 8.0.17 we added functionality to track and clean up these internal users, read more on  https://dev.mysql.com/doc/relnotes/mysql-shell/8.0/en/news-8-0-17.html
[3 Dec 2021 8:15] Simon Mudd
OK, then I need to confirm the procedures used with colleagues. I just saw the symptoms; if the current code should be handling things properly, this may be an error at our end. I will check and come back to you.
[3 Dec 2021 12:40] Eduardo Ortega
From the release notes:

> In addition Cluster.removeInstance() no longer removes all recovery accounts on the
> instance being removed. It now removes the recovery account of the instance being
> removed on the primary and waits for the changes to be replicated before actually
> removing the instance from the group

I wonder whether the left-over accounts are from hosts that fail fully and therefore cannot be cleanly removed with `cluster.removeInstance()`? Or whether the accounts don't get deleted when removal is done as part of a `cluster.rescan()`?
[8 Dec 2021 17:33] Miguel Araujo
>>> I wonder whether the left-over accounts are from hosts that fail fully and therefore cannot be cleanly removed with `cluster.removeInstance()`?

Exactly, that's the only scenario in which the recovery account of the instance being removed is actually *not* dropped. That happens when removing instances that are either unreachable or offline (GR stopped), where only by using the 'force' option can the instance be removed from the Cluster. In practice, only the Metadata is updated.
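That forced removal looks like this (the host name is a placeholder):

```js
// The instance is unreachable or offline, so a clean removal is impossible;
// 'force' removes it from the Metadata only. Before the fix, the instance's
// recovery account was left behind in mysql.user.
cluster.removeInstance("node3:3306", {force: true});
```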

The reason behind that is that we couldn't know which account had to be dropped. However, since 8.0.17 the internal recovery account info is stored in the Metadata, and therefore the account can be removed in such cases too.
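Since the recovery accounts are recorded in the Metadata schema from 8.0.17 on, leftover accounts can be spotted by cross-checking mysql.user against it. A sketch; the exact JSON attribute name ('recoveryAccountUser') is an assumption and may differ between Shell versions:

```sql
-- Accounts that no instance in the Metadata claims as its recovery account.
SELECT u.user, u.host
  FROM mysql.user AS u
  LEFT JOIN mysql_innodb_cluster_metadata.instances AS i
    ON u.user = i.attributes->>'$.recoveryAccountUser'
 WHERE u.user LIKE 'mysql_innodb_cluster%'
   AND i.instance_id IS NULL;
```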

Confirming this as a bug.
[9 Dec 2021 8:23] Simon Mudd
Thanks for the update and clarifications.
[6 Mar 2022 19:30] Margaret Fisher
Posted by developer:
Changelog entry added for MySQL Shell 8.0.29:

If an instance was added to an InnoDB Cluster using MySQL Clone, and Ctrl+C or the waitRecovery: 0 option was used to let recovery take place in the background, the recovery account from the donor instance was used instead of the recovery account created for the new instance. If the new instance was later removed with a Cluster.removeInstance command, the recovery account created for it was not dropped, since AdminAPI was only checking for the recovery account that was actually in use on the instance (the donor’s account). The account removal logic now considers the account created for the instance and cleans it up even if it was not used.