Description:
This is about unloading an audit plugin without shutting the server down.
Currently, if you want to uninstall an audit plugin, you MUST kill all (most) connections shown by SHOW PROCESSLIST (including your own) so that the plugin's refcount in the server goes to zero. If you don't, the plugin is shown in state DELETED but has not actually been unloaded.
So it seems easy: kill all other connections, disconnect the existing connection so the refcount drops to 0, and reconnect. That should be enough.
This works: the plugin is gone, and you can load it again if you wish, or load an updated version of the audit plugin you were using before.
However, this does not work with GR. If you kill the GR connections, GR fails, and neither START GROUP_REPLICATION on its own nor STOP GROUP_REPLICATION followed by START GROUP_REPLICATION works. The only way I have found to get things working again is to restart `mysqld` completely. That works but takes longer, and the InnoDB buffer pool is cold and needs time to warm up, so the instance is unavailable for longer: e.g. from < 1 second to maybe 30 seconds for flushing the buffer pool, restarting, and warming the cache.
How to repeat:
- Start mysqld.
- load an audit plugin
- configure GR and ensure it's running the same on all servers
- try to unload the audit plugin; it goes to status DELETED, so the unload is not complete
- kill all threads seen in the processlist (remembering not to kill your own thread), then disconnect and reconnect
- you should see that the audit plugin is now unloaded
- GR is now also broken, with status ERROR
- try START GROUP_REPLICATION -> this fails
- try STOP GROUP_REPLICATION; START GROUP_REPLICATION -> this fails
- restart mysqld with RESTART and you'll see it comes back with the plugin unloaded
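The kill step and the status checks above can be sketched roughly as follows. This is only an illustration, not a tested reproduction script: `ProcessRow` and `build_kill_statements` are hypothetical helper names, the sample rows are made up, and the code only builds statement strings rather than connecting to a server. The two queries use the standard `INFORMATION_SCHEMA.PLUGINS` and `performance_schema.replication_group_members` tables.

```python
from typing import Iterable, List, NamedTuple


class ProcessRow(NamedTuple):
    """Hypothetical subset of the columns returned by SHOW PROCESSLIST."""
    id: int
    user: str
    command: str


def build_kill_statements(rows: Iterable[ProcessRow], own_id: int) -> List[str]:
    """Return one KILL statement per listed thread, skipping our own connection.

    Our own connection is dropped afterwards by disconnecting, as in the steps above.
    """
    return [f"KILL {row.id}" for row in rows if row.id != own_id]


# Queries to check the plugin and the GR member state before and after the kill.
# The plugin name is left as a parameter because the report does not name it.
PLUGIN_STATUS_SQL = (
    "SELECT PLUGIN_NAME, PLUGIN_STATUS "
    "FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME = %s"
)
GR_STATE_SQL = (
    "SELECT MEMBER_HOST, MEMBER_STATE "
    "FROM performance_schema.replication_group_members"
)

if __name__ == "__main__":
    # Made-up sample rows; on a real server these come from SHOW PROCESSLIST.
    sample = [
        ProcessRow(5, "system user", "Connect"),
        ProcessRow(12, "root", "Query"),  # assumed to be our own connection
        ProcessRow(13, "app", "Sleep"),
    ]
    for statement in build_kill_statements(sample, own_id=12):
        print(statement)
```

Running the generated KILL statements, disconnecting, and then running the two queries from a new connection should show the plugin row gone and, per this report, the local member in state ERROR.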
Suggested fix:
- Make GR more resilient to losing all threads: provide a way to restart GR by restarting the threads that died. I hear that some state may be missing to make this possible, if so fix that.
- Make START GROUP_REPLICATION handle a bad shutdown (killed threads, etc.). Restarting `mysqld` makes things work, so I would expect START GROUP_REPLICATION to behave the same way. I suspect that the server startup code has extra crash-recovery logic that is missing from the START GROUP_REPLICATION command.
This would improve GR: restarting MySQL to recover a group member seems an unnecessary step, yet it appears to be the recommendation in some failure scenarios. The code may already be smart enough in general, but this case is not covered.
Also it should be possible to unload an audit plugin. GR should be no different.
You could add smarter code so that a GR connection using the MySQL protocol (rather than native GCS) skips the audit plugin integration, as auditing is supposed to be for external connections, or at least that's what I expect.