Bug #92829 Verify coordinated action safety against plugin starts and stops
Submitted: 17 Oct 2018 14:32 Modified: 12 Dec 2018 9:13
Reporter: Pedro Gomes
Status: Closed
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version:8.0.14 OS:Any
Assigned to: CPU Architecture:Any

[17 Oct 2018 14:32] Pedro Gomes
Description:
When most group-coordinated actions execute, such as
 group_replication_switch_to_multi_primary_mode
or
 group_replication_set_as_primary,
they check whether the member is running and part of a majority.

However, the code assumes the plugin keeps running for the rest of the checks.
If the plugin stops and deletes some of the structures being checked, the server might crash.

The same applies to the coordinate_action_execution method of Group_action_coordinator.
Many checks are made before the action is declared as running, yet declaring it running is what the stop process waits on before continuing, so a stop can interleave with those earlier checks.
 

How to repeat:
Look at 

  int Group_action_coordinator::coordinate_action_execution(

Look at 

  static bool group_replication_switch_to_multi_primary_mode_init(

Note how we call
  group_contains_recovering_member
after
  member_online_with_majority

The plugin might be stopped between these two checks.
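This is a check-then-act race: nothing prevents a stop between the two checks. One common way to close such a window is to run all of an action's checks under a shared (reader) lock that the stop path must take exclusively. The C++ sketch below illustrates that pattern under stated assumptions: Plugin_lifecycle_stub, Group_state_stub and Check_result are hypothetical names, not the Group Replication source, and the two boolean fields only stand in for member_online_with_majority and group_contains_recovering_member.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the structures that a plugin stop deletes.
struct Group_state_stub {
  bool online_with_majority = true;
  bool has_recovering_member = false;
};

enum class Check_result { ok, plugin_stopped, no_majority, recovering_member };

// Hypothetical guard: all checks of an action run under one shared lock, and
// stop() needs the exclusive lock, so it cannot run between two checks.
class Plugin_lifecycle_stub {
 public:
  Plugin_lifecycle_stub() : state_(std::make_unique<Group_state_stub>()) {}

  Check_result check_action_preconditions() const {
    std::shared_lock<std::shared_mutex> guard(lock_);
    if (!state_) return Check_result::plugin_stopped;
    if (!state_->online_with_majority) return Check_result::no_majority;
    // state_ is still valid here: stop() blocks while we hold the lock.
    if (state_->has_recovering_member) return Check_result::recovering_member;
    return Check_result::ok;
  }

  void set_recovering_member(bool value) {
    std::unique_lock<std::shared_mutex> guard(lock_);
    if (state_) state_->has_recovering_member = value;
  }

  // Deletes the state only after every in-flight check has released its lock.
  void stop() {
    std::unique_lock<std::shared_mutex> guard(lock_);
    state_.reset();
  }

 private:
  mutable std::shared_mutex lock_;
  std::unique_ptr<Group_state_stub> state_;
};
```

The design choice is that a stop may be delayed by in-progress checks, but it can never free the state while a check still dereferences it.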

Suggested fix:
All proposed actions should be accounted for and canceled on GR stop.
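A minimal sketch of the suggested fix, assuming each action can poll a cancellation flag at safe points: a registry records every proposed action, refuses new proposals once a stop has begun, and signals every recorded action on stop. Proposed_action_registry and its methods are hypothetical names for illustration, not the MySQL implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical registry: every proposed action is recorded ("accounted for")
// and receives a shared cancellation flag that it polls at safe points.
class Proposed_action_registry {
 public:
  using Cancel_flag = std::shared_ptr<std::atomic<bool>>;

  // Returns a flag for the new action, or nullptr if GR is stopping.
  Cancel_flag propose(int action_id) {
    std::lock_guard<std::mutex> guard(mutex_);
    if (stopping_) return nullptr;
    auto flag = std::make_shared<std::atomic<bool>>(false);
    actions_[action_id] = flag;
    return flag;
  }

  // Called by the action when it finishes, successfully or not.
  void finish(int action_id) {
    std::lock_guard<std::mutex> guard(mutex_);
    actions_.erase(action_id);
  }

  // Called on GR stop: refuse new proposals and cancel every recorded one.
  // Returns how many actions were signalled.
  std::size_t cancel_all_on_stop() {
    std::lock_guard<std::mutex> guard(mutex_);
    stopping_ = true;
    for (auto& entry : actions_) entry.second->store(true);
    return actions_.size();
  }

 private:
  std::mutex mutex_;
  bool stopping_ = false;
  std::map<int, Cancel_flag> actions_;
};
```

Only actions still registered at stop time are signalled; an action that already called finish() keeps its flag unset.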
[12 Dec 2018 9:13] David Moss
Posted by developer:
 
Thank you for your feedback. This has been fixed in upcoming versions, and the following was added to the 8.0.14 changelog:
When a group was being reconfigured online, for example using group_replication_switch_to_multi_primary_mode or group_replication_set_as_primary, there was a chance that stopping a member could result in an unexpected stop. Now, when you issue STOP GROUP_REPLICATION, if the member is part of an online group that is being reconfigured, the group coordinator is informed that Group Replication is stopping and the member waits for the online configuration to finish.
[22 Jan 2019 13:52] Margaret Fisher
Posted by developer:
 
Publishing MySQL bug number. The changelog entry now reads as requested:
"The member waits for the online configuration process to complete any ongoing actions, but any subsequent actions are cancelled."
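The behaviour in the final changelog wording (ongoing actions complete, subsequent ones are cancelled) could be sketched with a mutex and a condition variable, as below. Action_coordinator_stub and its methods are hypothetical names, and this is not the actual Group_action_coordinator code; it only illustrates the waiting and cancelling logic.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical coordinator: counts running actions, refuses new ones once a
// stop is requested, and makes stop() wait for the running ones to finish.
class Action_coordinator_stub {
 public:
  // Returns false, meaning the action is cancelled, if GR is stopping.
  bool try_start_action() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (stopping_) return false;
    ++running_actions_;
    return true;
  }

  void end_action() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      --running_actions_;
    }
    cv_.notify_all();
  }

  bool is_stopping() {
    std::lock_guard<std::mutex> guard(mutex_);
    return stopping_;
  }

  // Informs the coordinator that GR is stopping, then blocks until every
  // ongoing action has completed.
  void stop() {
    std::unique_lock<std::mutex> guard(mutex_);
    stopping_ = true;
    cv_.wait(guard, [this] { return running_actions_ == 0; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stopping_ = false;
  int running_actions_ = 0;
};
```

For example, if one action is running when stop() is called from another thread, a later try_start_action() returns false, and stop() stays blocked until end_action() is called for the running action.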