Bug #93027 Centralize the leave on error code for all components
Submitted: 31 Oct 2018 12:43
Modified: 19 Jun 2019 13:02
Reporter: Pedro Gomes
Status: Closed
Category: MySQL Server: Group Replication
Severity: S3 (Non-critical)
Version: 8.0.14
OS: Any
Assigned to:
CPU Architecture: Any

[31 Oct 2018 12:43] Pedro Gomes
Description:
After bugs like

Bug#28382590: ASSERT `!IS_SET()' AT SQL_ERROR.CC:379 UPON STOP G.R. IN ERRORED MEMBER
or
Bug#28550736 RACE CONDITION ON VIEW_CHANGE_NOTIFIER

and looking at the proposed patches, it is obvious that any fix for a leave-on-error situation usually has to be spread across many files and components.

Also, part of the first bug was caused by the fact that the channel stop was only coded in some of these components.

How to repeat:
Look at the code in:

Recovery_module::leave_group_on_recovery_failure(
Group_action_coordinator::kill_transactions_and_leave(
kill_transactions_and_leave_on_election_error(
Group_partition_handling::kill_transactions_and_leave(
Applier_module::leave_group_on_failure(
Applier_module::kill_pending_transactions(

Suggested fix:
All of these operations share the same code. Some have logging that can be externalized, but overall they can all be merged:

void leave_on_error(){
  // Action errors might have expelled the member already
  if (Group_member_info::MEMBER_ERROR ==
      local_member_info->get_recovery_status()) {
    return;
  }

  update status

  notify_and_reset_ctx(ctx);
  if wait_for_view
    view_change_notifier.start_view_modification();
  leave
  stop channels
  handle leave states
  call kill_pending_transactions
}

void kill_pending_transactions (bool set_read_mode, bool threaded_sql_session,
    Gcs_operations::enum_leave_state leave_state,
    Plugin_gcs_view_modification_notifier *view_notifier,
    string message) {

  lock
  unblock_waiting_transactions
  unlock

  if set_read_mode
    set read mode according to the session type

  if view_notifier
    wait
    remove_view_notifier

  // Check here about the use of this flag
  if (set_read_mode &&
      exit_state_action_var == EXIT_STATE_ACTION_ABORT_SERVER) {
    abort_plugin_process(message);
  }
}
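
To make the shape of the merged routine concrete, here is a minimal, self-contained sketch. It is not plugin code: `Member`, `Leave_options`, and the recorded action strings are hypothetical stand-ins, and splitting the abort decision from `set_read_mode` is only one assumed way to handle the flag questioned in the comment above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not the plugin's real types.
enum class Member_status { MEMBER_ONLINE, MEMBER_ERROR };

struct Leave_options {
  bool set_read_mode = true;         // recovery would pass false
  bool abort_on_exit_action = false; // assumed: kept separate from set_read_mode
};

struct Member {
  Member_status status = Member_status::MEMBER_ONLINE;
  std::vector<std::string> actions;  // steps taken, in order, for inspection
};

// Single entry point that every component would call on error.
// Returns false when the member is already in error state and nothing is done.
bool leave_on_error(Member &member, const Leave_options &opts,
                    const std::string &message) {
  assert(!message.empty());  // assumed precondition: callers give a reason

  // Action errors might have expelled the member already.
  if (member.status == Member_status::MEMBER_ERROR) return false;

  member.status = Member_status::MEMBER_ERROR;
  member.actions.push_back("leave");
  member.actions.push_back("stop_channels");
  member.actions.push_back("unblock_waiting_transactions");
  if (opts.set_read_mode) member.actions.push_back("set_read_mode");
  if (opts.abort_on_exit_action) member.actions.push_back("abort: " + message);
  return true;
}
```

With this shape, every caller runs the same steps in the same order, and the per-component differences are reduced to the options it passes.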

# notes 

- applier_module->add_suspension_packet();
  This is probably only needed for partition handling

- logging
  Can be moved out

- read mode
  Recovery doesn't need it, so it needs to be an option.
  But that is a problem in the applier case, where set_read_mode is used as a condition for the abort

- We can serialize leave on error requests to avoid multiple components doing it in parallel.
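
A minimal sketch of what serializing the requests could look like, assuming a plain mutex plus an "already left" flag. `Leave_serializer` and `simulate_parallel_leaves` are hypothetical names used only for illustration; the real plugin may need a different mechanism.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical guard; not part of the Group Replication plugin.
class Leave_serializer {
 public:
  // Runs leave_action at most once. Concurrent callers block until the
  // first one finishes. Returns true only for the caller that ran it.
  template <typename F>
  bool run_once(F &&leave_action) {
    std::lock_guard<std::mutex> guard(lock_);
    if (left_) return false;
    leave_action();
    left_ = true;
    return true;
  }

 private:
  std::mutex lock_;
  bool left_ = false;
};

// Simulates n components hitting an error at the same time and returns
// how many times the leave routine actually ran.
int simulate_parallel_leaves(int n) {
  assert(n >= 0);
  Leave_serializer serializer;
  int leave_runs = 0;  // only modified while the serializer's mutex is held
  std::vector<std::thread> components;
  for (int i = 0; i < n; ++i) {
    components.emplace_back(
        [&] { serializer.run_once([&] { ++leave_runs; }); });
  }
  for (auto &t : components) t.join();
  return leave_runs;
}
```

Later callers return without repeating the leave, which matches the early return on MEMBER_ERROR in the suggested leave_on_error().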

[19 Jun 2019 13:02] Margaret Fisher
Posted by developer:
 
Changelog entry added for MySQL 8.0.18:

The implementation of Group Replication's process when a member leaves the group due to an error has been standardised across components, to ensure that the member carries out exactly the same actions and issues the same error messages regardless of the original reason for leaving the group.