Description:
After bugs like
Bug#28382590: ASSERT `!IS_SET()' AT SQL_ERROR.CC:379 UPON STOP G.R. IN ERRORED MEMBER
or
Bug#28550736 RACE CONDITION ON VIEW_CHANGE_NOTIFIER
Upon looking at the proposed patches it is obvious that every code solution that deals with a leave on error situation must usually be spread across many files and components.
Also, if you look at the first bug, part of it was due to the fact that channel stop was only coded in some of these components.
How to repeat:
Look at the code
Recovery_module::leave_group_on_recovery_failure(
Group_action_coordinator::kill_transactions_and_leave(
kill_transactions_and_leave_on_election_error(
Group_partition_handling::kill_transactions_and_leave(
Applier_module::leave_group_on_failure(
Applier_module::kill_pending_transactions(
Suggested fix:
All of this operations share the same code, some have some logging that can be externalized, but overall they can all be merged
void leave_on_error(){
// Action errors might have expelled the member already
if (Group_member_info::MEMBER_ERROR ==
local_member_info->get_recovery_status()) {
return;
}
update status
notify_and_reset_ctx(ctx);
if wait_for_view
view_change_notifier.start_view_modification();
leave
stop channels
handle leave states
call kill_pending_transactions
}
void kill_pending_transactions (bool set_read_mode, bool threaded_sql_session,
Gcs_operations::enum_leave_state leave_state,
Plugin_gcs_view_modification_notifier *view_notifier,
string message) {
lock
unblock_waiting_transactions
unlock
if set_read_mode
set read mode according to the session type
if view_notifier
wait
remove_view_notifer
// Check here about the use of this flag
if (set_read_mode &&
exit_state_action_var == EXIT_STATE_ACTION_ABORT_SERVER) {
abort_plugin_process(message);
}
}
# notes
- applier_module->add_suspension_packet();
This prob only needed on partition handling
- logging
Can be moved out
- read mode
Recovery doesn't need it. so it needs an option.
But that is a problem with the applier case where the set read mode is used as condition on abort
- We can serialize leave on error requests to avoid multiple components doing it in parallel.