Description:
After bugs like
Bug#28382590: ASSERT `!IS_SET()' AT SQL_ERROR.CC:379 UPON STOP G.R. IN ERRORED MEMBER
or
Bug#28550736 RACE CONDITION ON VIEW_CHANGE_NOTIFIER
Upon looking at the proposed patches it is obvious that every code solution that deals with a leave on error situation must usually be spread across many files and components.
Also, if you look at the first bug, part of it was due to the fact that channel stop was only coded in some of these components.
How to repeat:
Look at the code
Recovery_module::leave_group_on_recovery_failure(
Group_action_coordinator::kill_transactions_and_leave(
kill_transactions_and_leave_on_election_error(
Group_partition_handling::kill_transactions_and_leave(
Applier_module::leave_group_on_failure(
Applier_module::kill_pending_transactions(
Suggested fix:
All of this operations share the same code, some have some logging that can be externalized, but overall they can all be merged
void leave_on_error(){
// Action errors might have expelled the member already
if (Group_member_info::MEMBER_ERROR ==
local_member_info->get_recovery_status()) {
return;
}
update status
notify_and_reset_ctx(ctx);
if wait_for_view
view_change_notifier.start_view_modification();
leave
stop channels
handle leave states
call kill_pending_transactions
}
void kill_pending_transactions (bool set_read_mode, bool threaded_sql_session,
Gcs_operations::enum_leave_state leave_state,
Plugin_gcs_view_modification_notifier *view_notifier,
string message) {
lock
unblock_waiting_transactions
unlock
if set_read_mode
set read mode according to the session type
if view_notifier
wait
remove_view_notifer
// Check here about the use of this flag
if (set_read_mode &&
exit_state_action_var == EXIT_STATE_ACTION_ABORT_SERVER) {
abort_plugin_process(message);
}
}
# notes
- applier_module->add_suspension_packet();
This prob only needed on partition handling
- logging
Can be moved out
- read mode
Recovery doesn't need it. so it needs an option.
But that is a problem with the applier case where the set read mode is used as condition on abort
- We can serialize leave on error requests to avoid multiple components doing it in parallel.
Description: After bugs like Bug#28382590: ASSERT `!IS_SET()' AT SQL_ERROR.CC:379 UPON STOP G.R. IN ERRORED MEMBER or Bug#28550736 RACE CONDITION ON VIEW_CHANGE_NOTIFIER Upon looking at the proposed patches it is obvious that every code solution that deals with a leave on error situation must usually be spread across many files and components. Also, if you look at the first bug, part of it was due to the fact that channel stop was only coded in some of these components. How to repeat: Look at the code Recovery_module::leave_group_on_recovery_failure( Group_action_coordinator::kill_transactions_and_leave( kill_transactions_and_leave_on_election_error( Group_partition_handling::kill_transactions_and_leave( Applier_module::leave_group_on_failure( Applier_module::kill_pending_transactions( Suggested fix: All of this operations share the same code, some have some logging that can be externalized, but overall they can all be merged void leave_on_error(){ // Action errors might have expelled the member already if (Group_member_info::MEMBER_ERROR == local_member_info->get_recovery_status()) { return; } update status notify_and_reset_ctx(ctx); if wait_for_view view_change_notifier.start_view_modification(); leave stop channels handle leave states call kill_pending_transactions } void kill_pending_transactions (bool set_read_mode, bool threaded_sql_session, Gcs_operations::enum_leave_state leave_state, Plugin_gcs_view_modification_notifier *view_notifier, string message) { lock unblock_waiting_transactions unlock if set_read_mode set read mode according to the session type if view_notifier wait remove_view_notifer // Check here about the use of this flag if (set_read_mode && exit_state_action_var == EXIT_STATE_ACTION_ABORT_SERVER) { abort_plugin_process(message); } } # notes - applier_module->add_suspension_packet(); This prob only needed on partition handling - logging Can be moved out - read mode Recovery doesn't need it. so it needs an option. But that is a problem with the applier case where the set read mode is used as condition on abort - We can serialize leave on error requests to avoid multiple components doing it in parallel.