Bug #92875 Crash after member tries to recover
Submitted: 21 Oct 2018 19:37
Modified: 24 Nov 2018 14:27
Reporter: Geert Vanderkelen
Status: No Feedback
Category: MySQL Server: Group Replication
Severity: S2 (Serious)
Version: 8.0.11
OS: Debian (9)
Assigned to:
CPU Architecture: Any

[21 Oct 2018 19:37] Geert Vanderkelen
Description:
Hi guys,

(This bug was reported as #92450, but it was closed by the reporter. Since it was verified, and can be easily reproduced, it is important that this bug report is kept open. Hence this report.)

The following crash was observed after a verified networking issue in the OpenStack setup (a malfunctioning route). The MySQL member apparently saw this, handled it, and recovered, but then crashed (pretty much in the same second). Maybe a race condition?

We are using:
* MySQL 8.0.12
* Official binaries for Debian 9 (installed from MySQL APT repositories)

As far as I can tell, this is now the 4th time this has happened.

Your pal,
Geert

```
2018-09-30T21:28:44.166190Z 0 [Warning] [MY-011494] [Repl] Plugin group_replication reported: 'Member with address ..2-fb5uwcb46fcq:3306 is reachable again.'
2018-09-30T21:28:44.166289Z 0 [Warning] [MY-011494] [Repl] Plugin group_replication reported: 'Member with address ..1-egtbu7j4kxnp:3306 is reachable again.'
2018-09-30T21:28:44.166308Z 0 [Warning] [MY-011498] [Repl] Plugin group_replication reported: 'The member has resumed contact with a majority of the members in the group. Regular operation is restored and transactions are unblocked.'
2018-09-30T21:28:44.206274Z 0 [ERROR] [MY-011505] [Repl] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing member status to ERROR.'
2018-09-30T21:28:44.208220Z 0 [ERROR] [MY-013173] [Repl] Plugin group_replication reported: 'The plugin encountered a critical error and will abort: Fatal error during execution of Group Replication'

```
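The two [ERROR] entries above carry the codes MY-011505 (member expelled) and MY-013173 (plugin abort). A minimal sketch for pulling that pair out of an error log follows; the function name `gr_exit_events` is hypothetical, and the log path is whatever your installation uses.

```shell
# gr_exit_events LOGFILE
# Print the error-log lines that record the expel (MY-011505) and the
# plugin abort that follows it (MY-013173).
# The function name is made up for this example.
gr_exit_events() {
    grep -E '\[MY-(011505|013173)\]' "$1"
}

# Usage (the path is an assumption, adjust to your setup):
#   gr_exit_events /var/log/mysql/error.log
```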

```
stack_bottom = 0 thread_stack 0x46000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char*, unsigned long)+0x2e) [0x5595c77c526e]
/usr/sbin/mysqld(handle_fatal_signal+0x4c1) [0x5595c6aa0e21]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0) [0x7f9e8a74e0c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcf) [0x7f9e889e3fff]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f9e889e542a]
/usr/lib/mysql/plugin/group_replication.so(abort_plugin_process(char const*)+0x18e) [0x7f9e556f988e]
/usr/lib/mysql/plugin/group_replication.so(Applier_module::kill_pending_transactions(bool, bool)+0x5d4) [0x7f9e556bcd64]
/usr/lib/mysql/plugin/group_replication.so(Plugin_gcs_events_handler::was_member_expelled_from_group(Gcs_view const&) const+0x357) [0x7f9e556cf7a7]
/usr/lib/mysql/plugin/group_replication.so(Plugin_gcs_events_handler::on_view_changed(Gcs_view const&, std::vector<std::pair<Gcs_member_identifier*, Gcs_message_data*>, std::allocator<std::pair<Gcs_member_identifier*, Gcs_message_data*> > > const&) const+0x9f) [0x7f9e556d760f]
/usr/lib/mysql/plugin/group_replication.so(Gcs_xcom_control::install_view(Gcs_xcom_view_identifier*, Gcs_group_identifier const&, std::map<Gcs_member_identifier, Xcom_member_state*, std::less<Gcs_member_identifier>, std::allocator<std::pair<Gcs_member_identifier const, Xcom_member_state*> > >*, std::set<Gcs_member_identifier*, std::less<Gcs_member_identifier*>, std::allocator<Gcs_member_identifier*> >*, std::set<Gcs_member_identifier*, std::less<Gcs_member_identifier*>, std::allocator<Gcs_member_identifier
/usr/lib/mysql/plugin/group_replication.so(Gcs_xcom_control::install_leave_view(Gcs_view::Gcs_view_error_code)+0x308) [0x7f9e5575fb38]
/usr/lib/mysql/plugin/group_replication.so(Gcs_xcom_control::xcom_receive_global_view(synode_no, Gcs_xcom_nodes*, bool)+0x60c) [0x7f9e55763d8c]
/usr/lib/mysql/plugin/group_replication.so(do_cb_xcom_receive_global_view(synode_no, synode_no, Gcs_xcom_nodes*)+0xc2) [0x7f9e55726142]
/usr/lib/mysql/plugin/group_replication.so(Global_view_notification::do_execute()+0x20) [0x7f9e5572fac0]
/usr/lib/mysql/plugin/group_replication.so(Parameterized_notification<false>::operator()()+0xa) [0x7f9e5572fb4a]
/usr/lib/mysql/plugin/group_replication.so(Gcs_xcom_engine::process()+0x97) [0x7f9e5572ff07]
/usr/lib/mysql/plugin/group_replication.so(process_notification_thread(void*)+0x9) [0x7f9e55730169]
/usr/sbin/mysqld(+0x1ee820f) [0x5595c7c5920f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7f9e8a744494]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f9e88a99acf]

```

Maybe relevant, but a conservatively boring config:

```
[mysqld]
transaction_isolation = READ-COMMITTED

server_id = 1427
log_bin = binlog
binlog_expire_logs_seconds = 864000  # 10 days
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_checksum = NONE
binlog_rows_query_log_events = ON
log_bin_trust_function_creators = ON

# Replication
relay_log = relaylog
relay_log_recovery = ON
log_slave_updates = ON
master_info_repository = TABLE
relay_log_info_repository = TABLE
slave_parallel_type = LOGICAL_CLOCK
slave_parallel_workers = 4
slave_preserve_commit_order = 1
slave_type_conversions = ALL_NON_LOSSY
disabled_storage_engines = "MyISAM,FEDERATED,ARCHIVE,BLACKHOLE"

# Group Replication
plugin_load=group_replication.so
group_replication_start_on_boot = OFF
group_replication_bootstrap_group = OFF
group_replication_ip_whitelist = "localhost,***"
group_replication_group_name = 2b2fa659-22c7-4cfd-910b-c5ee96fc29e9
group_replication_local_address = ****
group_replication_group_seeds = "**3members*"
```

How to repeat:
I can easily reproduce this with VirtualBox machines:

1) Run 3 mysqld instances in VirtualBox, and configure MySQL Group Replication so it is running normally.
2) Identify the primary; for example, host 'mysqld1' is the primary and 'mysqld2' a secondary.
3) "Unplug" the network for both the primary and a secondary:

$ VBoxManage controlvm mysqld1 setlinkstate2 off ; VBoxManage controlvm mysqld2 setlinkstate2 off

This pretty much consistently crashes MySQL with the above trace.
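
Step 3 can be wrapped in a small helper so the partition is easy to repeat. This is only a sketch: the VM names come from the steps above, while `set_link` and the `VBOXMANAGE` override are hypothetical names introduced here. Switching the links back on afterwards is an assumption about how one would continue testing, not part of the original steps.

```shell
# Command used to talk to VirtualBox; overridable for dry runs,
# e.g. VBOXMANAGE=echo (hypothetical convenience, not a VirtualBox feature).
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"

# set_link STATE VM...
# Apply "controlvm VM setlinkstate2 STATE" to each VM, stopping at the
# first failure. STATE is "on" or "off".
set_link() {
    state="$1"
    shift
    for vm in "$@"; do
        $VBOXMANAGE controlvm "$vm" setlinkstate2 "$state" || return 1
    done
}

# Usage:
#   set_link off mysqld1 mysqld2   # cut the primary and one secondary
#   set_link on  mysqld1 mysqld2   # restore the links (assumed follow-up)
```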
[22 Oct 2018 6:07] Nuno Carvalho
Hi Geert,

Can you please double check the server version?
The stack you present is from 8.0.12 and not 8.0.11.

The server is aborting as expected: the default of the option group_replication_exit_state_action is ABORT_SERVER. In this case the member is being expelled, so the server is aborted to avoid stale reads.

The error message is explicit:
"Plugin group_replication reported: 'The plugin encountered a critical error and will abort: Fatal error during execution of Group Replication'"

https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replic...
https://dev.mysql.com/worklog/task/?id=11568
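
For anyone who prefers the member to go read-only rather than abort, a rough sketch of inspecting and changing the option from the shell follows. The helper names and the `MYSQL` override are hypothetical, the client is assumed to pick up credentials from its option files, and only the two values READ_ONLY and ABORT_SERVER are assumed to be accepted on this version.

```shell
# Client command; overridable for dry runs, e.g. MYSQL=echo
# (hypothetical convenience for this example).
MYSQL="${MYSQL:-mysql}"

# Print the current value of the option.
show_exit_action() {
    $MYSQL --skip-column-names \
        --execute="SELECT @@GLOBAL.group_replication_exit_state_action"
}

# set_exit_action READ_ONLY|ABORT_SERVER
# Change the option at runtime and persist it across restarts.
set_exit_action() {
    case "$1" in
        READ_ONLY|ABORT_SERVER) ;;
        *) echo "unsupported action: $1" >&2; return 2 ;;
    esac
    $MYSQL --execute="SET PERSIST group_replication_exit_state_action = '$1'"
}

# Usage:
#   show_exit_action
#   set_exit_action READ_ONLY
```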

Best regards,
Nuno Carvalho
[22 Oct 2018 6:14] Geert Vanderkelen
Hi Nuno, we have seen it with 8.0.11 and 8.0.12. So it is not new.

The error message is fine. The stack trace is not. In production this raises eyebrows :)

Cheers,
Geert
[22 Oct 2018 6:16] Geert Vanderkelen
But group_replication_exit_state_action is interesting. A default that aborts the server seems excessive: just an error would be good enough.
[23 Oct 2018 16:47] Bogdan Kecman
Hi Geert,

This one is verified too.

all best
bogdan
[24 Oct 2018 14:27] Nuno Carvalho
Hi Geert,

Like I said before, the server is aborting as expected on 8.0.12: the default of the option group_replication_exit_state_action is ABORT_SERVER. In this case the member is being expelled, so the server is aborted to avoid stale reads.

Whatever you saw on *8.0.11* is a different thing, and that should be a bug.
Can you please update the bug report with the 8.0.11 stack trace?

Best regards,
Nuno Carvalho
[25 Nov 2018 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".