Bug #107635 | event scheduler cause error on group replication | | |
---|---|---|---|
Submitted: | 22 Jun 2022 14:47 | Modified: | 15 Jul 2022 19:47 |
Reporter: | lou shuai (OCA) | Status: | Closed |
Category: | MySQL Server: Group Replication | Severity: | S1 (Critical) |
Version: | 8.0.* | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
Tags: | Contribution | | |
[22 Jun 2022 14:47]
lou shuai
[22 Jun 2022 14:53]
lou shuai
Patch to fix this bug. (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: 0001-Bug-107635-MGR-Assertion-failure-in-event_scheduler_.patch (application/octet-stream, text), 3.86 KiB.
[23 Jun 2022 6:33]
lou shuai
Analysis: trans_commit_stmt returns an error in the recalculate_activation_times function. The error comes from group_replication_trans_before_commit, and the error code is passed back from the MGR applier thread:

```
#0  0x000056386186e909 in set_transaction_ctx (transaction_termination_ctx=...) at rpl_transaction_ctx.cc:107
#1  0x00007f8e5220afad in Certification_handler::handle_transaction_id (this=..., pevent=..., cont=...) at certification_handler.cc:308
#2  0x00007f8e5220a349 in Certification_handler::handle_event (this=..., pevent=..., cont=...) at certification_handler.cc:127
#3  0x00007f8e5220982a in Event_handler::next (this=..., event=..., continuation=...) at pipeline_interfaces.h:716
#4  0x00007f8e5220f20e in Event_cataloger::handle_event (this=..., pevent=..., cont=...) at event_cataloger.cc:53
#5  0x00007f8e521b7621 in Applier_module::inject_event_into_pipeline (this=..., pevent=..., cont=...) at applier.cc:258
#6  0x00007f8e521b801b in Applier_module::apply_data_packet (this=..., data_packet=..., fde_evt=..., cont=...) at applier.cc:388
#7  0x00007f8e521b8fba in Applier_module::applier_thread_handle (this=...) at applier.cc:613
#8  0x00007f8e521b692e in launch_handler_thread (arg=...) at applier.cc:50
#9  0x00005638634d39df in pfs_spawn_thread (arg=...) at pfs.cc:2899
#10 0x00007f8e8d37efa3 in start_thread (arg=...) at pthread_create.c:486
```

When set_transaction_ctx looks up the THD, it skips the daemon event scheduler THD. The THD is therefore not found, and ER_NO_SUCH_THREAD is returned.
```
int set_transaction_ctx(
    Transaction_termination_ctx transaction_termination_ctx) {
  DBUG_TRACE;
  DBUG_PRINT("enter",
             ("thread_id=%lu, rollback_transaction=%d, "
              "generated_gtid=%d, sidno=%d, gno=%" PRId64,
              transaction_termination_ctx.m_thread_id,
              transaction_termination_ctx.m_rollback_transaction,
              transaction_termination_ctx.m_generated_gtid,
              transaction_termination_ctx.m_sidno,
              transaction_termination_ctx.m_gno));

  uint error = ER_NO_SUCH_THREAD;
  Find_thd_with_id find_thd_with_id(transaction_termination_ctx.m_thread_id);

  THD_ptr thd_ptr =
      Global_THD_manager::get_instance()->find_thd(&find_thd_with_id);
  if (thd_ptr) {
    error = thd_ptr->get_transaction()
                ->get_rpl_transaction_ctx()
                ->set_rpl_transaction_ctx(transaction_termination_ctx);
  }
  return error;
}

bool Find_thd_with_id::operator()(THD *thd) {
  // Daemon threads (such as the event scheduler) never match here.
  if (thd->get_command() == COM_DAEMON) return false;
  return (thd->thread_id() == m_thread_id);
}
```
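To make the failure mode concrete, here is a minimal, self-contained C++ model of the lookup. SimThd, SimThdManager, Command and Status are hypothetical stand-ins, not MySQL's real THD, Global_THD_manager or error codes. Only the daemon filter from Find_thd_with_id above is reproduced.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical, simplified stand-ins for THD / COM_DAEMON (not MySQL types).
enum class Command { kQuery, kSleep, kDaemon };

struct SimThd {
  std::uint64_t thread_id;
  Command command;
};

// Mirrors the predicate quoted above: daemon threads never match.
struct FindThdWithId {
  std::uint64_t wanted;
  bool operator()(const SimThd &thd) const {
    if (thd.command == Command::kDaemon) return false;
    return thd.thread_id == wanted;
  }
};

// Hypothetical registry standing in for Global_THD_manager.
class SimThdManager {
 public:
  void add(SimThd thd) { thds_.push_back(thd); }

  // Returns the first session accepted by the predicate, or nullptr.
  const SimThd *find_thd(
      const std::function<bool(const SimThd &)> &pred) const {
    for (const SimThd &thd : thds_)
      if (pred(thd)) return &thd;
    return nullptr;
  }

 private:
  std::vector<SimThd> thds_;
};

// kNoSuchThread plays the role of ER_NO_SUCH_THREAD in this model.
enum class Status { kOk, kNoSuchThread };

// Stand-in for set_transaction_ctx(): succeeds only if the session is found.
Status set_transaction_ctx(const SimThdManager &mgr, std::uint64_t thread_id) {
  const SimThd *thd = mgr.find_thd(FindThdWithId{thread_id});
  return thd ? Status::kOk : Status::kNoSuchThread;
}
```

Suppose the model holds one client session (id 10) and one daemon session (id 42, playing the event scheduler). Then `set_transaction_ctx(mgr, 10)` returns `Status::kOk`, while `set_transaction_ctx(mgr, 42)` returns `Status::kNoSuchThread`. The second result is the path that leads to ER_NO_SUCH_THREAD in the report.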
[23 Jun 2022 10:12]
MySQL Verification Team
Hello lou shuai, thank you for the report and contribution. Regards, Umesh
[23 Jun 2022 10:37]
lou shuai
Hi Umesh, I saw you changed the severity to S6, but this problem does not happen only in DEBUG builds. In release builds, the node leaves the MGR group and is set to read_only. So I think you should change it to a higher severity.
[23 Jun 2022 11:37]
MySQL Verification Team
Hello lou shuai, Ack, changed the sev. Thank you. Regards, Umesh
[15 Jul 2022 19:47]
Margaret Fisher
Posted by developer: Changelog entry added for MySQL 8.0.31: After checking a transaction commit has no conflicts and is in the correct order, Group Replication reports back to the committing session. When the event scheduler thread was started, Group Replication was not able to find the committing session, resulting in the member entering ERROR state and leaving the group. The procedure to locate the committing session was extended to find daemon threads, as used to start the event scheduler thread. Thanks to Lou Shuai for the contribution.
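The changelog says only that the lookup was extended to find daemon threads. One way such an extension could look is an opt-in flag on the predicate. The sketch below uses that approach, with the default preserving the old behavior for every other caller. The `daemon_allowed` parameter, SimThd and the free function find_thd are hypothetical names for illustration; the actual 8.0.31 patch may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for THD / COM_DAEMON (not MySQL types).
enum class Command { kQuery, kDaemon };

struct SimThd {
  std::uint64_t thread_id;
  Command command;
};

// Hypothetical predicate: daemon sessions are skipped unless the caller opts in.
class FindThdWithId {
 public:
  explicit FindThdWithId(std::uint64_t id, bool daemon_allowed = false)
      : id_(id), daemon_allowed_(daemon_allowed) {}

  bool operator()(const SimThd &thd) const {
    if (!daemon_allowed_ && thd.command == Command::kDaemon) return false;
    return thd.thread_id == id_;
  }

 private:
  std::uint64_t id_;
  bool daemon_allowed_;
};

// Linear search standing in for Global_THD_manager::find_thd().
const SimThd *find_thd(const std::vector<SimThd> &thds,
                       const FindThdWithId &pred) {
  for (const SimThd &thd : thds)
    if (pred(thd)) return &thd;
  return nullptr;
}
```

Under this design, the Group Replication commit path would construct the predicate with `daemon_allowed = true`, so a transaction committed by the event scheduler thread can still receive its certification result. Callers that use the default are unaffected.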