Bug #112990 MySQL cluster hang forever
Submitted: 7 Nov 2023 13:21
Modified: 9 Nov 2023 14:59
Reporter: zetang zeng (OCA)
Status: Can't repeat
Category: Shell AdminAPI InnoDB Cluster / ReplicaSet
Severity: S2 (Serious)
Version: 5.7.43
OS: Any
Assigned to: MySQL Verification Team
CPU Architecture: Any

[7 Nov 2023 13:21] zetang zeng
Description:
Two threads hang forever on `pthread_join`, in Gcs_xcom_engine::process() and in `stop group_replication`, which blocks all reads and writes on the cluster.

The threads they are waiting to `join`, at the addresses below, do not exist. I am not sure whether this is a bug in MySQL or in the glibc pthread library.

m_thread = 0x7effd81d0100
m_thread = 0x7effd82ee950

(gdb) p * gcs_engine
...
m_engine_thread = {
    <My_xp_thread_pthread> = {<My_xp_thread> = {_vptr.My_xp_thread = 0x7effd1e2d570 <vtable for My_xp_thread_impl+16>},
      m_thread = 0x7effd81d0100, m_thread_once = 0x7effd81cfb80}, <No data fields>}, m_schedule = true}
(gdb) thread 109
[Switching to thread 109 (Thread 0x7effd16f0700 (LWP 130765))]
#0  0x00007f00e51be017 in pthread_join () from /lib64/libpthread.so.0
(gdb) frame 1
#1  0x00007effd186c4b0 in Gcs_xcom_control::do_leave (this=0x7effd82ef7c0)
    at /var/lib/pb2/sb_1-11862956-1687353311.91/mysql-5.7.43/rapid/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:722

(gdb) p *this
$2 = {<Gcs_control_interface> = {_vptr.Gcs_control_interface = 0x7effd1e2dcf0 <vtable for Gcs_xcom_control+16>},
  m_gid = 0x7effd82f0480, Python Exception <type 'exceptions.ValueError'> Cannot find type std::map<int, Gcs_control_event_listener const&, std::less<int>, std::allocator<std::pair<int const, Gcs_control_event_listener const&> > >::_Rep_type:
m_gid_hash = 1891208692, m_xcom_proxy = 0x7effd82ef070, event_listeners = std::map with 1 elements,
  m_local_member_id = 0x7effd82ef8a0, m_state_exchange = 0x7effd82f02a0, m_local_node_info = 0x7effd82ee670,
  m_xcom_thread = {<My_xp_thread_pthread> = {<My_xp_thread> = {
        _vptr.My_xp_thread = 0x7effd1e2d570 <vtable for My_xp_thread_impl+16>}, m_thread = 0x7effd82ee950,
      m_thread_once = 0x7effd82ee970}, <No data fields>}, m_node_list_me = {node_list_len = 1, node_list_val = 0x7effc000d2c0},
  m_socket_util = 0x7effd82ed230, m_join_attempts = 0, m_join_sleep_time = 5, m_xcom_running = true, m_leave_view_requested = true,
  m_leave_view_delivered = false, m_boot = false, m_initial_peers = std::vector of length 2, capacity 2 = {0x7effd82f04a0,
    0x7effd81cf330}, m_view_control = 0x7effd82ee7c0, m_gcs_engine = 0x7effd82ef550, m_xcom_management = 0x7effd82f03d0}
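
For anyone trying to narrow down a hang like the one above, here is a minimal C++ sketch, not taken from the MySQL source, of joining a thread with a deadline instead of waiting forever. It assumes Linux with glibc, since `pthread_timedjoin_np` is a glibc extension. The names `join_with_timeout`, `stuck_worker` and `demo_join_on_stuck_thread` are hypothetical and exist only for this illustration. It only shows how a join that would otherwise block can be turned into an error code that can be logged. It does not explain why the joined thread in this report never finishes.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for pthread_timedjoin_np (glibc extension)
#endif
#include <pthread.h>
#include <time.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper: join `thread`, but give up after `timeout_sec` seconds.
// Returns 0 if the thread was joined, ETIMEDOUT if it was still running at
// the deadline, or another error number reported by glibc.
int join_with_timeout(pthread_t thread, int timeout_sec) {
  timespec deadline;
  // pthread_timedjoin_np takes an absolute CLOCK_REALTIME deadline.
  clock_gettime(CLOCK_REALTIME, &deadline);
  deadline.tv_sec += timeout_sec;
  return pthread_timedjoin_np(thread, nullptr, &deadline);
}

// Stand-in for a thread that never exits on its own.
static void *stuck_worker(void *) {
  for (;;) {
    pause();  // blocks until a signal arrives; also a cancellation point
  }
  return nullptr;
}

// Starts a stuck thread, tries a bounded join, then cleans up.
// Returns the result of the bounded join, or -1 if the thread
// could not be created.
int demo_join_on_stuck_thread() {
  pthread_t worker;
  if (pthread_create(&worker, nullptr, stuck_worker, nullptr) != 0) {
    return -1;
  }
  int rc = join_with_timeout(worker, 1);  // a plain pthread_join would block here
  pthread_cancel(worker);                 // cleanup for the demo only
  pthread_join(worker, nullptr);
  return rc;
}
```

Calling `demo_join_on_stuck_thread()` from a test program returns the error code of the bounded join rather than blocking. On older glibc versions the program may need to be built with `-pthread`.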

How to repeat:
Not clear.
[9 Nov 2023 14:59] MySQL Verification Team
Hi,

I am not able to reproduce this. Without a way to reproduce it, I doubt there is much we can do.