Description:
The NDB_SHARE object may be referenced by multiple threads and by multiple data structures
shared between these threads. A reference counting scheme is used to keep track of the
number of references to the NDB_SHARE, such that it is not destructed while it is still
referenced from somewhere. To avoid holding locks while updating a 'share',
a temporary reference may also be taken on the share so that it is not destructed
in the middle of a set of operations (typically while the binlog thread handles distribution
of schema changes).
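The reference counting described above can be illustrated with a minimal sketch. All names here (DemoShare, demo_get_share, demo_free_share, g_destructed) are hypothetical and are not the actual NDB_SHARE API; the sketch also assumes an atomic counter, which may differ from how the real code protects its use count.

```cpp
#include <atomic>

// Hypothetical stand-in for NDB_SHARE; the real struct has many more members.
struct DemoShare {
  std::atomic<int> use_count{0};
};

// Counts destructed shares, for demonstration purposes only.
static int g_destructed = 0;

// Take a reference: the share cannot be destructed while it is held.
inline DemoShare* demo_get_share(DemoShare* share) {
  share->use_count.fetch_add(1);
  return share;
}

// Release a reference: the last release destructs the share.
inline void demo_free_share(DemoShare* share) {
  if (share->use_count.fetch_sub(1) == 1) {
    ++g_destructed;
    delete share;
  }
}

// Shows that a temporary reference keeps the share alive after the
// long-lived reference (e.g. the one held by a hash) has been dropped.
inline bool demo_temporary_ref_keeps_share_alive() {
  const int before = g_destructed;
  DemoShare* share = demo_get_share(new DemoShare);  // long-lived reference
  demo_get_share(share);                             // temporary reference
  demo_free_share(share);                            // long-lived ref dropped
  const bool survived = (g_destructed == before);    // share must still exist
  demo_free_share(share);                            // temporary ref released
  return survived && g_destructed == before + 1;     // destructed exactly once
}
```

The point of the sketch is that the share is destructed only when the last reference goes away, regardless of which holder releases it last.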
One of the data structures referencing a 'share' is the hash list 'ndbcluster_open_tables'.
A share is registered here as soon as it is created or 'discovered' by the first operation
opening a table that refers to this share. The 'state' of the share has to be '!= NSS_DROPPED'
while it is registered in the 'open_tables' hash.
When the binlog thread restarts, it empties the 'open_tables' hash to force
a later reopen of these tables. Thus, the shares in this hash are unref'ed and
possibly destructed.
However, a race condition exists between a client thread creating or
opening a share and the binlog thread emptying the 'open_tables' hash. As the
::open() and ::create() methods did not have their own (temporary) reference
to the share they operated on, the binlog thread could destruct the share
in the middle of their operation. The stack dump below is one such case:
current thread: t@27
[1] _lwp_kill(0x1b, 0xb, 0x0, 0xfe522168), at 0xfe529265
[2] thr_kill(0x1b, 0xb, 0xfe528365, 0x14, 0x1708e9c, 0x9011efc), at 0xfe52218a
=>[3] my_write_core(sig = 0), line 422 in "stacktrace.c"
[4] handle_fatal_signal(sig = 0), line 230 in "signal_handler.cc"
[5] __sighndlr(0xb, 0x0, 0xfcb04d20, 0x84f9380), at 0xfe5247d5
[6] call_user_handler(0xb), at 0xfe517dae
[7] sigacthandler(0xb, 0x0, 0xfcb04d20, 0x1f, 0x0, 0x807007), at 0xfe518007
---- called from signal handler with signal 11 (SIGSEGV) ------
-> While ::open() sets up the event subscription on the share, the share
itself is destructed under our feet -> SIGSEGV
[8] strcmp(0x0, 0x8f86b50, 0xfcb052c8, 0x8c4e13e, 0x10291660, 0x8000000f), at 0xfe49c9d3
[9] Ndb_dist_priv_util::is_distributed_priv_table(db = (nil), table = (nil)), line 54 in "ndb_dist_priv_util.h"
[10] ndbcluster_create_event_ops(thd = (nil), share = (nil), ndbtab = (nil), event_name = (nil)), line 4898 in "ha_ndbcluster_binlog.cc"
[11] ndbcluster_create_binlog_setup(thd = (nil), ndb = (nil), key = (nil), key_len = 0, db = (nil), table_name = (nil), table = (nil)), line 4679 in "ha_ndbcluster_binlog.cc"
[12] ha_ndbcluster::open(this = (nil), name = (nil), mode = 0, test_if_locked = 0), line 11819 in "ha_ndbcluster.cc"
[13] handler::ha_open(this = (nil), table_arg = (nil), name = (nil), mode = 0, test_if_locked = 0), line 2505 in "handler.cc"
[14] open_table_from_share(thd = (nil), share = (nil), alias = (nil), db_stat = 0, prgflag = 0, ha_open_flags = 0, outparam = (nil), is_create_table = false), line 2355 in "table.cc"
[15] open_table(thd = (nil), table_list = (nil), ot_ctx = (nil)), line 3167 in "sql_base.cc"
[16] _ZL22open_and_process_tableP3THDP3LEXP10TABLE_LISTPjjP19Prelocking_strategybP18Open_table_context(thd = (nil), lex = (nil), tables = (nil), counter = (nil), flags = 0, prelocking_strategy = (nil), has_prelocking_list = false, ot_ctx = (nil)), line 4726 in "sql_base.cc"
[17] open_tables(thd = (nil), start = (nil), counter = (nil), flags = 0, prelocking_strategy = (nil)), line 5159 in "sql_base.cc"
[18] open_normal_and_derived_tables(thd = (nil), tables = (nil), flags = 0), line 5856 in "sql_base.cc"
[19] mysql_insert(thd = (nil), table_list = (nil), fields = CLASS, values_list = CLASS, update_fields = CLASS, update_values = CLASS, duplic = DUP_ERROR, ignore = false), line 737 in "sql_insert.cc"
[20] mysql_execute_command(thd = (nil)), line 3457 in "sql_parse.cc"
[21] mysql_parse(thd = (nil), rawbuf = (nil), length = 0, parser_state = (nil)), line 6393 in "sql_parse.cc"
[22] dispatch_command(command = COM_SLEEP, thd = (nil), packet = (nil), packet_length = 0), line 1340 in "sql_parse.cc"
[23] do_command(thd = (nil)), line 1037 in "sql_parse.cc"
[24] do_handle_one_connection(thd_arg = (nil)), line 982 in "sql_connect.cc"
[25] handle_one_connection(arg = (nil)), line 898 in "sql_connect.cc"
[26] pfs_spawn_thread(arg = (nil)), line 1860 in "pfs.cc"
[27] _thrp_setup(0xfe2ec240), at 0xfe52444c
[28] _lwp_start(0x1b, 0xb, 0x0, 0xfe522168, 0xfcb04bd8, 0x9011efc), at 0xfe5246f0
Below, we find the binlog thread in its initial steps, waiting for the first
events to arrive. Immediately before arriving at this point, it has unref'ed
all shares in the 'open_tables' hash:
current thread: t@23
=>[1] __lwp_park(0x0, 0xfcbce838, 0x0, 0x0), at 0xfe524749
[2] cond_sleep_queue(0x9531494, 0x953147c), at 0xfe51de00
[3] cond_wait_queue(0x9531494, 0x953147c, 0xfcbce838, 0xfe51e32b), at 0xfe51e0e3
[4] cond_wait_common(0x9531494, 0x953147c, 0xfcbce838, 0xfe51e749), at 0xfe51e501
[5] __cond_timedwait(0x9531494, 0x953147c, 0xfcbcee6c, 0xfe51e834), at 0xfe51e7a1
[6] cond_timedwait(0x9531494, 0x953147c, 0xfcbcee6c, 0xfe51e87b), at 0xfe51e845
[7] pthread_cond_timedwait(0x9531494, 0x953147c, 0xfcbcee6c, 0x0, 0x12, 0x0), at 0xfe51e893
[8] Ndb_binlog_thread::do_run(this = (nil)), line 6452 in "ha_ndbcluster_binlog.cc"
[9] Ndb_component::run_impl(this = (nil)), line 93 in "ndb_component.cc"
[10] Ndb_component_run_C(arg = (nil)), line 52 in "ndb_component.cc"
[11] _thrp_setup(0xfe2ea240), at 0xfe52444c
[12] _lwp_start(0x0, 0xfcbce838, 0x0, 0x0, 0xfe51dd04, 0x1), at 0xfe5246f0
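The two stack dumps suggest that the client thread needs its own reference for the duration of ::open(). The sketch below shows one way such a temporary reference could behave; it is not the actual patch. It swaps in std::shared_ptr for the hand-written use count, and every name in it (DemoTable, g_open_tables, demo_acquire, demo_binlog_restart, demo_open_survives_restart) is hypothetical.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a share registered in the 'open_tables' hash.
struct DemoTable {
  std::string db;
  std::string name;
};

static std::mutex g_open_tables_mutex;
static std::unordered_map<std::string, std::shared_ptr<DemoTable>> g_open_tables;

// Look up (or create) the entry and hand the caller its own reference.
// The copy is made while the mutex is held, so the entry cannot be
// destructed between the lookup and the reference being taken.
inline std::shared_ptr<DemoTable> demo_acquire(const std::string& key) {
  std::lock_guard<std::mutex> lock(g_open_tables_mutex);
  std::shared_ptr<DemoTable>& slot = g_open_tables[key];
  if (!slot) slot = std::make_shared<DemoTable>(DemoTable{"test", key});
  return slot;
}

// Models the binlog thread restart: drop every reference held by the hash.
inline void demo_binlog_restart() {
  std::lock_guard<std::mutex> lock(g_open_tables_mutex);
  g_open_tables.clear();
}

// Models a client-side open that holds a temporary reference while the
// hash is emptied (simulated inline here instead of on a second thread).
inline bool demo_open_survives_restart(const std::string& key) {
  std::shared_ptr<DemoTable> share = demo_acquire(key);  // temporary reference
  demo_binlog_restart();                                 // hash drops its ref
  return share->name == key;  // still valid: we hold the last reference
}
```

With the temporary reference in place, emptying the hash only drops the hash's own reference, and the object is released when the client's reference goes out of scope.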
How to repeat:
Run the new test case 'ndb_binlog_restart.test', which will be pushed as part of
the fix for bug#22204186.