Description:
The thread running init_slave() does not wait for the being started
slave threads' signalling. It unlocks LOCK_active_mi to let
the end_slave() running terminator to find the slave threads
running status as 0, i.e unchanged yet from the inititial value.
The terminator in that case goes straight to delete active_mi without any ado.
The slave threads will crash e.g at trying to lock run_lock that belong in
the deleted active_mi.
How to repeat:
There is no deterministic test in how to repeat. It can be simulated
as the following:
1. start a self-replicating server with gdb e.g
shell> gdb mysqld
(gdb) set args --console --skip-grant-tables --server-id=5 --log-bin \
--port=3306 --replicate-same-server-id --skip-name-resolve \
--slave-net-timeout=1 --net-read-timeout=1 --net-write-timeout=1
2. set br in init_slave()
3. set br in end_slave()
4. set br in handle_slave_sql()
5. run
6. finish init_slave()
don't let the slave sql thread to proceed from the 4th br point
forcing it to stay (e.g call sleep(1))
7. run mysqladmin shutdown
to get to end_slave()
finish end_slave()
8. release the slave sql thread from the 4th br point anchor.
The slave sql should come to rli->run_lock and segfault/assert.
Suggested fix:
Alas it's not possible to call start_slave_threads() with
the wait == 1 in init_slave() because of
#4 <signal handler called>
#5 0x08216ecc in THD::enter_cond (this=0x0, cond=0x892be58, mutex=0x892bde8,
msg=0x8647e18 "Waiting for slave thread to start") at sql_class.h:1653
#6 0x0838cfee in start_slave_thread (h_func=0x83886b9 <handle_slave_io>,
start_lock=0x892bde8, cond_lock=0x892bde8, start_cond=0x892be58,
slave_running=0x892dae0, slave_run_id=0x892dae4, mi=0x892b068,
high_priority=true) at slave.cc:827
#7 0x0838d7c8 in start_slave_threads (need_slave_mutex=true,
wait_for_start=true, mi=0x892b068,
master_info_fname=0x85d5e70 "master.info",
slave_info_fname=0x85d5e7c "relay-log.info", thread_mask=3) at slave.cc:876
#8 0x0838e2ab in init_slave () at slave.cc:186
Adding a new slave_status value to be set in the init_slave() running
thread might be helpful.
The end_slave() running shutdown thread respects any non-zero
slave_status to wait till it gets reset.