Description:
Around sql/log_event.cc#apply_event() line 3070:
-----------
err:
  if (thd->is_error())
  {
    DBUG_ASSERT(!worker);
    // destroy buffered events of the current group prior to exit
    for (uint k= 0; k < rli->curr_group_da.elements; k++)
    {
      delete *(Log_event**) dynamic_array_ptr(&rli->curr_group_da, k);
    }
  }
-----------
rli->curr_group_da.elements is not reset here, so if a delete loop over the
same array runs again, mysqld frees the same events a second time (double free), which results in SIGSEGV.
How to repeat:
In my test environment, I hit a crash loop with the following steps.
1. One master and one slave, multi-threaded slave (MTS) enabled, 100 databases and 100 workers
2. Run heavy concurrent inserts on the master
3. kill -9 the slave mysqld during the load
4. Restart mysqld
5. START SLAVE (the slave keeps crashing)
Here is the backtrace from a core dump of the crashed slave.
Program terminated with signal 11, Segmentation fault.
#0 0x0000003b1280b122 in pthread_kill () from /lib64/libpthread.so.0
#1 0x000000000066a70a in handle_fatal_signal (sig=11)
at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/signal_handler.cc:248
#2 <signal handler called>
#3 0x00000000008b9ea3 in slave_stop_workers (rli=0x7f1594069c50,
mts_inited=0x4099d0ef)
at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/rpl_slave.cc:5256
#4 0x00000000008c6889 in handle_slave_sql (arg=<optimized out>)
at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/rpl_slave.cc:5592
#5 0x0000003b128062f7 in start_thread () from /lib64/libpthread.so.0
#6 0x0000003b120d1e3d in clone () from /lib64/libc.so.6
#7 0x0000000000000000 in ?? ()
At rpl_slave.cc:5256, there is another delete loop.
----
for (uint i= 0; i < rli->curr_group_da.elements; i++)
  delete *(Log_event**) dynamic_array_ptr(&rli->curr_group_da, i);
delete_dynamic(&rli->curr_group_da); // GCDA
----
I verified with a debugger that the events in rli->curr_group_da had already been deleted in log_event.cc#apply_event(), so this SIGSEGV is caused by a double free.
After I added the line below right after the delete loop in apply_event() in log_event.cc, the crash loop went away.
---
delete_dynamic(&rli->curr_group_da);
---
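To illustrate why resetting the array after the delete loop helps, here is a minimal standalone sketch. It is not MySQL code: std::vector stands in for the DYNAMIC_ARRAY, and FakeEvent, EventArray, destroy_buffered_events() and run_two_cleanups() are hypothetical names made up for this example. It assumes the only relevant effect of delete_dynamic() here is that the element count drops to 0, so a later cleanup loop has nothing left to delete.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Log_event. It counts destructor calls so the
// behaviour can be checked without depending on an actual crash.
struct FakeEvent {
  static int destroyed;
  ~FakeEvent() { ++destroyed; }
};
int FakeEvent::destroyed = 0;

// Hypothetical stand-in for rli->curr_group_da.
typedef std::vector<FakeEvent*> EventArray;

// Models the cleanup loop plus the proposed fix: after deleting the
// buffered events, empty the array so that its element count is 0.
void destroy_buffered_events(EventArray &da) {
  for (std::size_t k = 0; k < da.size(); k++)
    delete da[k];
  da.clear();  // plays the role of delete_dynamic(&rli->curr_group_da)
  assert(da.empty());
}

// Runs the cleanup twice, in the order apply_event() and then
// slave_stop_workers() would run it, and returns the total number of
// destructor calls.
int run_two_cleanups(std::size_t n_events) {
  FakeEvent::destroyed = 0;
  EventArray da;
  for (std::size_t i = 0; i < n_events; i++)
    da.push_back(new FakeEvent());
  destroy_buffered_events(da);  // first pass (error path)
  destroy_buffered_events(da);  // second pass finds an empty array
  return FakeEvent::destroyed;
}
```

With the array emptied after the first pass, run_two_cleanups(n) destroys each event exactly once. Without the da.clear() line, the second pass would delete the same pointers again, which is the double free described above.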