MySQL Bugs: #38715: slave crashed with concurrent kill/stop/start/reset

Bug #38715	slave crashed with concurrent kill/stop/start/reset
Submitted:	11 Aug 2008 10:56	Modified:	1 Sep 2009 8:20
Reporter:	Shane Bester (Platinum Quality Contributor)
Status:	Can't repeat
Category:	MySQL Server: Replication	Severity:	S1 (Critical)
Version:	5.0.66a, 5.1.26	OS:	Any
Assigned to:	Andrei Elkin	CPU Architecture:	Any
Tags:	crash, KILL, RESET SLAVE, start slave, stop slave

Description:
While running STOP SLAVE / START SLAVE and killing random connections on the slave, the slave crashed with this stack trace:

mysqld-nt.exe!my_b_append        Line 1575
mysqld-nt.exe!MYSQL_LOG::appendv Line 1595
mysqld-nt.exe!queue_event        Line 4636
mysqld-nt.exe!handle_slave_io    Line 3754
mysqld-nt.exe!pthread_start
mysqld-nt.exe!_callthreadstart
mysqld-nt.exe!_threadstart

Crash was here in my_b_append:

lock_append_buffer(info);  <----
  rest_length=(uint) (info->write_end - info->write_pos);
  if (Count <= rest_length)

lock_append_buffer is defined as:
#define lock_append_buffer(info) \
 pthread_mutex_lock(&(info)->append_buffer_lock)

See the attached file for the full stack and variable values.
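The report does not identify a root cause. One possible reading, and it is only an assumption, is that the append lock was not in a usable state when the I/O thread reached my_b_append, for example because the log had been closed or reset concurrently. The sketch below is a toy model of that idea, not MySQL code: toy_cache, lifetime_lock and the toy_* functions are hypothetical names. It shows an append path that checks, under an outer lock, whether the cache is still open before touching its mutex.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the cache that my_b_append() writes to. */
typedef struct {
    pthread_mutex_t append_buffer_lock; /* valid only while inited != 0 */
    int inited;
    char buf[64];
    size_t used;
} toy_cache;

/* Outer lock protecting the cache's lifetime (an assumption of this sketch). */
static pthread_mutex_t lifetime_lock = PTHREAD_MUTEX_INITIALIZER;

/* Initialize the cache and its append mutex; no-op if already open. */
void toy_cache_open(toy_cache *c) {
    pthread_mutex_lock(&lifetime_lock);
    if (!c->inited) {
        pthread_mutex_init(&c->append_buffer_lock, NULL);
        c->used = 0;
        c->inited = 1;
    }
    pthread_mutex_unlock(&lifetime_lock);
}

/* Destroy the append mutex; after this, appends are refused. */
void toy_cache_close(toy_cache *c) {
    pthread_mutex_lock(&lifetime_lock);
    if (c->inited) {
        pthread_mutex_destroy(&c->append_buffer_lock);
        c->inited = 0;
    }
    pthread_mutex_unlock(&lifetime_lock);
}

/* Returns 0 on success, -1 if the cache is closed or the data does not fit. */
int toy_append(toy_cache *c, const char *data, size_t count) {
    int rc = -1;
    pthread_mutex_lock(&lifetime_lock);
    if (c->inited) {
        size_t rest_length;
        pthread_mutex_lock(&c->append_buffer_lock);
        rest_length = sizeof(c->buf) - c->used;
        if (count <= rest_length) {
            memcpy(c->buf + c->used, data, count);
            c->used += count;
            rc = 0;
        }
        pthread_mutex_unlock(&c->append_buffer_lock);
    }
    pthread_mutex_unlock(&lifetime_lock);
    return rc;
}
```

In this model an append that races with a close returns -1 instead of locking a destroyed mutex. Whether the server's crash has this shape is not established by the report.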

How to repeat:
Set up a replicating slave. In two threads, execute:

stop slave
reset slave
start slave
kill <random id>
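As a rough illustration of the loop above, the following sketch builds the statement each thread would send on a given iteration. It is not the attached testcase; build_statement and its parameters are hypothetical, and it assumes the four statements are simply cycled in the order listed. A real driver would pass the resulting string to the client library (for example mysql_query) on its own connection and pick kill_id at random.

```c
#include <stdio.h>

/* Statements from the report, in the order listed there. */
static const char *const k_statements[] = {
    "STOP SLAVE", "RESET SLAVE", "START SLAVE"
};

/*
 * Write the statement for iteration `step` into `out`.
 * Every fourth step becomes "KILL <kill_id>"; the caller chooses kill_id
 * (the report uses a random connection id).
 * Returns the snprintf() result.
 */
int build_statement(unsigned step, unsigned long kill_id,
                    char *out, size_t out_len) {
    unsigned slot = step % 4;
    if (slot < 3)
        return snprintf(out, out_len, "%s", k_statements[slot]);
    return snprintf(out, out_len, "KILL %lu", kill_id);
}
```

Running the same cycle from two threads at once means the slave sees interleavings such as START SLAVE from one thread overlapping RESET SLAVE or KILL from the other.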

Full stack trace and variables in frame, from 5.0.66a.

Attachment: bug38715_some_info.txt (text/plain), 3.07 KiB.

Lars, 5.1.26-rc crashed with stack trace:

mysqld.exe!handle_slave_io(void * arg=0x06623ee8)  Line 2342
mysqld.exe!pthread_start(void * param=0x06653850)  Line 85
mysqld.exe!_threadstart(void * ptd=0x06680f08)  Line 196

5.1.26 then crashed again, this time with an assertion:

Assertion failed: mi->io_thd == thd, file .\slave.cc, line 615
mysqld-debug.exe!abort()  Line 44 + 0x7 bytes	C
mysqld-debug.exe!_assert
mysqld-debug.exe!io_slave_killed
mysqld-debug.exe!handle_slave_io+
mysqld-debug.exe!pthread_start
mysqld-debug.exe!_threadstart

I don't think the slave threads are as 'kill safe' as they could be.

To repeat:
----------------

Set up a debug-build master replicating to itself:

mysqld-debug  --console --skip-grant-tables --server-id=5 --log-bin --port=3306 --replicate-same-server-id  --slave-skip-errors=1050 --skip-innodb

change master to master_host='127.0.0.1', master_port=3306, master_user='root', master_password='';
start slave;

Then run the attached bug38715.c testcase against the server.

Testcase that exposes a few different assertions when run as described above.

Attachment: bug38715.c (text/plain), 6.49 KiB.

Shane Bester wrote:
> The following may very well all be part of the same overall bug:
>
> Bug#38240
> Bug#38715
> Bug#38716
>
> The above 3 all relate to various locking/mutex issues in the
> replication code during concurrent flush logs/stop/start
> slave/reset slave.
>
> A reason for different bug reports was slightly different crashes in
> each case.  I'm guessing the devs will eventually set them to
> duplicates and fix everything in one go.

Could not reproduce this anymore with the latest 5.0 and 5.1.
Most probably the fixed bugs referred to in the comments were related to this one.