Bug #38715 slave crashed with concurrent kill/stop/start/reset
Submitted: 11 Aug 2008 10:56
Modified: 1 Sep 2009 8:20
Reporter: Shane Bester (Platinum Quality Contributor)
Status: Can't repeat
Category: MySQL Server: Replication
Severity: S1 (Critical)
Version: 5.0.66a, 5.1.26
OS: Any
Assigned to: Andrei Elkin
CPU Architecture: Any
Tags: crash, KILL, RESET SLAVE, start slave, stop slave

[11 Aug 2008 10:56] Shane Bester
Description:
While running stop/start slave and killing random connections on the slave, mysqld crashed like this:

mysqld-nt.exe!my_b_append        Line 1575
mysqld-nt.exe!MYSQL_LOG::appendv Line 1595
mysqld-nt.exe!queue_event        Line 4636
mysqld-nt.exe!handle_slave_io    Line 3754
mysqld-nt.exe!pthread_start
mysqld-nt.exe!_callthreadstart
mysqld-nt.exe!_threadstart

Crash was here in my_b_append:

  lock_append_buffer(info);  <----
  rest_length=(uint) (info->write_end - info->write_pos);
  if (Count <= rest_length)

I see that lock_append_buffer is defined as:
#define lock_append_buffer(info) \
 pthread_mutex_lock(&(info)->append_buffer_lock)

See the attached file for the full stack and variable values.
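
For readers unfamiliar with the pattern, here is a minimal sketch of a mutex-guarded append like the one at the crash site. Everything in it is a simplified stand-in: `mini_cache`, `mini_cache_init` and `mini_append` are hypothetical names, not the server's IO_CACHE code. Only the shape of `lock_append_buffer` and the `rest_length` check come from the snippet above. It shows that the lock is the first access through `info`, so a crash on that line would be consistent with `info` or its mutex being invalid when the append starts. The report itself does not confirm a root cause.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the server's cache struct. */
typedef struct {
    unsigned char buf[64];
    unsigned char *write_pos;           /* next free byte in buf */
    unsigned char *write_end;           /* one past the last byte of buf */
    pthread_mutex_t append_buffer_lock; /* guards write_pos */
} mini_cache;

/* Same shape as the macro quoted in the report; the unlock twin is assumed. */
#define lock_append_buffer(info) \
    pthread_mutex_lock(&(info)->append_buffer_lock)
#define unlock_append_buffer(info) \
    pthread_mutex_unlock(&(info)->append_buffer_lock)

/* Prepare an empty cache. Returns 0 on success. */
static int mini_cache_init(mini_cache *info)
{
    assert(info != NULL);
    info->write_pos = info->buf;
    info->write_end = info->buf + sizeof(info->buf);
    return pthread_mutex_init(&info->append_buffer_lock, NULL);
}

/* Append count bytes under the lock.
 * Returns 0 on success, 1 if the data does not fit (real code would flush). */
static int mini_append(mini_cache *info, const unsigned char *data, size_t count)
{
    size_t rest_length;
    int rc = 1;

    assert(info != NULL);
    lock_append_buffer(info); /* first access through info */
    rest_length = (size_t)(info->write_end - info->write_pos);
    if (count <= rest_length) {
        memcpy(info->write_pos, data, count);
        info->write_pos += count;
        rc = 0;
    }
    unlock_append_buffer(info);
    return rc;
}
```

In this sketch, if another thread tore down or reinitialized the `mini_cache` between `mini_cache_init` and `mini_append`, the `pthread_mutex_lock` call is where the failure would surface.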

How to repeat:
Have a replicating slave, then execute the following in two threads:

stop slave
reset slave
start slave
kill <random id>
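
The steps above can be sketched as a small two-thread driver. This is an illustration under stated assumptions, not the testcase attached to this report. `exec_fn`, `worker_args`, `stress_worker`, `run_stress` and `dry_run_exec` are hypothetical names. In real use the callback would presumably wrap `mysql_query()` from the MySQL C API on a per-thread connection, and would reconnect after its own connection gets killed. The loop count and the range of connection ids are arbitrary parameters.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical callback that sends one statement; errors are ignored here. */
typedef int (*exec_fn)(void *conn, const char *sql);

typedef struct {
    exec_fn exec;         /* how to send a statement */
    void *conn;           /* opaque per-thread connection handle */
    unsigned int seed;    /* per-thread state for picking kill ids */
    int rounds;           /* how many times to repeat the four statements */
    unsigned long max_id; /* kill ids are drawn from 1..max_id */
} worker_args;

/* One worker: stop, reset, start, then kill a random connection id. */
static void *stress_worker(void *p)
{
    worker_args *a = p;
    char kill_sql[64];
    int i;

    assert(a->max_id > 0);
    for (i = 0; i < a->rounds; i++) {
        a->exec(a->conn, "STOP SLAVE");
        a->exec(a->conn, "RESET SLAVE");
        a->exec(a->conn, "START SLAVE");
        /* Small linear congruential step; statistical quality is irrelevant. */
        a->seed = a->seed * 1103515245u + 12345u;
        snprintf(kill_sql, sizeof kill_sql, "KILL %lu",
                 1 + (unsigned long)(a->seed >> 16) % a->max_id);
        a->exec(a->conn, kill_sql);
    }
    return NULL;
}

/* Run two workers concurrently. Returns 0 once both have finished. */
static int run_stress(worker_args *a, worker_args *b)
{
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, stress_worker, a) != 0)
        return -1;
    if (pthread_create(&t2, NULL, stress_worker, b) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

/* Dry-run callback: counts statements instead of sending them. */
typedef struct {
    pthread_mutex_t lock;
    unsigned long count;
} dry_run_state;

static int dry_run_exec(void *conn, const char *sql)
{
    dry_run_state *s = conn;

    (void)sql;
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_mutex_unlock(&s->lock);
    return 0;
}
```

The callback indirection keeps the sketch independent of any client library. `dry_run_exec` lets the loop be exercised without a server.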
[11 Aug 2008 10:58] MySQL Verification Team
Full stack trace and variables in frame, from 5.0.66a.

Attachment: bug38715_some_info.txt (text/plain), 3.07 KiB.

[11 Aug 2008 15:38] MySQL Verification Team
Lars, 5.1.26-rc crashed with stack trace:

mysqld.exe!handle_slave_io(void * arg=0x06623ee8)  Line 2342
mysqld.exe!pthread_start(void * param=0x06653850)  Line 85
mysqld.exe!_threadstart(void * ptd=0x06680f08)  Line 196
[11 Aug 2008 15:42] MySQL Verification Team
And then 5.1.26 crashed again, this time with an assertion:

Assertion failed: mi->io_thd == thd, file .\slave.cc, line 615
mysqld-debug.exe!abort()  Line 44 + 0x7 bytes
mysqld-debug.exe!_assert
mysqld-debug.exe!io_slave_killed
mysqld-debug.exe!handle_slave_io+
mysqld-debug.exe!pthread_start
mysqld-debug.exe!_threadstart

I don't think the slave threads are as 'kill safe' as they could be.
[20 Aug 2008 2:31] MySQL Verification Team
To repeat:
----------------

Set up a debug-build master replicating to itself:

mysqld-debug  --console --skip-grant-tables --server-id=5 --log-bin --port=3306 --replicate-same-server-id  --slave-skip-errors=1050 --skip-innodb

change master to master_host='127.0.0.1', master_port=3306, master_user='root', master_password='';
start slave;

Then run the attached bug38715.c testcase against the server.
[20 Aug 2008 2:39] MySQL Verification Team
Testcase that will expose a few different assertions if run as described above.

Attachment: bug38715.c (text/plain), 6.49 KiB.

[21 Aug 2008 10:53] Lars Thalmann
Shane Bester wrote:
> The following may very well all be part of the same overall bug:
>
> Bug#38240
> Bug#38715
> Bug#38716
>
> The above 3 all relate to various locking/mutex issues in the
> replication code during concurrent flush logs/stop/start
> slave/reset slave.
>
> A reason for different bug reports was slightly different crashes in
> each case.  I'm guessing the devs will eventually set them to
> duplicates and fix everything in one go.
[1 Sep 2009 8:20] Andrei Elkin
I could no longer reproduce this with the latest 5.0 and 5.1.
Most probably the fixed bugs referred to in the comments were related to this one.