Bug #38715 | slave crashed with concurrent kill/stop/start/reset | ||
---|---|---|---|
Submitted: | 11 Aug 2008 10:56 | Modified: | 1 Sep 2009 8:20 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.0.66a, 5.1.26 | OS: | Any |
Assigned to: | Andrei Elkin | CPU Architecture: | Any |
Tags: | crash, KILL, RESET SLAVE, start slave, stop slave |
[11 Aug 2008 10:56]
Shane Bester
[11 Aug 2008 10:58]
MySQL Verification Team
full stack trace and variables in frame. 5.0.66a
Attachment: bug38715_some_info.txt (text/plain), 3.07 KiB.
[11 Aug 2008 15:38]
MySQL Verification Team
Lars, 5.1.26-rc crashed with stack trace:
mysqld.exe!handle_slave_io(void * arg=0x06623ee8) Line 2342
mysqld.exe!pthread_start(void * param=0x06653850) Line 85
mysqld.exe!_threadstart(void * ptd=0x06680f08) Line 196
[11 Aug 2008 15:42]
MySQL Verification Team
and then 5.1.26 crashed again with assert:
Assertion failed: mi->io_thd == thd, file .\slave.cc, line 615
mysqld-debug.exe!abort() Line 44 + 0x7 bytes C
mysqld-debug.exe!_assert
mysqld-debug.exe!io_slave_killed
mysqld-debug.exe!handle_slave_io+
mysqld-debug.exe!pthread_start
mysqld-debug.exe!_threadstart
I don't think the slave threads are as 'kill safe' as they could be.
[20 Aug 2008 2:31]
MySQL Verification Team
to repeat:
----------------
set up a debug build master replicating to itself:

mysqld-debug --console --skip-grant-tables --server-id=5 --log-bin --port=3306 --replicate-same-server-id --slave-skip-errors=1050 --skip-innodb

change master to master_host='127.0.0.1', master_port=3306, master_user='root', master_password='';
start slave;

then run the attached bug38715.c testcase against the server.
[20 Aug 2008 2:39]
MySQL Verification Team
testcase that will expose a few different assertions if run as described above.
Attachment: bug38715.c (text/plain), 6.49 KiB.
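The attached bug38715.c is not reproduced in this thread. As a rough illustration of the kind of load the steps above describe, here is a minimal Python sketch of a concurrent driver. Everything in it is an assumption made for illustration, not the content of the attachment: the statement mix (taken only from the bug title and tags), the 25% KILL ratio, and the names `next_statement`, `worker`, `run_stress`, `connect` and `thread_ids`. `connect` is a hypothetical factory that returns a per-thread callable executing one SQL statement; `thread_ids` would be connection ids obtained elsewhere, for example from SHOW PROCESSLIST.

```python
import random
import threading

# Statement mix guessed from the bug title/tags; the real testcase may differ.
STATEMENTS = ["STOP SLAVE", "START SLAVE", "RESET SLAVE"]


def next_statement(rng, thread_ids):
    """Pick the next statement; occasionally KILL one of the known ids."""
    if thread_ids and rng.random() < 0.25:  # 25% ratio is an arbitrary choice
        return "KILL %d" % rng.choice(thread_ids)
    return rng.choice(STATEMENTS)


def worker(connect, thread_ids, iterations, seed):
    """Run `iterations` random statements on this thread's own connection."""
    execute = connect()  # hypothetical: one connection per worker thread
    rng = random.Random(seed)
    for _ in range(iterations):
        try:
            execute(next_statement(rng, thread_ids))
        except Exception:
            # Statement errors are expected under this load (for example
            # START SLAVE while the slave is already running); keep going.
            pass


def run_stress(connect, thread_ids, n_threads=4, iterations=100):
    """Start n_threads workers issuing statements concurrently, then join."""
    threads = [
        threading.Thread(target=worker,
                         args=(connect, thread_ids, iterations, seed))
        for seed in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

For a dry run without a server, `connect` can return a recorder, e.g. `log = []; run_stress(lambda: log.append, [11, 12], n_threads=3, iterations=20)`. Against a real server, `connect` would wrap whichever client library is in use, with one connection per thread.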
[21 Aug 2008 10:53]
Lars Thalmann
Shane Bester wrote:
> The following may very well all be part of the same overall bug:
>
> Bug#38240
> Bug#38715
> Bug#38716
>
> The above 3 all relate to various locking/mutex issues in the
> replication code during concurrent flush logs/stop/start
> slave/reset slave.
>
> A reason for different bug reports was slightly different crashes in
> each case. I'm guessing the devs will eventually set them to
> duplicates and fix everything in one go.
[1 Sep 2009 8:20]
Andrei Elkin
Could not reproduce it anymore with the latest 5.0 and 5.1. Most probably the fixed bugs referred to in the comments were related to this one.