Description:
We've written stress tests which run in the MTR framework. With the 5.6.7-rc source those stress tests are intermittently failing in DBUG builds because the DBUG stack frame pointers are somehow getting corrupted around the group commit and innobase_commit code. It is unclear whether this is solely a problem in the DBUG code or if there is a real problem in group commit.
Signature of the failure is:
innodb_stress.innodb_bigstress_crash_nocompress [ fail ]
Test ended at 2012-10-21 21:39:15
CURRENT_TEST: innodb_stress.innodb_bigstress_crash_nocompress
Traceback (most recent call last):
File "suite/innodb_stress/t/load_generator.py", line 252, in <module>
os.kill(server_pid, signal.SIGKILL)
OSError: [Errno 3] No such process
mysqltest: In included file "./suite/innodb_stress/include/innodb_stress.inc":
included from ./suite/innodb_stress/include/innodb_stress.inc at line 48:
At line 48: command "$exec" failed
The reason the os.kill of the mysqld process failed is because it had already exited with a signal 6. From the .err file:
/data/users/jtolmer/mysql/56/sql/mysqld: debugger aborting because missing DBUG_RETURN or DBUG_VOID_RETURN macro in function "MYSQL_BIN_LOG::change_stage"
22:26:40 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Loading the generated core in gdb shows a callstack of:
#0 0x0000003e6ce0b122 in pthread_kill () from /lib64/libpthread.so.0
#1 0x0000000000a2b872 in my_write_core (sig=6) at /home/jtolmer/mysql/56/mysys/stacktrace.c:433
#2 0x00000000006ccc8e in handle_fatal_signal (sig=6) at /home/jtolmer/mysql/56/sql/signal_handler.cc:248
#3 <signal handler called>
#4 0x0000003e6c230155 in raise () from /lib64/libc.so.6
#5 0x0000003e6c231bf0 in abort () from /lib64/libc.so.6
#6 0x0000000000d7ee48 in DbugExit (why=0x48297b70 "missing DBUG_RETURN or DBUG_VOID_RETURN macro in function \"MYSQL_BIN_LOG::change_stage\"\n") at /home/jtolmer/mysql/56/dbug/dbug.c:2127
#7 0x0000000000d7d8c5 in _db_return_ (_line_=3558, _stack_frame_=0x48297dc0) at /home/jtolmer/mysql/56/dbug/dbug.c:1287
#8 0x0000000000b0937b in innobase_commit (hton=0x1872bd0, thd=0x33e9720, commit_trx=true) at /home/jtolmer/mysql/56/storage/innobase/handler/ha_innodb.cc:3558
#9 0x00000000005d9d6a in ha_commit_low (thd=0x33e9720, all=true) at /home/jtolmer/mysql/56/sql/handler.cc:1474
#10 0x00000000009c5522 in MYSQL_BIN_LOG::process_commit_stage_queue (this=0x1745080, thd=0x34713d0, first=0x34713d0, flush_error=0) at /home/jtolmer/mysql/56/sql/binlog.cc:6002
#11 0x00000000009c5f67 in MYSQL_BIN_LOG::ordered_commit (this=0x1745080, thd=0x34713d0, all=true, skip_commit=false) at /home/jtolmer/mysql/56/sql/binlog.cc:6318
#12 0x00000000009c4f8f in MYSQL_BIN_LOG::commit (this=0x1745080, thd=0x34713d0, all=true) at /home/jtolmer/mysql/56/sql/binlog.cc:5831
#13 0x00000000005d9be4 in ha_commit_trans (thd=0x34713d0, all=true) at /home/jtolmer/mysql/56/sql/handler.cc:1420
#14 0x0000000000836747 in trans_commit (thd=0x34713d0) at /home/jtolmer/mysql/56/sql/transaction.cc:219
#15 0x000000000077878f in mysql_execute_command (thd=0x34713d0) at /home/jtolmer/mysql/56/sql/sql_parse.cc:4171
#16 0x000000000077d2bb in mysql_parse (thd=0x34713d0, rawbuf=0x7f9e900aa460 "commit", length=6, parser_state=0x4829a8f0) at /home/jtolmer/mysql/56/sql/sql_parse.cc:6094
#17 0x0000000000770f7a in dispatch_command (command=COM_QUERY, thd=0x34713d0, packet=0x3474941 "commit", packet_length=6) at /home/jtolmer/mysql/56/sql/sql_parse.cc:1314
#18 0x00000000007701ce in do_command (thd=0x34713d0) at /home/jtolmer/mysql/56/sql/sql_parse.cc:1038
#19 0x0000000000739c62 in do_handle_one_connection (thd_arg=0x34713d0) at /home/jtolmer/mysql/56/sql/sql_connect.cc:969
#20 0x000000000073975f in handle_one_connection (arg=0x34713d0) at /home/jtolmer/mysql/56/sql/sql_connect.cc:885
#21 0x0000000000ad127c in pfs_spawn_thread (arg=0x3429f60) at /home/jtolmer/mysql/56/storage/perfschema/pfs.cc:1853
#22 0x0000003e6ce062f7 in start_thread () from /lib64/libpthread.so.0
#23 0x0000003e6c2d1e3d in clone () from /lib64/libc.so.6
#24 0x0000000000000000 in ?? ()
And poking at the linked list of stack frames shows that they are messed up at the top of the stack:
(gdb) frame 7
#7 0x0000000000d7d8c5 in _db_return_ (_line_=3558, _stack_frame_=0x48297dc0) at /home/jtolmer/mysql/56/dbug/dbug.c:1287
1287 DbugExit(buf);
(gdb) p *cs->framep
$7 = {func = 0xfadad8 "Stage_manager::Mutex_queue::append", file = 0xfad600 "/home/jtolmer/mysql/56/sql/binlog.cc", level = 2147483659, prev = 0x418b3df0}
(gdb) p *cs->framep->prev
$8 = {func = 0x7f9ebc4e6c90 "\034", file = 0x7f9ebc4e6c90 "\034", level = 2147483658, prev = 0x418b3dc0}
(gdb) p *cs->framep->prev->prev
$9 = {func = 0x7 <Address 0x7 out of bounds>, file = 0x17a5bc0 "\002", level = 3177456064, prev = 0x7f9ebc4e6b40}
But are fine lower in the stack:
(gdb) frame 11
#11 0x00000000009c5f67 in MYSQL_BIN_LOG::ordered_commit (this=0x1745080, thd=0x34713d0, all=true, skip_commit=false) at /home/jtolmer/mysql/56/sql/binlog.cc:6318
6318 process_commit_stage_queue(thd, commit_queue, flush_error);
(gdb) p _db_stack_frame_
$25 = {func = 0xfb016a "MYSQL_BIN_LOG::commit", file = 0xfad600 "/home/jtolmer/mysql/56/sql/binlog.cc", level = 2147483656, prev = 0x48298dc0}
(gdb) p *_db_stack_frame_->prev
$26 = {func = 0xe28629 "ha_commit_trans", file = 0xe28150 "/home/jtolmer/mysql/56/sql/handler.cc", level = 2147483655, prev = 0x48299000}
(gdb) p *_db_stack_frame_->prev->prev
$27 = {func = 0xf04ca7 "trans_commit", file = 0xf04b78 "/home/jtolmer/mysql/56/sql/transaction.cc", level = 2147483654, prev = 0x48299280}
(gdb) p *_db_stack_frame_->prev->prev->prev
$28 = {func = 0xedb62a "mysql_execute_command", file = 0xedaf90 "/home/jtolmer/mysql/56/sql/sql_parse.cc", level = 2147483653, prev = 0x48299ed0}
(gdb) p *_db_stack_frame_->prev->prev->prev->prev
$29 = {func = 0xedc2ef "mysql_parse", file = 0xedaf90 "/home/jtolmer/mysql/56/sql/sql_parse.cc", level = 2147483652, prev = 0x4829a1b0}
(gdb) p *_db_stack_frame_->prev->prev->prev->prev->prev
$30 = {func = 0xedb1c9 "dispatch_command", file = 0xedaf90 "/home/jtolmer/mysql/56/sql/sql_parse.cc", level = 2147483651, prev = 0x4829ae50}
(gdb) p *_db_stack_frame_->prev->prev->prev->prev->prev->prev
$31 = {func = 0xedb0f5 "do_command", file = 0xedaf90 "/home/jtolmer/mysql/56/sql/sql_parse.cc", level = 2147483650, prev = 0x4829aff0}
How to repeat:
apply patch which only contains only changes for innodb_bigstress_crash_nocompress.test
/path/to/cmake -DCMAKE_BUILD_TYPE=Debug .
make
cd mysql-test/
./mtr --big --repeat=16 --parallel=8 --force innodb_stress.innodb_bigstress_crash_nocompress innodb_stress.innodb_bigstress_crash_nocompress innodb_stress.innodb_bigstress_crash_nocompress innodb_stress.innodb_bigstress_crash_nocompress innodb_stress.innodb_bigstress_crash_nocompress innodb_stress.innodb_bigstress_crash_nocompress innodb_stress.innodb_bigstress_crash_nocompress innodb_stress.innodb_bigstress_crash_nocompress
This usually, but not always, hits the failure for me.