Bug #80581 rpl_semi_sync_[non_]group_commit_deadlock crash on ASan, debug
Submitted: 2 Mar 2016 8:03 Modified: 4 Mar 2016 9:47
Reporter: Laurynas Biveinis (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S1 (Critical)
Version:5.7.11 OS:Any
Assigned to: CPU Architecture:Any
Tags: asan, mtr, semisync, valgrind

[2 Mar 2016 8:03] Laurynas Biveinis
Description:
ASan:

rpl.rpl_semi_sync_group_commit_deadlock 'stmt' w3 [ fail ]
...
CURRENT_TEST: rpl.rpl_semi_sync_group_commit_deadlock
mysqltest: In included file ./extra/rpl_tests/rpl_stress_test.inc at line 68:
included from ./extra/rpl_tests/rpl_semi_sync_group_commit_deadlock.inc at line 22:
included from /Users/laurynas/percona/mysql-5.7.11/mysql-test/suite/rpl/t/rpl_semi_sync_group_commit_deadlock.test at line 25:
At line 68: query 'reap' failed: 2013: Lost connection to MySQL server during query
...
ASAN:SIGSEGV
ASAN:SIGSEGV
=================================================================
==4336==AddressSanitizer: while reporting a bug found another one. Ignoring.
==4336==ERROR: AddressSanitizer: SEGV on unknown address 0x000100000018 (pc 0x000105fd5100 bp 0x700000d5cbb0 sp 0x700000d5cb50 T31)
    #0 0x105fd50ff in pfs_start_mutex_wait_v1 pfs.cc:2640
    #1 0x1054d69e1 in inline_mysql_mutex_lock(st_mysql_mutex*, char const*, unsigned int) mysql_thread.h:707
    #2 0x1054d73f9 in plugin_unlock(THD*, st_plugin_int**) sql_plugin.cc:1143
    #3 0x1052c7670 in Binlog_transmit_delegate::reserve_header(THD*, unsigned short, String*) rpl_handler.cc:709
    #4 0x1057ecfde in Binlog_sender::reset_transmit_packet(unsigned short, unsigned long) rpl_binlog_sender.cc:906
    #5 0x1057ebef3 in Binlog_sender::read_event(st_io_cache*, binary_log::enum_binlog_checksum_alg, unsigned char**, unsigned int*) rpl_binlog_sender.cc:1056
    #6 0x1057ea0e7 in Binlog_sender::send_events(st_io_cache*, unsigned long long) rpl_binlog_sender.cc:390
    #7 0x1057e90b2 in Binlog_sender::send_binlog(st_io_cache*, unsigned long long) rpl_binlog_sender.cc:317
    #8 0x1057e86a3 in Binlog_sender::run() rpl_binlog_sender.cc:225
    #9 0x1057e4156 in mysql_binlog_send(THD*, char*, unsigned long long, Gtid_set*, unsigned int) rpl_master.cc:415
    #10 0x1057e3d42 in com_binlog_dump(THD*, char*, unsigned long) rpl_master.cc:342
    #11 0x10547c115 in dispatch_command(THD*, COM_DATA const*, enum_server_command) sql_parse.cc:1647
    #12 0x10547e695 in do_command(THD*) sql_parse.cc:996
    #13 0x1056b8cdb in handle_connection connection_handler_per_thread.cc:301
    #14 0x105fd3490 in pfs_spawn_thread pfs.cc:2192
    #15 0x7fff9ec80c12 in _pthread_body (/usr/lib/system/libsystem_pthread.dylib+0x3c12)
    #16 0x7fff9ec80b8f in _pthread_start (/usr/lib/system/libsystem_pthread.dylib+0x3b8f)
    #17 0x7fff9ec7e374 in thread_start (/usr/lib/system/libsystem_pthread.dylib+0x1374)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV pfs.cc:2640 pfs_start_mutex_wait_v1
Thread T31 created by T0 here:
    #0 0x10907fc89 in wrap_pthread_create (/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/7.0.2/lib/darwin/libclang_rt.asan_osx_dynamic.dylib+0x37c89)
    #1 0x105fd37f8 in pfs_spawn_thread_v1 pfs.cc:2243
    #2 0x1056b9804 in Per_thread_connection_handler::add_connection(Channel_info*) connection_handler_per_thread.cc:404
    #3 0x104a03d66 in Connection_handler_manager::process_new_connection(Channel_info*) connection_handler_manager.cc:268
    #4 0x1056f963c in Connection_acceptor<Mysqld_socket_listener>::connection_event_loop() connection_acceptor.h:68
    #5 0x1056eadf7 in mysqld_main(int, char**) mysqld.cc:4941
    #6 0x7fff8f43f5ac in start (/usr/lib/system/libdyld.dylib+0x35ac)
    #7 0x9  (<unknown module>)

Debug build (that also has -DWITH_VALGRIND=ON but not running under Valgrind):
$ ./mtr --debug-server --big-test --parallel=3 --force --max-test-fail=0 rpl_semi_sync_group_commit_deadlock
...
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
worker[2] Using MTR_BUILD_THREAD 301, with reserved ports 13010..13019
worker[3] Using MTR_BUILD_THREAD 302, with reserved ports 13020..13029
worker[2] Trying to dump core for [mysqltest - pid: 4395, winpid: 4395]
worker[2] Trying to dump core for [mysqld.1 - pid: 4228, winpid: 4228]
worker[3] Trying to dump core for [mysqltest - pid: 4398, winpid: 4398]
worker[1] Trying to dump core for [mysqltest - pid: 4402, winpid: 4402]
worker[3] Trying to dump core for [mysqld.1 - pid: 4229, winpid: 4229]
worker[1] Trying to dump core for [mysqld.1 - pid: 4227, winpid: 4227]
worker[3] Trying to dump core for [mysqld.2 - pid: 4237, winpid: 4237]
worker[1] Trying to dump core for [mysqld.2 - pid: 4234, winpid: 4234]
worker[2] Trying to dump core for [mysqld.2 - pid: 4233, winpid: 4233]
rpl.rpl_semi_sync_group_commit_deadlock 'row' w3 [ fail ]  timeout after 900 seconds
...

How to repeat:
ASan under Mac OS X 10.11.3, Apple LLVM version 7.0.2 (clang-700.1.81):

cmake ../mysql-5.7.11/ -DBUILD_CONFIG=mysql_release -DWITH_DEBUG=ON -DWITH_ASAN=ON -DWITH_BOOST=~/percona/mysql-boost

./mtr --debug-server --big-test --parallel=3 rpl_semi_sync_group_commit_deadlock

Debug build under Ubuntu Wily x86_64:

cmake ../mysql-server/ -DBUILD_CONFIG=mysql_release -DWITH_DEBUG=ON -DWITH_VALGRIND=ON -DWITH_BOOST=~/mysql-boost

./mtr --debug-server --big-test --parallel=3 rpl_semi_sync_group_commit_deadlock
[2 Mar 2016 8:30] Laurynas Biveinis
The non-ASan run could be a timeout, not a hang, but in previous passing attempts I see that this testcase completes rather quickly:

rpl.rpl_semi_sync_group_commit_deadlock 'mix' w4 [ pass ]  14412
rpl.rpl_semi_sync_group_commit_deadlock 'row' w4 [ pass ]  14045
rpl.rpl_semi_sync_group_commit_deadlock 'stmt' w2 [ pass ]  14144
[2 Mar 2016 8:31] Laurynas Biveinis
Debug build hangs on Mac (without -DBUILD_VALGRIND=ON CMake option) too.
[2 Mar 2016 8:32] Laurynas Biveinis
Same stacktrace with ASan on rpl_semi_sync_non_group_commit_deadlock
[2 Mar 2016 8:44] Laurynas Biveinis
Got one crashing stacktrace with debug:

/mtr --debug-server --big-test --parallel=3 rpl_semi_sync_non_group_commit_deadlock
...
rpl.rpl_semi_sync_non_group_commit_deadlock 'mix' w3 [ fail ]
        Test ended at 2016-03-02 10:39:49

CURRENT_TEST: rpl.rpl_semi_sync_non_group_commit_deadlock
mysqltest: In included file ./include/stop_dump_threads.inc at line 27:
included from ./include/uninstall_semisync.inc at line 18:
included from /Users/laurynas/percona/mysql-5.7.11/mysql-test/suite/rpl/t/rpl_semi_sync_non_group_commit_deadlock.test at line 32:
At line 26: query 'let $_sdt_dump_thread_id= `SELECT ID FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND = 'Binlog dump' OR COMMAND = 'Binlog DUMP GTID'`' failed: 2013: Lost connection to MySQL server during query
...
08:39:37 UTC - mysqld got signal 11 ;
...
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
0   mysqld-debug                        0x000000010c6f4c68 my_print_stacktrace + 72
1   mysqld-debug                        0x000000010c59ffc2 handle_fatal_signal + 930
2   libsystem_platform.dylib            0x00007fff9b28deaa _sigtramp + 26
3   libsystem_pthread.dylib             0x00007fff9ec7ee7f pthread_rwlock_unlock + 367
4   mysqld-debug                        0x000000010cd3cca5 _Z16lf_hash_put_pinsP10st_lf_pins + 21
5   mysqld-debug                        0x000000010cd3cabc _Z14destroy_threadP10PFS_thread + 428
6   mysqld-debug                        0x000000010cd1831d pfs_delete_current_thread_v1 + 93
7   mysqld-debug                        0x000000010c55d40b handle_connection + 635
8   mysqld-debug                        0x000000010cd172db pfs_spawn_thread + 411
9   libsystem_pthread.dylib             0x00007fff9ec80c13 _pthread_body + 131
10  libsystem_pthread.dylib             0x00007fff9ec80b90 _pthread_body + 0
11  libsystem_pthread.dylib             0x00007fff9ec7e375 thread_start + 13
[4 Mar 2016 9:47] MySQL Verification Team
Hello Laurynas,

Thank you for the report.
Verified as described with 5.7.11 on Mac OSX(LLVM Version 7.0.2(clang-700.1.81)).

Thanks,
Umesh