MySQL Bugs: #72143: Crash with master-info-repository=table on out of disk space situation

Bug #72143	Crash with master-info-repository=table on out of disk space situation
Submitted:	27 Mar 2014 13:19	Modified:	25 Apr 2018 10:44
Reporter:	Daniël van Eeden (OCA)	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S1 (Critical)
Version:	5.6.16	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	crash, master_info_repository

Description:
12:47:19 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=67108864
read_buffer_size=131072
max_used_connections=13
max_threads=800
thread_count=7
connection_count=5
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1817286 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7ef84c13b890
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ef8855bbd80 thread_stack 0x80000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x908a15]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x694934]
/lib64/libpthread.so.0[0x367020f710]
/lib64/libc.so.6[0x366fa79a33]
/lib64/libc.so.6(__libc_malloc+0x71)[0x366fa7a991]
/usr/sbin/mysqld[0x9b4c58]
/usr/sbin/mysqld[0xa42e3e]
/usr/sbin/mysqld[0xa444a1]
/usr/sbin/mysqld[0x96fd1f]
/usr/sbin/mysqld[0x979938]
/usr/sbin/mysqld(_Z11open_tablesP3THDPP10TABLE_LISTPjjP19Prelocking_strategy+0x659)[0x6ce569]
/usr/sbin/mysqld(_Z20open_and_lock_tablesP3THDP10TABLE_LISTbjP19Prelocking_strategy+0x44)[0x6ced84]
/usr/sbin/mysqld(_Z24open_n_lock_single_tableP3THDP10TABLE_LIST13thr_lock_typejP19Prelocking_strategy+0x2d)[0x6cef8d]
/usr/sbin/mysqld(_ZN21Rpl_info_table_access10open_tableEP3THD19st_mysql_lex_stringS2_j13thr_lock_typePP5TABLEP18Open_tables_backup+0x1b3)[0x8ef583]
/usr/sbin/mysqld(_ZN14Rpl_info_table13do_flush_infoEb+0xf0)[0x8ee830]
/usr/sbin/mysqld(_ZN11Master_info10flush_infoEb+0x91)[0x8e1e21]
/usr/sbin/mysqld(_Z17flush_master_infoP11Master_infob+0xef)[0x8d76bf]
/usr/sbin/mysqld(handle_slave_io+0xe34)[0x8e09b4]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xb2c4ea]
/lib64/libpthread.so.0[0x36702079d1]
/lib64/libc.so.6(clone+0x6d)[0x366fae8b6d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): Connection ID (thread ID): 1328473
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
140327 13:47:20 mysqld_safe Number of processes running now: 0

How to repeat:
5.6.16-enterprise-commercial-advanced-log on RedHat 6.3 (kernel 2.6.32-279.el6.x86_64)

Config:

# Binlogs
log_bin=/path/to/log-bin
log_bin_index=/path/to/log-bin.index
server_id=100
master_verify_checksum=ON
sync_binlog=1
log_bin_trust_function_creators=1
expire_logs_days=2
binlog_format=ROW
binlog_cache_size=65536
binlog_row_image=MINIMAL
binlog-ignore-db="my_trash_db"

# Slaves
read_only=1
log_slave_updates=1
master_info_repository=TABLE
relay_log_info_repository=TABLE
relay_log_recovery=ON
replicate-do-db="foo"
replicate-do-db="bar"
replicate-wild-ignore-table="%.foobar\_%"
report-host=myhostname
slave_net_timeout=60

Two mysql filesystems: 1 for binlogs and 1 for all other mysql files (including relaylogs and tmpdir!).

What probably happended:
- Relay logs filled the filesystem. The slave stopped.
- Some space was freed. (e.g. truncate error log)
- Start slave...and crash.

This machine receives SBR logs from its 5.5 master and writes RBR logs for it's slaves.

demangled:

/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x908a15]
/usr/sbin/mysqld(handle_fatal_signal+0x4a4)[0x694934]
/lib64/libpthread.so.0[0x367020f710]
/lib64/libc.so.6[0x366fa79a33]
/lib64/libc.so.6(__libc_malloc+0x71)[0x366fa7a991]
/usr/sbin/mysqld[0x9b4c58]
/usr/sbin/mysqld[0xa42e3e]
/usr/sbin/mysqld[0xa444a1]
/usr/sbin/mysqld[0x96fd1f]
/usr/sbin/mysqld[0x979938]
/usr/sbin/mysqld(open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*)+0x659)[0x6ce569]
/usr/sbin/mysqld(open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*)+0x44)[0x6ced84]
/usr/sbin/mysqld(open_n_lock_single_table(THD*, TABLE_LIST*, thr_lock_type, unsigned int, Prelocking_strategy*)+0x2d)[0x6cef8d]
/usr/sbin/mysqld(Rpl_info_table_access::open_table(THD*, st_mysql_lex_string, st_mysql_lex_string, unsigned int, thr_lock_type, TABLE**, Open_tables_backup*)+0x1b3)[0x8ef583]
/usr/sbin/mysqld(Rpl_info_table::do_flush_info(bool)+0xf0)[0x8ee830]
/usr/sbin/mysqld(Master_info::flush_info(bool)+0x91)[0x8e1e21]
/usr/sbin/mysqld(flush_master_info(Master_info*, bool)+0xef)[0x8d76bf]
/usr/sbin/mysqld(handle_slave_io+0xe34)[0x8e09b4]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xb2c4ea]
/lib64/libpthread.so.0[0x36702079d1]
/lib64/libc.so.6(clone+0x6d)[0x366fae8b6d]

I tried to get some more info about the stacktrace.

$ nm -n /usr/sbin/mysqld > mysqld1.sym
$ nm -D -n /usr/sbin/mysqld > mysqld2.sym
$ cat mysqld.stack
0x9b4c58
0xa42e3e
0xa444a1
0x96fd1f
0x979938
0x6ce569
0x6ced84
0x6cef8d
0x8ef583
0x8ee830
0x8e1e21
0x8d76bf
0x8e09b4
0xb2c4ea
$ resolve_stack_dump -s /tmp/mysqld1.sym mysqld.stack | c++filt
0x9b4c58 mem_heap_create_block_func(mem_block_info_t*, unsigned long, unsigned long) + 120
0xa42e3e trx_create() + 30
0xa444a1 trx_allocate_for_mysql() + 33
0x96fd1f innobase_trx_allocate(THD*) + 15
0x979938 ha_innobase::extra(ha_extra_function) + 648
0x6ce569 open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) + 1625
0x6ced84 open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) + 68
0x6cef8d open_n_lock_single_table(THD*, TABLE_LIST*, thr_lock_type, unsigned int, Prelocking_strategy*) + 45
0x8ef583 Rpl_info_table_access::open_table(THD*, st_mysql_lex_string, st_mysql_lex_string, unsigned int, thr_lock_type, TABLE**, Open_tables_backup*) + 435
0x8ee830 Rpl_info_table::do_flush_info(bool) + 240
0x8e1e21 Master_info::flush_info(bool) + 145
0x8d76bf flush_master_info(Master_info*, bool) + 239
0x8e09b4 handle_slave_io + 3636
0xb2c4ea pfs_spawn_thread + 298
$ resolve_stack_dump -s /tmp/mysqld2.sym mysqld.stack | c++filt
0x9b4c58 std::vector<unsigned char*, std::allocator<unsigned char*> >::_M_insert_aux(__gnu_cxx::__normal_iterator<unsigned char**, std::vector<unsigned char*, std::allocator<unsigned char*> > >, unsigned char* const&) + 1656
0xa42e3e void ut_list_remove<ut_list_base<trx_undo_t>, trx_undo_t>(ut_list_base<trx_undo_t>&, trx_undo_t&, unsigned long) + 12750
0xa444a1 void ut_list_remove<ut_list_base<trx_undo_t>, trx_undo_t>(ut_list_base<trx_undo_t>&, trx_undo_t&, unsigned long) + 18481
0x96fd1f ha_innobase::get_foreign_dup_key(char*, unsigned int, char*, unsigned int) + 14559
0x979938 ha_innobase::get_foreign_dup_key(char*, unsigned int, char*, unsigned int) + 54520
0x6ce569 open_tables(THD*, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) + 1625
0x6ced84 open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) + 68
0x6cef8d open_n_lock_single_table(THD*, TABLE_LIST*, thr_lock_type, unsigned int, Prelocking_strategy*) + 45
0x8ef583 Rpl_info_table_access::open_table(THD*, st_mysql_lex_string, st_mysql_lex_string, unsigned int, thr_lock_type, TABLE**, Open_tables_backup*) + 435
0x8ee830 Rpl_info_table::do_flush_info(bool) + 240
0x8e1e21 Master_info::flush_info(bool) + 145
0x8d76bf flush_master_info(Master_info*, bool) + 239
0x8e09b4 handle_slave_io + 3636
0xb2c4ea pfs_spawn_thread + 298

Any reason this bug is not verified?

Hello Daniël,

Thank you for the report and configuration file.
My apologies for missing out this report for so long. In the last 15+ days, I have attempted this 5+ times and each time I'm ending up with exact scenario i.e disk full condition on slave(as you mentioned bin log is hosted on diff partition and rest things on diff including relay logs) but still not observing the slave crash. If any chance you observe similar issue once again then please let us know and I'll give it a try again at my end. 
In the meantime I'll check internally if there was any related bug fixed in this area and would let you know.  I'm joining the worklog shortly for your reference.

Thanks,
Umesh

test results

Attachment: 72143.results (application/octet-stream, text), 20.09 KiB.