MySQL Bugs: #81033: Crash on reset master

Bug #81033	Crash on reset master
Submitted:	11 Apr 2016 15:43	Modified:	22 Nov 2019 19:03
Reporter:	Daniël van Eeden (OCA)	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.6.17, 5.6.27, 5.6.30, 5.7.27	OS:	Any
Assigned to:	Venkatesh Duggirala	CPU Architecture:	Any
Tags:	crash

Description:
This happened on 5.6.17 and then on 5.6.27 after I upgraded MySQL.

How to repeat:
mysql> reset master;
2016-04-11 17:19:25 3135 [ERROR] Failed to locate old binlog or relay log files
ERROR 1373 (HY000): Target log not found in binlog index
root@mc202hostname-01 [(none)]> 2016-04-11 17:19:25 3135 [ERROR] An error occured during flushing cache to file. 'binlog_error_action' is set to 'IGNORE_ERROR'. Hence turning logging off for the whole duration of the MySQL server process. To turn it on again: fix the cause, shutdown the MySQL server and restart it.
15:19:25 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

key_buffer_size=805306368
read_buffer_size=131072
max_used_connections=6
max_threads=3000
thread_count=5
connection_count=3
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4280822 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8ddb85]
/usr/sbin/mysqld(handle_fatal_signal+0x494)[0x665b44]
/lib64/libpthread.so.0(+0xf710)[0x7f4dc7c61710]
/lib64/libc.so.6(+0x79252)[0x7f4dc674a252]
/lib64/libc.so.6(__libc_malloc+0x71)[0x7f4dc674b6b1]
/usr/lib64/libstdc++.so.6(_Znwm+0x1d)[0x7f4dc6fbc09d]
/usr/sbin/mysqld(_Z26handle_connections_socketsv+0x39a)[0x599e1a]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0xd50)[0x5a19b0]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f4dc66efd5d]
/usr/sbin/mysqld[0x592f65]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
160411 17:19:25 mysqld_safe Number of processes running now: 0

From c++filt:
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8ddb85]
/usr/sbin/mysqld(handle_fatal_signal+0x494)[0x665b44]
/lib64/libpthread.so.0(+0xf710)[0x7f4dc7c61710]
/lib64/libc.so.6(+0x79252)[0x7f4dc674a252]
/lib64/libc.so.6(__libc_malloc+0x71)[0x7f4dc674b6b1]
/usr/lib64/libstdc++.so.6(operator new(unsigned long)+0x1d)[0x7f4dc6fbc09d]
/usr/sbin/mysqld(handle_connections_sockets()+0x39a)[0x599e1a]
/usr/sbin/mysqld(mysqld_main(int, char**)+0xd50)[0x5a19b0]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f4dc66efd5d]
/usr/sbin/mysqld[0x592f65]

Seems to be caused by an extra newline to the binlog.index file.

After removing the extra newline from the beginning of the binlog index things look much better.
show master logs; now show all binlogs.
Binlogs older than expire_logs_days are purged. Before the fix the binlog index contained binlogs from the beginning of the server, and those were also on disks.

Hello Daniël,

Thank you for the report.
Observed this with 5.6.30 build(manually inserted newline into the bin-index file to trigger the issue).

Thanks,
Umesh

test results

Attachment: 81033.results (application/octet-stream, text), 29.54 KiB.

Typo fixed.

Post from the developer:
========================

We are unable to reproduce the reported issue on the latest 5.6.31 release.

Regards,
Venkatesh.

run a generic dml workload with 'reset master' and 'show binlog events' in a few threads shows this problem easily.

--
Version: '5.7.27-log'  socket: '/tmp/mysql.sock'  port: 3306  (Built on 21 November 2019 with g++ (GCC) 10.0.0 20191106 (experimental))
2019-11-21T20:57:07.184590Z 17 [Note] Aborted connection 17 to db: 'unconnected' user: 'root' host: 'localhost' (Got an error reading communication packets)
2019-11-21T20:57:12.701161Z 6 [ERROR] Failed to locate old binlog or relay log files
2019-11-21T20:57:12.701396Z 10 [ERROR] An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'IGNORE_ERROR'. Hence turning logging off for the whole duration of the MySQL server process. To turn it on again: fix the cause, shutdown the MySQL server and restart it.
2019-11-21T20:57:12.701428Z 20 [ERROR] Failed to locate old binlog or relay log files
2019-11-21T20:57:12.701523Z 31 [ERROR] Failed to locate old binlog or relay log files
2019-11-21T20:57:12.701960Z 30 [ERROR] An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'IGNORE_ERROR'. Hence turning logging off for the whole duration of the MySQL server process. To turn it on again: fix the cause, shutdown the MySQL server and restart it.
2019-11-21T20:58:43.733914Z 21 [Note] Aborted connection 21 to db: 'unconnected' user: 'root' host: 'localhost' (Got an error reading communication packets)
2019-11-21T20:59:27.216410Z 2795 [Note] Sort aborted, host: localhost, user: root, thread: 2795, error: Query execution was interrupted, query: delete  from  `t0078`  where  `c0008` =from_unixtime(1536999152)  order by @g   asc
--

If it's been verified by Shane, can the bug be set to "Verified"?

I believe I've hit this particular bug from multiple `reset masters` running in quick succession on a busy host (where the app running the command timed out on its side and retried, but its original thread was still live and waiting to run on the server).

I also think I've seen something similar with the relay log where `stop slave relay_thread` and `start slave relay_thread` repeated in quick succession resulted in 

[ERROR] Error looking for file after ./mysqld-relay-bin.222356

(But not a crash; and that would be a different bug.)

I cant seem to repeat this on 5.7.29 or 8.0.21 (build from trunk),  but i also cant repeat it in 5.7.26 using my own vague instructions.  however these days i'm on a different machine to 21 November 2019.