Bug #77738 flush_cache_to_file calls _exit when ABORT_SERVER
Submitted: 15 Jul 2015 19:36 Modified: 13 Oct 2015 12:29
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.7.9 OS:Any
Assigned to: CPU Architecture:Any

[15 Jul 2015 19:36] Shane Bester
Description:
When binlog_error_action == ABORT_SERVER then server writes to error log:

[ERROR] An error occured during flushing cache to file. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server

and calls _exit(MYSQLD_FAILURE_EXIT);

Calling _exit means no core-dump is possible. Therefore nobody knows what was going on.   
In my case, this was the stack of the thread which I caught in debugger:

00 ntdll!NtWaitForSingleObject+0xa
01 ntdll!RtlpWaitOnCriticalSection+0xe8
02 ntdll!RtlEnterCriticalSection+0xd1
03 ntdll!RtlLockHeap+0x3b
04 ntdll!RtlExitUserProcess+0x3f
05 mysqld!__crtExitProcess+0x11 [.\startup\crt0dat.c @ 774]
06 mysqld!doexit+0x17c [f:\dd\vctools\crt\crtw32\startup\crt0dat.c @ 678]
07 mysqld!MYSQL_BIN_LOG::flush_cache_to_file+0xe9 [.\sql\binlog.cc @ 8329]
08 mysqld!MYSQL_BIN_LOG::ordered_commit+0x17c [.\sql\binlog.cc @ 8617]
09 mysqld!MYSQL_BIN_LOG::write_binlog_and_commit_engine+0x4b5 [\binlog.cc @ 7981]
0a mysqld!MYSQL_BIN_LOG::commit+0x12 [.\sql\binlog.cc @ 7767]
0b mysqld!close_temporary_tables+0x62d [.\sql\sql_base.cc @ 1946]
0c mysqld!THD::cleanup+0x126 [.\sql\sql_class.cc @ 1811]
0d mysqld!THD::release_resources+0x299 [.\sql\sql_class.cc @ 1877]
0e mysqld!handle_connection+0x15b [.\connection_handler_per_thread.cc @ 312]
0f mysqld!pfs_spawn_thread+0x123 [.\storage\perfschema\pfs.cc @ 2181]
10 mysqld!win_thread_start+0x1b [.\mysys\my_thread.c @ 38]
11 mysqld!_callthreadstartex+0x17 [.\startup\threadex.c @ 376]
12 mysqld!_threadstartex+0x102 [.\startup\threadex.c @ 354]
13 kernel32!BaseThreadInitThunk+0xd
14 ntdll!RtlUserThreadStart+0x1d

Also I suspect _exit() is NOT threadsafe and should be avoided.

How to repeat:
This is not about how to repeat my issue.
It is about how to find out what queries were running at the time of an exit?

Suggested fix:
Is it possible to call abort() so that corefile can be made?
Or some useful information be printed in error log?

Minor issue,  the current error message has a typo, s/occured/occurred/
[15 Jul 2015 19:36] MySQL Verification Team
feel free to discuss or make FR public, or justify why _exit() is better than abort(), etc.
[22 Jul 2015 13:21] MySQL Verification Team
Hello Shane,

Thank you for the feature request.

// With debug build (used test case from binlog_error_action)

CREATE TABLE t1(i INT);
SET SESSION debug="+d,simulate_error_during_flush_cache_to_file";
SET GLOBAL binlog_error_action= ABORT_SERVER;
INSERT INTO t1 VALUES (1);

mysql> use test
Database changed
mysql> SET SESSION debug="+d,simulate_error_during_flush_cache_to_file";
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE t1(i INT);
ERROR 1050 (42S01): Unknown error 1050
mysql> CREATE TABLE t1(i INT);
ERROR 1050 (42S01): Unknown error 1050
mysql> \q
Bye
[4]-  Exit 1                  bin/mysqld-debug --no-defaults --basedir=/export/umesh/server/binaries/mysql-advanced-5.7.9 --datadir=/ramdisk/mysql-advanced-5.7.9/77738 --core-file --binlog_error_action=ABORT_SERVER --log-bin=mysql-bin --server-id=1 --socket=/tmp/mysql_ushastry.sock --port=15000 --log-error=/export/umesh/server/binaries/mysql-advanced-5.7.9/77738.err 2>&1
[umshastr@hod03]/export/umesh/server/binaries/mysql-advanced-5.7.9:

// Extract from error log

Version: '5.7.9-enterprise-commercial-advanced-debug-log'  socket: '/tmp/mysql_ushastry.sock'  port: 15000  MySQL Enterprise Server - Advanced Edition Debug (Commercial)
2015-07-22T13:14:30.634584Z 2 [ERROR] An error occured during flushing cache to file. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server

Also, may be typo in "occured"?

Thanks,
Umesh
[26 Aug 2015 20:14] Mark Callaghan
This is a bug, not a feature request. I just hit this while running linkbench. I thought I killed the linkbench client, maybe I didn't, then I did "drop database linkbench; create database linkbench". 

So the first bug is calling _exit without trying to give me a core file so I can debug the problem.

The second bug is that this error shouldn't occur (mysqld unplanned restart from drop database). But I can't share much more info because there is no call stack in the db error log. All I have is:

2015-08-26T20:05:06.413319Z 1570 [Note] Aborted connection 1570 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets)
2015-08-26T20:05:06.413395Z 1562 [Note] Aborted connection 1562 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets)
2015-08-26T20:05:06.413893Z 1582 [Note] Aborted connection 1582 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets)
2015-08-26T20:05:06.413424Z 1572 [Note] Aborted connection 1572 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets)
2015-08-26T20:05:06.413452Z 1569 [Note] Aborted connection 1569 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets)
2015-08-26T20:05:06.414092Z 1579 [Note] Aborted connection 1579 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets)
2015-08-26T20:05:06.414095Z 1592 [Note] Aborted connection 1592 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets)
2015-08-26T20:05:06.413478Z 1561 [Note] Aborted connection 1561 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets)
2015-08-26T20:06:15.521718Z 2097 [ERROR] An error occured during flushing cache to file. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server
150826 13:06:21 mysqld_safe Number of processes running now: 0
150826 13:06:21 mysqld_safe mysqld restarted
[26 Aug 2015 20:58] Mark Callaghan
Problem repeated. I had two test servers, one using 5.7.8 with original style InnoDB compression, the other using hole-punch compression. Both get unplanned restarts from DROP TABLE with this error message.

Then I tried a shorter test case, and that failed.

Why can't replication code, my_b_flush_io_cache and its callers, log something to indicate what failed? Many of us have wasted days/weeks/months debugging replication failures in the past for this reason.
[27 Aug 2015 8:17] Sven Sandberg
Posted by developer:
 
This is likely a bug in 5.6 as well.
However, impact should be bigger in 5.7 since binlog_error_action defaults to IGNORE_ERROR in 5.6 and ABORT_SERVER in 5.7.
[25 Sep 2015 7:24] MySQL Verification Team
maybe a note that would help making a repeatable testcase.  seems SET PASSWORD was a culprit in this test:

-------- mysqld!__crtExitProcess+0x11 [f:\dd\vctools\crt\crtw32\startup\crt0dat.c @ 774]
mysqld!doexit+0x17c [f:\dd\vctools\crt\crtw32\startup\crt0dat.c @ 678]
mysqld!MYSQL_BIN_LOG::flush_cache_to_file+0xec [.trunk\sql\binlog.cc @ 8460]
mysqld!MYSQL_BIN_LOG::ordered_commit+0x17c [.trunk\sql\binlog.cc @ 8748]
mysqld!MYSQL_BIN_LOG::commit+0x4d4 [.trunk\sql\binlog.cc @ 8113]
mysqld!ha_commit_trans+0x212 [.trunk\sql\handler.cc @ 1649]
mysqld!trans_commit_stmt+0x3a [.trunk\sql\transaction.cc @ 461]
mysqld!acl_end_trans_and_close_tables+0x30 [.trunk\sql\auth\sql_user_table.cc @ 285]
mysqld!change_password+0x5ae [.trunk\sql\auth\sql_user.cc @ 785]
mysqld!set_var_password::update+0x1e [.trunk\sql\set_var.cc @ 913]
mysqld!sql_set_variables+0x7c [.trunk\sql\set_var.cc @ 662]
mysqld!mysql_execute_command+0x1908 [.trunk\sql\sql_parse.cc @ 3454]
mysqld!mysql_parse+0x258 [.trunk\sql\sql_parse.cc @ 5311]
mysqld!dispatch_command+0x625 [.trunk\sql\sql_parse.cc @ 1254]
mysqld!do_command+0x1ea [.trunk\sql\sql_parse.cc @ 821]
mysqld!handle_connection+0x128 [.trunk\sql\conn_handler\connection_handler_per_thread.cc @ 295]
mysqld!pfs_spawn_thread+0x123 [.trunk\storage\perfschema\pfs.cc @ 2213]
mysqld!win_thread_start+0x1b [.trunk\mysys\my_thread.c @ 42]
mysqld!_callthreadstartex+0x17 [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376]
mysqld!_threadstartex+0x102 [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354]
[25 Sep 2015 17:30] Mark Callaghan
We are approaching 3 months since this was first reported. Independent of whether we provide a test case the code has to be fixed. Replication is broken if it calls ABORT without leaving behind any hints as to what went wrong.
[29 Sep 2015 9:34] MySQL Verification Team
Also seeing this code path exit on 5.8.0:

ntdll.dll!RtlExitUserProcess()
mysqld.exe!doexit(int code, int quick, int retcaller)
mysqld.exe!MYSQL_BIN_LOG::new_file_impl(bool need_lock_log, Format_description_log_event * extra_description_event)
mysqld.exe!MYSQL_BIN_LOG::new_file_without_locking(Format_description_log_event * extra_description_event)
mysqld.exe!MYSQL_BIN_LOG::rotate(bool force_rotate, bool * check_purge)
mysqld.exe!MYSQL_BIN_LOG::rotate_and_purge(THD * thd, bool force_rotate)
mysqld.exe!reload_acl_and_cache(THD * thd, unsigned long options, TABLE_LIST * tables, int * write_to_binlog)
mysqld.exe!mysql_execute_command(THD * thd, bool first_level)
mysqld.exe!mysql_parse(THD * thd, Parser_state * parser_state)
mysqld.exe!dispatch_command(THD * thd, const COM_DATA * com_data, enum_server_command command)
mysqld.exe!do_command(THD * thd)
mysqld.exe!handle_connection(void * arg)
mysqld.exe!pfs_spawn_thread(void * arg)
mysqld.exe!win_thread_start(void * p)
[29 Sep 2015 9:49] MySQL Verification Team
A crude testcase for 5.8: Start server with --server-id=1 --log-bin --binlog-error-action=ABORT_SERVER --binlog-format=mixed

yes "set @a:=(select id from information_schema.processlist where id<>connection_id() order by rand() limit 1); kill query @a;" | mysql -uroot --force > /dev/null &
yes "reset master;" | mysql -uroot --force > /dev/null &
yes "flush logs;" | mysql -uroot --force > /dev/null &
[13 Oct 2015 12:29] David Moss
Thanks for your feedback. This is fixed in upcoming versions and the following was added to the 5.6.28 and 5.7.9 release notes:
As binlog_error_action=ABORT_SERVER is the default in MySQL 5.7.7 it is being used for more error situations. The behavior has been adjusted to generate a core dump to improve troubleshooting possibilities.