Bug #77738 | flush_cache_to_file calls _exit when ABORT_SERVER | ||
---|---|---|---|
Submitted: | 15 Jul 2015 19:36 | Modified: | 13 Oct 2015 12:29 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.7.9 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[15 Jul 2015 19:36]
Shane Bester
[15 Jul 2015 19:36]
MySQL Verification Team
feel free to discuss or make FR public, or justify why _exit() is better than abort(), etc.
[22 Jul 2015 13:21]
MySQL Verification Team
Hello Shane, Thank you for the feature request. // With debug build (used test case from binlog_error_action) CREATE TABLE t1(i INT); SET SESSION debug="+d,simulate_error_during_flush_cache_to_file"; SET GLOBAL binlog_error_action= ABORT_SERVER; INSERT INTO t1 VALUES (1); mysql> use test Database changed mysql> SET SESSION debug="+d,simulate_error_during_flush_cache_to_file"; Query OK, 0 rows affected (0.00 sec) mysql> CREATE TABLE t1(i INT); ERROR 1050 (42S01): Unknown error 1050 mysql> CREATE TABLE t1(i INT); ERROR 1050 (42S01): Unknown error 1050 mysql> \q Bye [4]- Exit 1 bin/mysqld-debug --no-defaults --basedir=/export/umesh/server/binaries/mysql-advanced-5.7.9 --datadir=/ramdisk/mysql-advanced-5.7.9/77738 --core-file --binlog_error_action=ABORT_SERVER --log-bin=mysql-bin --server-id=1 --socket=/tmp/mysql_ushastry.sock --port=15000 --log-error=/export/umesh/server/binaries/mysql-advanced-5.7.9/77738.err 2>&1 [umshastr@hod03]/export/umesh/server/binaries/mysql-advanced-5.7.9: // Extract from error log Version: '5.7.9-enterprise-commercial-advanced-debug-log' socket: '/tmp/mysql_ushastry.sock' port: 15000 MySQL Enterprise Server - Advanced Edition Debug (Commercial) 2015-07-22T13:14:30.634584Z 2 [ERROR] An error occured during flushing cache to file. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server Also, may be typo in "occured"? Thanks, Umesh
[26 Aug 2015 20:14]
Mark Callaghan
This is a bug, not a feature request. I just hit this while running linkbench. I thought I killed the linkbench client, maybe I didn't, then I did "drop database linkbench; create database linkbench". So the first bug is calling _exit without trying to give me a core file so I can debug the problem. The second bug is that this error shouldn't occur (mysqld unplanned restart from drop database). But I can't share much more info because there is no call stack in the db error log. All I have is: 2015-08-26T20:05:06.413319Z 1570 [Note] Aborted connection 1570 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets) 2015-08-26T20:05:06.413395Z 1562 [Note] Aborted connection 1562 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets) 2015-08-26T20:05:06.413893Z 1582 [Note] Aborted connection 1582 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets) 2015-08-26T20:05:06.413424Z 1572 [Note] Aborted connection 1572 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets) 2015-08-26T20:05:06.413452Z 1569 [Note] Aborted connection 1569 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets) 2015-08-26T20:05:06.414092Z 1579 [Note] Aborted connection 1579 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets) 2015-08-26T20:05:06.414095Z 1592 [Note] Aborted connection 1592 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets) 2015-08-26T20:05:06.413478Z 1561 [Note] Aborted connection 1561 to db: 'linkdb' user: 'root' host: '127.0.0.1' (Got an error reading communication packets) 2015-08-26T20:06:15.521718Z 2097 [ERROR] An error occured during flushing cache to file. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server 150826 13:06:21 mysqld_safe Number of processes running now: 0 150826 13:06:21 mysqld_safe mysqld restarted
[26 Aug 2015 20:58]
Mark Callaghan
Problem repeated. I had two test servers, one using 5.7.8 with original style InnoDB compression, the other using hole-punch compression. Both get unplanned restarts from DROP TABLE with this error message. Then I tried a shorter test case, and that failed. Why can't replication code, my_b_flush_io_cache and its callers, log something to indicate what failed? Many of us have wasted days/weeks/months debugging replication failures in the past for this reason.
[27 Aug 2015 8:17]
Sven Sandberg
Posted by developer: This is likely a bug in 5.6 as well. However, impact should be bigger in 5.7 since binlog_error_action defaults to IGNORE_ERROR in 5.6 and ABORT_SERVER in 5.7.
[25 Sep 2015 7:24]
MySQL Verification Team
maybe a note that would help making a repeatable testcase. seems SET PASSWORD was a culprit in this test: -------- mysqld!__crtExitProcess+0x11 [f:\dd\vctools\crt\crtw32\startup\crt0dat.c @ 774] mysqld!doexit+0x17c [f:\dd\vctools\crt\crtw32\startup\crt0dat.c @ 678] mysqld!MYSQL_BIN_LOG::flush_cache_to_file+0xec [.trunk\sql\binlog.cc @ 8460] mysqld!MYSQL_BIN_LOG::ordered_commit+0x17c [.trunk\sql\binlog.cc @ 8748] mysqld!MYSQL_BIN_LOG::commit+0x4d4 [.trunk\sql\binlog.cc @ 8113] mysqld!ha_commit_trans+0x212 [.trunk\sql\handler.cc @ 1649] mysqld!trans_commit_stmt+0x3a [.trunk\sql\transaction.cc @ 461] mysqld!acl_end_trans_and_close_tables+0x30 [.trunk\sql\auth\sql_user_table.cc @ 285] mysqld!change_password+0x5ae [.trunk\sql\auth\sql_user.cc @ 785] mysqld!set_var_password::update+0x1e [.trunk\sql\set_var.cc @ 913] mysqld!sql_set_variables+0x7c [.trunk\sql\set_var.cc @ 662] mysqld!mysql_execute_command+0x1908 [.trunk\sql\sql_parse.cc @ 3454] mysqld!mysql_parse+0x258 [.trunk\sql\sql_parse.cc @ 5311] mysqld!dispatch_command+0x625 [.trunk\sql\sql_parse.cc @ 1254] mysqld!do_command+0x1ea [.trunk\sql\sql_parse.cc @ 821] mysqld!handle_connection+0x128 [.trunk\sql\conn_handler\connection_handler_per_thread.cc @ 295] mysqld!pfs_spawn_thread+0x123 [.trunk\storage\perfschema\pfs.cc @ 2213] mysqld!win_thread_start+0x1b [.trunk\mysys\my_thread.c @ 42] mysqld!_callthreadstartex+0x17 [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376] mysqld!_threadstartex+0x102 [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354]
[25 Sep 2015 17:30]
Mark Callaghan
We are approaching 3 months since this was first reported. Independent of whether we provide a test case the code has to be fixed. Replication is broken if it calls ABORT without leaving behind any hints as to what went wrong.
[29 Sep 2015 9:34]
MySQL Verification Team
Also seeing this code path exit on 5.8.0: ntdll.dll!RtlExitUserProcess() mysqld.exe!doexit(int code, int quick, int retcaller) mysqld.exe!MYSQL_BIN_LOG::new_file_impl(bool need_lock_log, Format_description_log_event * extra_description_event) mysqld.exe!MYSQL_BIN_LOG::new_file_without_locking(Format_description_log_event * extra_description_event) mysqld.exe!MYSQL_BIN_LOG::rotate(bool force_rotate, bool * check_purge) mysqld.exe!MYSQL_BIN_LOG::rotate_and_purge(THD * thd, bool force_rotate) mysqld.exe!reload_acl_and_cache(THD * thd, unsigned long options, TABLE_LIST * tables, int * write_to_binlog) mysqld.exe!mysql_execute_command(THD * thd, bool first_level) mysqld.exe!mysql_parse(THD * thd, Parser_state * parser_state) mysqld.exe!dispatch_command(THD * thd, const COM_DATA * com_data, enum_server_command command) mysqld.exe!do_command(THD * thd) mysqld.exe!handle_connection(void * arg) mysqld.exe!pfs_spawn_thread(void * arg) mysqld.exe!win_thread_start(void * p)
[29 Sep 2015 9:49]
MySQL Verification Team
A crude testcase for 5.8: Start server with --server-id=1 --log-bin --binlog-error-action=ABORT_SERVER --binlog-format=mixed yes "set @a:=(select id from information_schema.processlist where id<>connection_id() order by rand() limit 1); kill query @a;" | mysql -uroot --force > /dev/null & yes "reset master;" | mysql -uroot --force > /dev/null & yes "flush logs;" | mysql -uroot --force > /dev/null &
[13 Oct 2015 12:29]
David Moss
Thanks for your feedback. This is fixed in upcoming versions and the following was added to the 5.6.28 and 5.7.9 release notes: As binlog_error_action=ABORT_SERVER is the default in MySQL 5.7.7 it is being used for more error situations. The behavior has been adjusted to generate a core dump to improve troubleshooting possibilities.