Bug #76795 | 2pc succeeds even though binlog flush/sync fails | ||
---|---|---|---|
Submitted: | 22 Apr 2015 19:39 | Modified: | 28 May 2015 17:53 |
Reporter: | Santosh Praneeth Banda | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.6.23 | OS: | Any |
Assigned to: | CPU Architecture: | Any |
[22 Apr 2015 19:39]
Santosh Praneeth Banda
[23 Apr 2015 5:04]
Santosh Praneeth Banda
Please use InnoDB table when reproducing the issue mysql> create table t1 (a int) engine=innodb;
[23 Apr 2015 8:28]
MySQL Verification Team
Hello Santosh, Thank you for the bug report and test case. Observed this issue with 5.6.23. Thanks, Umesh
[23 Apr 2015 8:37]
MySQL Verification Team
test results
Attachment: 76795.txt (text/plain), 8.97 KiB.
[14 May 2015 10:53]
Venkatesh Duggirala
An approach which we are following to fix this issue is : ======================================================== Using binlog_error_action setting, Server will decide the action to do if the above problematic situation happens. If binlog_error_action is ABORT_SERVER, then it will abort the server after informing the client with 'ER_BINLOGGING_IMPOSSIBLE' error. If binlog_error_action is 'IGNORE_ERROR', then it will ignore the error and disable the binlogging further until server is restarted again. The same will be mentioned in the error log file. All the transaction will be committed inside storage engine with out binlogging. Regards, Venkatesh.
[28 May 2015 11:50]
Venkatesh Duggirala
Patch that implements the above approach
Attachment: bug76795.diff (text/x-patch), 6.62 KiB.
[28 May 2015 11:51]
Venkatesh Duggirala
Post from the Developer: ======================== Please find the above patch that will fix the issue. Give it a try at your end and let us know if you see any issues with it
[28 May 2015 14:01]
David Moss
Thanks for your feedback. This has been fixed in upcoming versions and the following was noted in the 5.6.26 and 5.7.8 changelogs: If flushing the cache to the binary log failed, for example due to a disk problem, the error was not detected by the binary log group commit logic. This could cause inconsistencies between the master and the slave. The fix uses the binlog_error_action variable to decide how to handle this situation. If binlog_error_action=ABORT_SERVER, then the server aborts after informing the client with an ER_BINLOGGING_IMPOSSIBLE error. If binlog_error_action=IGNORE_ERROR, then the error is ignored and binary logging is disabled until the server is restarted again. The same is mentioned in the error log file, and the transaction is committed inside the storage engine without being added to the binary log.
[28 May 2015 17:53]
Santosh Praneeth Banda
You completely missed the scenario where binlog sync fails. My bug title and description mentions that, but my sample test case covered binlog flush case. You need to handle error from MYSQL_BIN_LOG::sync_binlog_file() in group commit. FWIW, this is the patch we are using diff --git a/sql/binlog.cc b/sql/binlog.cc index ad7ffb1..bef4a99 100644 --- a/sql/binlog.cc +++ b/sql/binlog.cc @@ -7365,6 +7365,7 @@ int MYSQL_BIN_LOG::ordered_commit(THD *thd, bool all, bool skip_commit, int flush_error= 0; my_off_t total_bytes= 0; bool do_rotate= false; + int binlog_flush_error = 0; /* These values are used while flushing a transaction, so clear @@ -7426,7 +7427,7 @@ int MYSQL_BIN_LOG::ordered_commit(THD *thd, bool all, bool skip_commit, my_off_t flush_end_pos= 0; if (flush_error == 0 && total_bytes > 0) - flush_error= flush_cache_to_file(&flush_end_pos); + binlog_flush_error = flush_error = flush_cache_to_file(&flush_end_pos); /* If the flush finished successfully, we can call the after_flush @@ -7452,7 +7453,7 @@ int MYSQL_BIN_LOG::ordered_commit(THD *thd, bool all, bool skip_commit, { DEBUG_SYNC(thd, "before_sync_binlog_file"); std::pair<bool, bool> result = sync_binlog_file(false, async); - flush_error = result.first; + binlog_flush_error = flush_error = result.first; } /* @@ -7467,11 +7468,23 @@ int MYSQL_BIN_LOG::ordered_commit(THD *thd, bool all, bool skip_commit, that haven't made it to the disk on master because of a os crash or power failure just before binlog fsync. */ - update_binlog_end_pos(); + if (!binlog_flush_error) + update_binlog_end_pos(); DBUG_EXECUTE_IF("crash_commit_after_log", DBUG_SUICIDE();); } + if (binlog_flush_error) + { + sql_print_error("Binlog flush or sync failed"); + if (binlog_error_action == ABORT_SERVER) + { + sql_print_error("Aborting the server to avoid inconsistencies between " + "binlog and innodb"); + _exit(EXIT_FAILURE); + } + } +
[29 May 2015 5:09]
Venkatesh Duggirala
Hello Santosh, Thank you for verifying the patch. Yes, you are correct , I missed case-2 (sync failure) We will work on covering the case-2 and will submit the patch soon. Sorry for the inconvenience caused. Regards, Venkatesh.
[13 Oct 2015 3:04]
Venkatesh Duggirala
(Patch that addresses both flush and sync) errors.
Attachment: patch_mysql-5.6.diff (text/x-patch), 16.74 KiB.
[13 Oct 2015 3:05]
Venkatesh Duggirala
Post from the Developer: ======================== Please find the above patch that will handle both flush and sync errors. Give it a try at your end and please let us know if you see any issues with it
[16 Oct 2015 14:31]
David Moss
Thanks for your feedback, this has been fixed and the following was noted in the 5.6.28 and 5.7.10 changelogs: Fatal errors encountered during flushing or synchronizing the binary log were being ignored. Such errors are now caught and handled depending on the setting of binlog_error_action. Additionally the documentation for binlog_error_action has also been improved: https://dev.mysql.com/doc/refman/5.7/en/replication-options-binary-log.html#sysvar_binlog_...