Bug #110465 potential binlog data corruption when flush_thread_caches fail
Submitted: 22 Mar 2023 11:49 Modified: 27 Mar 2023 12:26
Reporter: Shawn Li Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Logging Severity:S3 (Non-critical)
Version:5.7 OS:Any
Assigned to: CPU Architecture:Any

[22 Mar 2023 11:49] Shawn Li
Description:
MYSQL_BIN_LOG::ordered_commit() flush all thread caches to binlog file by calling MYSQL_BIN_LOG::process_flush_stage_queue(). 
If one thread flush_thread_caches fail, the binlog file may have corrupted data. If another thread continue flush_thread_caches, it may success, then MYSQL_BIN_LOG::process_flush_stage_queue() return success. 
Then mysql commit success, with corrupted binlog data.

How to repeat:
debug_sync or simulate disk error..

Suggested fix:
Return error immediately after flush_thread_caches failed.

https://github.com/mysql/mysql-server/pull/448
[22 Mar 2023 14:23] MySQL Verification Team
Hi Mr. Li,

Thank you for your bug report.

However, this is a forum for bug reports with repeatable test cases.

Hence, we can not continue without a fully repeatable test case !!!!!
[22 Mar 2023 14:24] MySQL Verification Team
Not enough information was provided for us to be able to handle this bug. Please re-read the instructions at http://bugs.mysql.com/how-to-report.php 

If you can provide more information, feel free to add it to this bug and change the status back to 'Open'.  

Thank you for your interest in MySQL.
[24 Mar 2023 3:23] Shawn Li
I've written a mtr case for this. 
https://github.com/lishuai87/mysql-server/pull/1/files

--- a/sql/binlog.cc
+++ b/sql/binlog.cc
@@ -1433,6 +1433,10 @@ class Binlog_event_writer : public Basic_ostream {
       } else {
         my_off_t write_bytes = std::min<uint64>(length, event_len);

+        DBUG_EXECUTE_IF("simulate_error_during_flush_thread_caches", {
+          if (event_len > 512) return true;
+        };);
+
         if (m_binlog_file->write(buffer, write_bytes)) return true;

         // update the checksum

--source include/have_log_bin.inc
--source include/have_binlog_format_row.inc
--source include/have_debug_sync.inc

RESET MASTER;

SET GLOBAL binlog_error_action = ABORT_SERVER;
SET GLOBAL sync_binlog = 1;
CREATE TABLE t1(i INT, c varchar(1024));
CREATE TABLE t2(i INT);
connect(con1,localhost,root,,);
connect(con2,localhost,root,,);

--connection con1
SET DEBUG_SYNC = "bgc_after_enrolling_for_flush_stage SIGNAL leader_enter_flush_stage WAIT_FOR follower_enter_flush_stage";
SET SESSION debug = "+d,simulate_error_during_flush_thread_caches";
--send INSERT INTO t1 values (1, repeat('a', 1024));

--connection con2
SET DEBUG_SYNC = "now WAIT_FOR leader_enter_flush_stage";
SET DEBUG_SYNC = "bgc_after_enrolling_for_flush_stage SIGNAL follower_enter_flush_stage";
--send INSERT INTO t2 values (2);

--connection con1
--reap
--connection con2
--reap

--connection default
select i from t1;
select i from t2;
DROP table t1, t2;
--echo #check binlog file
let $datadir = `SELECT @@DATADIR`;
--exec $MYSQL_BINLOG --force-if-open --verbose $datadir/binlog.000001 | egrep '^CREATE|^DROP|^INSERT'

# Cleanup
--disconnect con1
--disconnect con2
[24 Mar 2023 13:08] MySQL Verification Team
Hello Mr. Li,

Sorry , but we can not change our source code in order to force test cases to fail. There are already sufficient number of synchronisations available in our test suite that you can wait for some event.

Without a change in the source code, a test passes:

==============================================================================
                  TEST NAME                       RESULT  TIME (ms) COMMENT
------------------------------------------------------------------------------
[ 50%] main.bug_110466                           [ fail ]
        Test ended at 2023-03-24 15:02:05

CURRENT_TEST: main.bug_110466
---  .... /mysql-8.0.32/mysql-test/r/bug_110466.result	2023-03-24 15:57:11
+++ ..../mysql-8.0.32/mysql-test/var/log/bug_110466.reject	2023-03-24 16:02:05
@@ -0,0 +1,19 @@
+RESET MASTER;
+SET GLOBAL binlog_error_action = ABORT_SERVER;
+SET GLOBAL sync_binlog = 1;
+CREATE TABLE t1(i INT, c varchar(1024));
+CREATE TABLE t2(i INT);
+SET SESSION debug = "+d,simulate_error_during_flush_thread_caches";
+INSERT INTO t1 values (1, repeat('a', 1024));;
+INSERT INTO t2 values (2);;
+select i from t1;
+i
+1
+select i from t2;
+i
+2
+DROP table t1, t2;
+#check binlog file
+CREATE TABLE t1(i INT, c varchar(1024))
+CREATE TABLE t2(i INT)
+DROP TABLE `t1`,`t2` /* generated by server */

mysqltest: Result length mismatch

The result from queries just before the failure was:
RESET MASTER;
SET GLOBAL binlog_error_action = ABORT_SERVER;
SET GLOBAL sync_binlog = 1;
CREATE TABLE t1(i INT, c varchar(1024));
CREATE TABLE t2(i INT);
SET SESSION debug = "+d,simulate_error_during_flush_thread_caches";
INSERT INTO t1 values (1, repeat('a', 1024));;
INSERT INTO t2 values (2);;
select i from t1;
i
1
select i from t2;
i
2
DROP table t1, t2;
#check binlog file
CREATE TABLE t1(i INT, c varchar(1024))
CREATE TABLE t2(i INT)
DROP TABLE `t1`,`t2` /* generated by server */
safe_process[6136]: Child process: 6137, exit: 1

 - the logfile can be found in '/Users/sinisa/razno/work/repo/mysql-8.0.32/mysql-test/var/log/main.bug_110466/bug_110466.log'

[100%] shutdown_report                           [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 0.000 of 26 seconds executing testcases

Completed: Failed 1/2 tests, 50.00% were successful.

Hence, please use synch commands that are already available .....
[24 Mar 2023 16:47] Shawn Li
ok, without changing mysqld source code:

--source include/have_log_bin.inc
--source include/have_binlog_format_row.inc
--source include/have_debug_sync.inc

RESET MASTER;

SET GLOBAL binlog_error_action = ABORT_SERVER;
SET GLOBAL sync_binlog = 1;
CREATE TABLE t1(i INT, c varchar(1024));
CREATE TABLE t2(i INT);
connect(con1,localhost,root,,);
connect(con2,localhost,root,,);

--connection con1
SET DEBUG_SYNC = "bgc_after_enrolling_for_flush_stage SIGNAL leader_enter_flush_stage WAIT_FOR follower_enter_flush_stage";
SET GLOBAL debug = "+d,simulate_do_write_cache_failure";
--send INSERT INTO t1 values (1, repeat('a', 1024));

--connection con2
SET DEBUG_SYNC = "now WAIT_FOR leader_enter_flush_stage";
SET DEBUG_SYNC = "bgc_after_enrolling_for_flush_stage SIGNAL follower_enter_flush_stage";
--send INSERT INTO t2 values (2);

--connection con1
--reap
--connection con2
--reap

--connection default
select i from t1;
select i from t2;
--echo #check binlog file
let $datadir = `SELECT @@DATADIR`;
--exec $MYSQL_BINLOG --force-if-open --verbose $datadir/binlog.000001 | egrep '^CREATE|^DROP|^INSERT'

Result:

RESET MASTER;
SET GLOBAL binlog_error_action = ABORT_SERVER;
SET GLOBAL sync_binlog = 1;
CREATE TABLE t1(i INT, c varchar(1024));
CREATE TABLE t2(i INT);
SET DEBUG_SYNC = "bgc_after_enrolling_for_flush_stage SIGNAL leader_enter_flush_stage WAIT_FOR follower_enter_flush_stage";
SET GLOBAL debug = "+d,simulate_do_write_cache_failure";
INSERT INTO t1 values (1, repeat('a', 1024));;
SET DEBUG_SYNC = "now WAIT_FOR leader_enter_flush_stage";
SET DEBUG_SYNC = "bgc_after_enrolling_for_flush_stage SIGNAL follower_enter_flush_stage";
INSERT INTO t2 values (2);;
select i from t1;
i
1
select i from t2;
i
2
#check binlog file
CREATE TABLE t1(i INT, c varchar(1024))
CREATE TABLE t2(i INT)

binlog file:

# at 438
#230325  0:39:36 server id 1  end_log_pos 548 CRC32 0xe89cb01a 	Query	thread_id=9	exec_time=0	error_code=0	Xid = 57
SET TIMESTAMP=1679675976/*!*/;
/*!80013 SET @@session.sql_require_primary_key=0*//*!*/;
CREATE TABLE t2(i INT)
/*!*/;
# at 548
#230325  0:39:36 server id 1  end_log_pos 627 CRC32 0x34786663 	Anonymous_GTID	last_committed=2	sequence_number=3	rbr_only=yes	original_committed_timestamp=1679675976864848	immediate_commit_timestamp=1679675976864848	transaction_length=1307
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
# original_commit_timestamp=1679675976864848 (2023-03-25 00:39:36.864848 CST)
# immediate_commit_timestamp=1679675976864848 (2023-03-25 00:39:36.864848 CST)
/*!80001 SET @@session.original_commit_timestamp=1679675976864848*//*!*/;
/*!80014 SET @@session.original_server_version=80032*//*!*/;
/*!80014 SET @@session.immediate_server_version=80032*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 627
#230325  0:39:36 server id 1  end_log_pos 706 CRC32 0xdefb31ee 	Anonymous_GTID	last_committed=2	sequence_number=4	rbr_only=yes	original_committed_timestamp=1679675976864864	immediate_commit_timestamp=1679675976864864	transaction_length=273
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
# original_commit_timestamp=1679675976864864 (2023-03-25 00:39:36.864864 CST)
# immediate_commit_timestamp=1679675976864864 (2023-03-25 00:39:36.864864 CST)
/*!80001 SET @@session.original_commit_timestamp=1679675976864864*//*!*/;
/*!80014 SET @@session.original_server_version=80032*//*!*/;
/*!80014 SET @@session.immediate_server_version=80032*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 706
#230325  0:39:36 server id 1  end_log_pos 781 CRC32 0xdaa98941 	Query	thread_id=11	exec_time=0	error_code=0
SET TIMESTAMP=1679675976/*!*/;
BEGIN
/*!*/;
# at 781
#230325  0:39:36 server id 1  end_log_pos 829 CRC32 0x4488b563 	Table_map: `test`.`t2` mapped to number 162
# has_generated_invisible_primary_key=0
# at 829
#230325  0:39:36 server id 1  end_log_pos 869 CRC32 0xff0537c6 	Write_rows: table id 162 flags: STMT_END_F

BINLOG '
SNIdZBMBAAAAMAAAAD0DAAAAAKIAAAAAAAEABHRlc3QAAnQyAAEDAAEBAQBjtYhE
SNIdZB4BAAAAKAAAAGUDAAAAAKIAAAAAAAEAAgAB/wACAAAAxjcF/w==
'/*!*/;
### INSERT INTO `test`.`t2`
### SET
###   @1=2
# at 869
#230325  0:39:36 server id 1  end_log_pos 900 CRC32 0x7d0433b9 	Xid = 63
COMMIT/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[24 Mar 2023 17:11] Shawn Li
--echo #check binlog file
--exec $MYSQL_BINLOG --force-if-open --verbose $datadir/binlog.000001 | egrep '^CREATE|^DROP|^INSERT|^###'

#check binlog file
CREATE TABLE t1(i INT, c varchar(1024))
CREATE TABLE t2(i INT)
### INSERT INTO `test`.`t2`
### SET
###   @1=2
[27 Mar 2023 12:26] MySQL Verification Team
Hi Mr. Li,

We have managed to repeat your new test case.

This report is now a verified bug that affects both 5.7 and 8.0. 5.6 is not maintained any more.

verified.