Description:
In ordered_commit, the leader THD flushes the transactions in the flush queue. If any transaction's flush succeeds, process_flush_stage_queue returns 0 as flush_error, masking failures of later transactions in the same queue:
```
int MYSQL_BIN_LOG::process_flush_stage_queue(my_off_t *total_bytes_var,
                                             THD **out_queue_var) {
  ...
  int flush_error = 1;
  THD *first_seen = fetch_and_process_flush_stage_queue();
  for (THD *head = first_seen; head; head = head->next_to_commit) {
    Thd_backup_and_restore switch_thd(current_thd, head);
    const auto [error, flushed_bytes] = flush_thread_caches(head);
    total_bytes += flushed_bytes;
    if (flush_error == 1) flush_error = error;
  }
  ...
  return flush_error;
}
```
As a result, ordered_commit commits all transactions in the queue regardless of whether each one's flush succeeded. A transaction whose flush failed is missing its binlog events yet is still committed in the storage engine, which eventually leads to master-slave inconsistency.
How to repeat:
Use GDB to set breakpoints, wait for two transactions to enter the flush queue concurrently, then let the first transaction's flush complete while the second transaction's flush fails. Finally, check the binlog file and the table records.
How to mock a flush error?
My approach is to manually set thd->commit_error to 1 in binlog_cache_data::flush, thus skipping the subsequent write_transaction logic.
Using simulate_binlog_flush_error in a debug build is another way to repeat.
Suggested fix:
MySQL uses binlog_error_action to control what happens when a binlog error occurs.
If binlog_error_action = ABORT_SERVER, no error in the flush stage may be ignored. Therefore, the process_flush_stage_queue function can be modified like this:
```
int MYSQL_BIN_LOG::process_flush_stage_queue(my_off_t *total_bytes_var,
                                             THD **out_queue_var) {
  assert(total_bytes_var && out_queue_var);
  my_off_t total_bytes = 0;
  int flush_error = 1;
+ if (binlog_error_action == ABORT_SERVER) {
+   flush_error = 0;
+ }
  mysql_mutex_assert_owner(&LOCK_log);
  THD *first_seen = fetch_and_process_flush_stage_queue();
@@ -8479,7 +8483,13 @@ int MYSQL_BIN_LOG::process_flush_stage_queue(my_off_t *total_bytes_var,
    Thd_backup_and_restore switch_thd(current_thd, head);
    const auto [error, flushed_bytes] = flush_thread_caches(head);
    total_bytes += flushed_bytes;
-   if (flush_error == 1) flush_error = error;
+   if (binlog_error_action == ABORT_SERVER) {
+     if (flush_error == 0) flush_error = error;
+   } else {
+     if (flush_error == 1) flush_error = error;
+   }
```