Bug #119947 ordered_commit: some transactions had flush errors but committed successfully, resulting in master-slave inconsistency
Submitted: 26 Feb 9:19
Reporter: yl deng
Status: Open
Category: MySQL Server: Replication
Severity: S2 (Serious)
Version: 8.0.45
OS: Any
Assigned to:
CPU Architecture: Any

[26 Feb 9:19] yl deng
Description:
In `ordered_commit`, the leader THD flushes all transactions in the flush queue. If any transaction flushes successfully, `process_flush_stage_queue` returns 0 as `flush_error`, even when other transactions in the same queue failed to flush:

```
int MYSQL_BIN_LOG::process_flush_stage_queue(my_off_t *total_bytes_var,
                                             THD **out_queue_var) {
  ...
  int flush_error = 1;

  THD *first_seen = fetch_and_process_flush_stage_queue();

  for (THD *head = first_seen; head; head = head->next_to_commit) {
    Thd_backup_and_restore switch_thd(current_thd, head);
    const auto [error, flushed_bytes] = flush_thread_caches(head);
    total_bytes += flushed_bytes;
    // flush_error is overwritten only while it is still 1: as soon as one
    // head returns 0, it stays 0 and failures of other heads are not reported.
    if (flush_error == 1) flush_error = error;
  }
  ...
  return flush_error;
}
```

As a result, `ordered_commit` commits every transaction in the queue, regardless of whether its flush succeeded. A transaction whose flush failed is therefore committed in the storage engine without its binlog events, and because the replica never receives those events, the source and replica become inconsistent.
 

How to repeat:
Use GDB to set breakpoints so that two transactions enter the flush queue concurrently, then let the first transaction's flush complete while the second transaction's flush fails. Finally, compare the binlog file with the table records: the transaction that failed to flush is present in the table but has no events in the binlog.

How to simulate a flush error?
My approach is to manually set `thd->commit_error` to 1 in `binlog_cache_data::flush`, which skips the subsequent `write_transaction` logic.

Alternatively, `simulate_binlog_flush_error` can be used in a debug build to reproduce the issue.

Suggested fix:
MySQL uses `binlog_error_action` to control what the server does when it cannot write, flush, or synchronize the binary log.
If `binlog_error_action = ABORT_SERVER`, no error in the flush stage may be ignored, so `process_flush_stage_queue` can be modified as follows:

```diff
int MYSQL_BIN_LOG::process_flush_stage_queue(my_off_t *total_bytes_var,
   assert(total_bytes_var && out_queue_var);
   my_off_t total_bytes = 0;
   int flush_error = 1;
+  if (binlog_error_action == ABORT_SERVER)
+  {
+    flush_error = 0;
+  }
   mysql_mutex_assert_owner(&LOCK_log);
 
   THD *first_seen = fetch_and_process_flush_stage_queue();
@@ -8479,7 +8483,13 @@ int MYSQL_BIN_LOG::process_flush_stage_queue(my_off_t *total_bytes_var,
     Thd_backup_and_restore switch_thd(current_thd, head);
     const auto [error, flushed_bytes] = flush_thread_caches(head);
     total_bytes += flushed_bytes;
-    if (flush_error == 1) flush_error = error;
+    if (binlog_error_action == ABORT_SERVER)
+    {
+      if (flush_error == 0) flush_error = error;
+    } else {
+      if (flush_error == 1) flush_error = error;
+    }
```