Bug #119395 WL #12527 may cause performance jitter under heavy write workloads
Submitted: 14 Nov 9:20
Reporter: George Ma (OCA)
Status: Open
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 8.4
OS: Any
Assigned to:
CPU Architecture: Any

[14 Nov 9:20] George Ma
Description:
WL #12527 modified the redo log format, replacing the two ib_logfile files with 32 ib_redo files. By design, these 32 ib_redo files are reused, while the sequence number continues to advance. In theory, as long as checkpoint_lsn moves forward, old ib_redo files that are no longer needed can be marked as unused and renamed to ib_redo_tmp, so that the next call to `start_next_file` can reuse one of them directly. Keeping a certain number of ib_redo_tmp files available is therefore necessary.
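
To make the intended lifecycle concrete, here is a minimal toy model, not InnoDB code. `RedoFileSet`, `consume_up_to`, `unused` and `newest_file_id` are hypothetical names; only `start_next_file` borrows its name from the server. The sketch assumes fixed-size files, file ids that only grow, and that a file becomes a spare as soon as the checkpoint has passed its end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy model only: these types are hypothetical and do not mirror InnoDB's.
struct RedoFile {
  std::uint64_t id;         // keeps growing even though files are reused
  std::uint64_t start_lsn;  // first LSN stored in this file
  std::uint64_t end_lsn;    // one past the last LSN this file can hold
};

class RedoFileSet {
 public:
  RedoFileSet(std::size_t total_files, std::uint64_t file_size)
      : file_size_(file_size), unused_(total_files - 1) {
    assert(total_files >= 1 && file_size > 0);
    active_.push_back({0, 0, file_size});  // the file currently being written
  }

  // Called when the current file is full. Takes one spare ("_tmp") file.
  // Returns false when no spare exists, i.e. the writer would have to wait.
  bool start_next_file() {
    if (unused_ == 0) return false;
    --unused_;
    const RedoFile last = active_.back();
    active_.push_back({last.id + 1, last.end_lsn, last.end_lsn + file_size_});
    return true;
  }

  // Files that end at or before checkpoint_lsn are no longer needed for
  // recovery, so they become spares. The newest file is never consumed.
  std::size_t consume_up_to(std::uint64_t checkpoint_lsn) {
    std::size_t recycled = 0;
    while (active_.size() > 1 && active_.front().end_lsn <= checkpoint_lsn) {
      active_.pop_front();
      ++unused_;
      ++recycled;
    }
    return recycled;
  }

  std::size_t unused() const { return unused_; }
  std::uint64_t newest_file_id() const { return active_.back().id; }

 private:
  std::uint64_t file_size_;
  std::size_t unused_;           // number of spare files ready for reuse
  std::deque<RedoFile> active_;  // files still holding needed redo, oldest first
};
```

In this model, a set of 4 files of 100 LSN units each runs out of spares after three calls to `start_next_file()`; `consume_up_to(250)` then frees the two oldest files, and the next `start_next_file()` succeeds with file id 4.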

However, in the current code, `is_consumption_needed` decides whether old ib_redo files need to be recycled by checking the following conditions:

```c++
bool is_consumption_needed(const log_t &log) {
  DBUG_EXECUTE_IF("log_force_consumption", return true;);
  const auto current_size = physical_size(log, log.m_capacity.next_file_size());
  const auto target_capacity = log.m_capacity.target_physical_capacity();
  const auto current_capacity = log.m_capacity.current_physical_capacity();

  ut_a(current_size <= current_capacity);

  return /* case 1. */ log.m_requested_files_consumption ||
         /* case 2. */ log.m_unused_files_count == 0 ||
         /* case 3. */ target_capacity < current_capacity ||
         /* case 4. */ current_size < current_capacity;
} 
```

This results in the following behavior: when the parameter innodb_redo_log_capacity has not been modified, old ib_redo files are recycled only once `m_unused_files_count == 0`. If the write workload is heavy at that point and the last available ib_redo file fills up quickly, ER_IB_MSG_LOG_WRITER_WAIT_ON_NEW_LOG_FILE is reported and `log_files_wait_for_next_file_available` wakes up log_files_governor. Because the last ib_redo file has already been completely written by then, further redo log writes are blocked until a file has been recycled, which leads to performance jitter.
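
The timing problem can be illustrated with a small tick-based simulation. This is a sketch under strong assumptions, not a reproduction of the server: `simulate`, `StallStats`, `trigger_at_or_below` and `wakeup_latency` are hypothetical names, the writer fills exactly one file per tick, the checkpoint is assumed to have fully caught up whenever recycling runs, and recycling completes `wakeup_latency` ticks after it is requested.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model of the writer/governor interaction, not InnoDB code.
struct StallStats {
  std::size_t file_switches = 0;  // ticks in which the writer got a new file
  std::size_t writer_stalls = 0;  // ticks in which no spare file was available
};

// trigger_at_or_below == 0 models "recycle only when no unused file is left".
// A larger value models keeping a small reserve of spare files.
StallStats simulate(std::size_t total_files, std::size_t trigger_at_or_below,
                    std::size_t wakeup_latency, std::size_t ticks) {
  assert(total_files >= 2 && wakeup_latency >= 1);
  StallStats stats;
  std::size_t unused = total_files - 1;  // one file is always being written
  std::size_t ticks_until_recycle = 0;   // 0 means no recycle is pending

  for (std::size_t t = 0; t < ticks; ++t) {
    // A pending recycle finishes: every file except the current one is spare.
    if (ticks_until_recycle > 0 && --ticks_until_recycle == 0) {
      unused = total_files - 1;
    }
    // The writer has filled its file and needs the next one.
    if (unused > 0) {
      --unused;
      ++stats.file_switches;
    } else {
      ++stats.writer_stalls;
    }
    // Governor check: request recycling once the reserve is low enough.
    if (ticks_until_recycle == 0 && unused <= trigger_at_or_below) {
      ticks_until_recycle = wakeup_latency;
    }
  }
  return stats;
}
```

With 4 files, a latency of 2 ticks and 12 ticks of load, `simulate(4, 0, 2, 12)` reports 3 stalls, while `simulate(4, 1, 2, 12)` reports none. The numbers are only meaningful inside this model, but they show why waiting until the unused count reaches zero leaves no slack for the governor's reaction time.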

How to repeat:
No need.