Description:
When InnoDB syncs its log to disk, it does not hold any mutex. For the binlog, however, I see that LOCK_log is held during sync_binlog_file, which badly hurts performance under the sync_binlog=1 setting (about 1/5 TPS on SAS or SATA disks). We cannot afford that much performance degradation.
I think we could use an extra mutex to handle this issue, such as LOCK_flush. When calling sync_binlog_file, we could release LOCK_log before flushing to disk, for example:
  bool need_LOCK_log= (get_sync_period() == 1);
  /*
    LOCK_log is not released when sync_binlog is 1. It guarantees that the
    events are not replicated by dump threads before they are synced to disk.
  */
  if (change_stage(thd, Stage_manager::SYNC_STAGE, wait_queue,
                   need_LOCK_log ? NULL : &LOCK_log, &LOCK_sync))
  {
    DBUG_PRINT("return", ("Thread ID: %lu, commit_error: %d",
                          thd->thread_id, thd->commit_error));
    DBUG_RETURN(finish_commit(thd));
  }
  THD *final_queue= stage_manager.fetch_queue_for(Stage_manager::SYNC_STAGE);
  if (need_LOCK_log)
    mysql_mutex_unlock(&LOCK_log);
  mysql_mutex_lock(&LOCK_flush);
  if (flush_error == 0 && total_bytes > 0)
  {
    DEBUG_SYNC(thd, "before_sync_binlog_file");
    std::pair<bool, bool> result= sync_binlog_file(false);
    flush_error= result.first;
  }
  mysql_mutex_unlock(&LOCK_flush);
In the rotate function, just hold LOCK_flush:
  if (force_rotate || (my_b_tell(&log_file) >= (my_off_t) max_size))
  {
    mysql_mutex_lock(&LOCK_flush);
    if ((error= new_file_without_locking(NULL)))
      /**
        Be conservative... There are possible lost events (eg,
        failing to log the Execute_load_query_log_event
        on a LOAD DATA while using a non-transactional
        table)!
        We give it a shot and try to write an incident event anyway
        to the current log.
      */
      if (!write_incident(current_thd, false/*need_lock_log=false*/,
                          false/*do_flush_and_sync==false*/))
      {
        /*
          Write an error to log. So that user might have a chance
          to be alerted and explore incident details before its
          slave servers would stop.
        */
        sql_print_error("The server was unable to create a new log file. "
                        "An incident event has been written to the binary "
                        "log which will stop the slaves.");
        flush_and_sync(0);
      }
    mysql_mutex_unlock(&LOCK_flush);
    *check_purge= true;
  }
The binlog dump command may also need to hold LOCK_flush, to make sure the binlog is synced to disk before it is replicated to slaves. I think this will improve TPS under the sync_binlog=1 setting.
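To make the locking idea easier to discuss, here is a minimal standalone sketch of it. It is not server code: ToyBinlog, append, sync, commit and read_synced are hypothetical names, and std::mutex stands in for mysql_mutex_t. The synced_ counter is an assumption of this sketch (the description above does not mention one); it is there so that a dump reader never sees events written in the window between releasing the log mutex and taking the flush mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Toy model only; ToyBinlog and its members are hypothetical, not MySQL code.
class ToyBinlog {
 public:
  // Flush stage: append to the in-memory "file" under lock_log_ only.
  // Returns the number of events written so far.
  std::size_t append(const std::string& event) {
    std::lock_guard<std::mutex> guard(lock_log_);
    events_.push_back(event);
    return events_.size();
  }

  // Sync stage: advance the durable position under lock_flush_ only, so
  // other sessions can keep appending while the slow sync is running.
  void sync(std::size_t upto) {
    std::lock_guard<std::mutex> guard(lock_flush_);
    if (upto > synced_) synced_ = upto;  // stands in for sync_binlog_file()
  }

  // Commit path: lock_log_ is already released when the sync starts.
  void commit(const std::string& event) { sync(append(event)); }

  // Dump path: read the durable position under lock_flush_, then copy
  // only the events below it. The two mutexes are never held together.
  std::vector<std::string> read_synced(std::size_t from) {
    std::size_t upto;
    {
      std::lock_guard<std::mutex> flush_guard(lock_flush_);
      upto = synced_;
    }
    assert(from <= upto);
    std::lock_guard<std::mutex> log_guard(lock_log_);
    return std::vector<std::string>(events_.begin() + from,
                                    events_.begin() + upto);
  }

 private:
  std::mutex lock_log_;    // plays the role of LOCK_log
  std::mutex lock_flush_;  // plays the role of the proposed LOCK_flush
  std::vector<std::string> events_;
  std::size_t synced_ = 0;  // events known to be "on disk"
};
```

In this model an event that has been appended but not yet synced is invisible to read_synced, which is the guarantee the dump thread needs, while appends from other sessions are not blocked by a sync in progress.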
How to repeat:
N/A
Suggested fix:
See the description