Bug #72874 optimize the mutex during the binlog file sync
Submitted: 4 Jun 2014 13:17 Modified: 10 Jun 2014 15:06
Reporter: Fangxin Flou (OCA)
Status: Verified
Category:MySQL Utilities: Binlog Events Severity:S4 (Feature request)
Version:5.5, 5.6 OS:Any
Assigned to: CPU Architecture:Any
Tags: binlog, performance

[4 Jun 2014 13:17] Fangxin Flou
Description:
When InnoDB syncs its log to disk, it does not hold any mutex. For the binlog, however, I see that LOCK_log is held during sync_binlog_file, which really hurts performance under the sync_binlog=1 setting: about 1/5 of the TPS on SAS or SATA disks. We cannot afford that much performance degradation.

I think we could use an extra mutex to handle this issue, such as LOCK_flush. When calling sync_binlog_file, we could release LOCK_log before flushing to disk, for example:

  bool need_LOCK_log= (get_sync_period() == 1);

  /*
    LOCK_log is not released when sync_binlog is 1. It guarantees that the
    events are not replicated by dump threads before they are synced to disk.
  */
  if (change_stage(thd, Stage_manager::SYNC_STAGE, wait_queue,
                   need_LOCK_log ? NULL : &LOCK_log, &LOCK_sync))
  {
    DBUG_PRINT("return", ("Thread ID: %lu, commit_error: %d",
                          thd->thread_id, thd->commit_error));
    DBUG_RETURN(finish_commit(thd));
  }
  THD *final_queue= stage_manager.fetch_queue_for(Stage_manager::SYNC_STAGE);

  if (need_LOCK_log)
    mysql_mutex_unlock(&LOCK_log);

  mysql_mutex_lock(&LOCK_flush);

  if (flush_error == 0 && total_bytes > 0)
  {
    DEBUG_SYNC(thd, "before_sync_binlog_file");
    std::pair<bool, bool> result= sync_binlog_file(false);
    flush_error= result.first;
  }

  mysql_mutex_unlock(&LOCK_flush);
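
To make the locking idea easier to discuss, here is a minimal standalone sketch of it, not MySQL code. ToyBinlog, lock_log_, lock_flush_ and the counter that stands in for fsync() are all hypothetical names chosen for illustration; they only mirror the roles of LOCK_log and the proposed LOCK_flush.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>

// Hypothetical toy log, not MySQL's MYSQL_BIN_LOG.
class ToyBinlog {
 public:
  // Writers only need lock_log_, so a running sync does not block them.
  void append(const std::string &event) {
    std::lock_guard<std::mutex> guard(lock_log_);
    buffer_ += event;
  }

  // Make everything written so far durable; returns the synced byte count.
  std::size_t sync() {
    std::size_t target;
    {
      // Short critical section: only record how far the log has grown.
      std::lock_guard<std::mutex> guard(lock_log_);
      target = buffer_.size();
    }
    // lock_log_ is already released here, before the slow part starts.
    std::lock_guard<std::mutex> guard(lock_flush_);
    if (target > synced_bytes_) {
      ++sync_calls_;  // stand-in for the real fsync() call
      synced_bytes_ = target;
    }
    // Another thread may have synced further, never less.
    assert(synced_bytes_ >= target);
    return synced_bytes_;
  }

  // Number of simulated fsync() calls so far.
  std::size_t sync_calls() {
    std::lock_guard<std::mutex> guard(lock_flush_);
    return sync_calls_;
  }

 private:
  std::mutex lock_log_;    // protects buffer_
  std::mutex lock_flush_;  // protects synced_bytes_ and sync_calls_
  std::string buffer_;
  std::size_t synced_bytes_ = 0;
  std::size_t sync_calls_ = 0;
};
```

In this sketch a second sync() with no new data does not trigger another simulated fsync, and append() can run while another thread is inside the slow section.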

During the rotate function, just hold LOCK_flush:

  if (force_rotate || (my_b_tell(&log_file) >= (my_off_t) max_size))
  {
    mysql_mutex_lock(&LOCK_flush);
    if ((error= new_file_without_locking(NULL)))
      /**
        Be conservative... There are possible lost events (eg,
        failing to log the Execute_load_query_log_event
        on a LOAD DATA while using a non-transactional
        table)!

        We give it a shot and try to write an incident event anyway
        to the current log.
      */
      if (!write_incident(current_thd, false/*need_lock_log=false*/,
                          false/*do_flush_and_sync==false*/))
      {
        /*
          Write an error to log. So that user might have a chance
          to be alerted and explore incident details before its
          slave servers would stop.
        */
        sql_print_error("The server was unable to create a new log file. "
                        "An incident event has been written to the binary "
                        "log which will stop the slaves.");
        flush_and_sync(0);
      }

    mysql_mutex_unlock(&LOCK_flush);
    *check_purge= true;
  }
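
The rotation side can be sketched the same way. The following is a hedged, standalone illustration rather than server code: RotatingLog and its members are hypothetical, and it assumes the caller path takes the log mutex first and the flush mutex second, so that a sync holding only the flush mutex never sees a half-swapped file.

```cpp
#include <cstddef>
#include <mutex>
#include <string>

// Hypothetical toy log; file_index_ stands for "which binlog file is current".
class RotatingLog {
 public:
  void append(const std::string &event) {
    std::lock_guard<std::mutex> guard(lock_log_);
    current_ += event;
  }

  // Start a new file once the current one has reached max_size bytes.
  // Lock order is always lock_log_ first, then lock_flush_.
  bool rotate_if_needed(std::size_t max_size) {
    std::lock_guard<std::mutex> log_guard(lock_log_);
    if (current_.size() < max_size) return false;
    // Wait for any in-flight sync and keep new ones out during the swap.
    std::lock_guard<std::mutex> flush_guard(lock_flush_);
    current_.clear();
    ++file_index_;
    return true;
  }

  // What a sync would look at; it needs lock_flush_ only.
  std::size_t current_file_index() {
    std::lock_guard<std::mutex> guard(lock_flush_);
    return file_index_;
  }

 private:
  std::mutex lock_log_;    // protects current_
  std::mutex lock_flush_;  // protects file_index_
  std::string current_;
  std::size_t file_index_ = 0;
};
```

Because the sync path in the proposal releases the log mutex before it takes the flush mutex, the fixed order used here for rotation should not introduce a lock-order inversion in this toy model.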

The binlog dump command may also need to hold LOCK_flush, to make sure the binlog is synced to disk before it is replicated to slaves. I think this will improve the TPS under the sync_binlog=1 setting.
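
One way to picture the dump-thread requirement is a reader that never goes past the last synced position. The sketch below is illustrative only; DumpableLog, read_synced and the byte-offset bookkeeping are assumptions made for this example, not the server's dump thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <string>

// Hypothetical toy log with a reader that only sees synced bytes.
class DumpableLog {
 public:
  void append(const std::string &event) {
    std::lock_guard<std::mutex> guard(lock_log_);
    buffer_ += event;
  }

  // Record the current end of the log as durable (fsync itself is omitted).
  void sync() {
    std::size_t target;
    {
      std::lock_guard<std::mutex> guard(lock_log_);
      target = buffer_.size();
    }
    std::lock_guard<std::mutex> guard(lock_flush_);
    synced_bytes_ = std::max(synced_bytes_, target);
  }

  // What a dump thread may send: bytes from `from` up to the synced position.
  std::string read_synced(std::size_t from) {
    std::size_t limit;
    {
      // The flush mutex tells the reader how far the log is durable.
      std::lock_guard<std::mutex> guard(lock_flush_);
      limit = synced_bytes_;
    }
    if (from >= limit) return std::string();
    // The buffer only grows, so limit is still within bounds here.
    std::lock_guard<std::mutex> guard(lock_log_);
    return buffer_.substr(from, limit - from);
  }

 private:
  std::mutex lock_log_;    // protects buffer_
  std::mutex lock_flush_;  // protects synced_bytes_
  std::string buffer_;
  std::size_t synced_bytes_ = 0;
};
```

Events appended after the last sync() stay invisible to read_synced() until the next sync completes, which is the guarantee the original code obtains by keeping LOCK_log held during the sync.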

How to repeat:
N/A

Suggested fix:
See the description.

[10 Jun 2014 15:06] MySQL Verification Team
Hi!

Thank you very much for your contribution. This is indeed a potential improvement to concurrent performance when syncing the binary log.