MySQL Bugs: #61545: commit stalls when mysql_binlog

Bug #61545	commit stalls when mysql_binlog_send does disk read
Submitted:	16 Jun 2011 20:13	Modified:	15 Jan 2014 8:30
Reporter:	Mark Callaghan	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S5 (Performance)
Version:	5.1.52	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	Contribution, LOCK_log, replication, stall

Description:
We get stalls and convoys of many threads waiting to commit with the binlog enabled when another thread holds LOCK_log while doing a disk read of a binlog file. This can happen when a replication slave asks for binlog data that is not in the OS buffer cache. The thread stack for the lock holder is:
my_read,_my_b_read,Log_event::read_log_event,mysql_binlog_send,dispatch_command,do_command,handle_one_connection,start_thread,clone

How to repeat:
Read read_log_event to see that it locks LOCK_log before doing a read. And then run poor-mans-profiler to dump thread stacks during stalls to see many threads waiting on LOCK_log during commit when one thread is in read_log_event.

int Log_event::read_log_event(IO_CACHE* file, String* packet,
                              pthread_mutex_t* log_lock)
{
  ulong data_len;
  int result=0;
  char buf[LOG_EVENT_MINIMAL_HEADER_LEN];
  DBUG_ENTER("Log_event::read_log_event");

  if (log_lock)
    pthread_mutex_lock(log_lock);
  if (my_b_read(file, (uchar*) buf, sizeof(buf)))
  {
    /*

And then run

And then run PMP to get thread stacks when there is a pileup. But you need a workload that reads binlog files that are not in the OS buffer cache.

I am evaluating a patch that does not lock LOCK_log for inactive log files:

diff --git a/sql/sql_repl.cc b/sql/sql_repl.cc
index fa27757..1cc4bf3 100644
--- a/sql/sql_repl.cc
+++ b/sql/sql_repl.cc
@@ -412,6 +412,10 @@ void mysql_binlog_send(THD* thd, char* log_ident, my_off_t pos,
   int left_events = max_binlog_dump_events;
 #endif
   int old_max_allowed_packet= thd->variables.max_allowed_packet;
+  /* Optimization for bugs.mysql.com 61545 to avoid holding LOCK_log while
+     reading old binlogs */
+  bool is_active = TRUE;
+
   DBUG_ENTER("mysql_binlog_send");
   DBUG_PRINT("enter",("log_ident: '%s'  pos: %ld", log_ident, (long) pos));
 
@@ -471,6 +475,8 @@ impossible position";
     goto err;
   }
 
+  is_active = mysql_bin_log.is_active(log_file_name);
+
   /*
     We need to start a packet with something other than 255
     to distinguish it from error
@@ -535,7 +541,8 @@ impossible position";
        Try to find a Format_description_log_event at the beginning of
        the binlog
      */
-     if (!(error = Log_event::read_log_event(&log, packet, log_lock)))
+     if (!(error = Log_event::read_log_event(&log, packet,
+                                             is_active ? log_lock : NULL)))
      {
        /*
          The packet has offsets equal to the normal offsets in a binlog
@@ -601,7 +608,8 @@ impossible position";
 
   while (!net->error && net->vio != 0 && !thd->killed)
   {
-    while (!(error = Log_event::read_log_event(&log, packet, log_lock)))
+    while (!(error = Log_event::read_log_event(&log, packet,
+                                               is_active ? log_lock: NULL)))
     {
 #ifndef DBUG_OFF
       if (max_binlog_dump_events && !left_events--)
@@ -613,6 +621,15 @@ impossible position";
       }
 #endif
 
+      /* If log_file_name is the current binlog, is_active must be set */
+      DBUG_ASSERT(!(mysql_bin_log.is_active(log_file_name) && !is_active));
+      if (mysql_bin_log.is_active(log_file_name) && !is_active)
+      {
+        errmsg = "bad state in mysql_binlog_send";
+        my_errno= ER_UNKNOWN_ERROR;
+        goto err;
+      }
+
       if ((*packet)[EVENT_TYPE_OFFSET+1] == FORMAT_DESCRIPTION_EVENT)
       {
         binlog_can_be_corrupted= test((*packet)[FLAGS_OFFSET+1] &
@@ -845,8 +862,8 @@ impossible position";
                                  &linfo);
        goto err;
       }
-
       DBUG_PRINT("info", ("Binlog filename: new binlog %s", log_file_name));
+      is_active = mysql_bin_log.is_active(log_file_name);
 
       packet->length(0);
       packet->append('\0');

Thank you for the report.

Verified as described.

This was fixed as part of the refactoring of the replication dump thread that was implemented in 5.7.2.