MySQL Bugs: #75781: log lock may not be unlocked if add_logged

Bug #75781	log lock may not be unlocked if add_logged_gtid failed.
Submitted:	5 Feb 2015 2:49	Modified:	26 Mar 2015 9:03
Reporter:	Fangxin Flou (OCA)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Utilities: Binlog Events	Severity:	S3 (Non-critical)
Version:	5.6.22	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	binlog, GTID

Description:
this is the current code in rpl_slave.cc

    /* write the event to the relay log */
    if (likely(rli->relay_log.append_buffer(buf, event_len, mi) == 0))
    {
      mi->set_master_log_pos(mi->get_master_log_pos() + inc_pos);
      DBUG_PRINT("info", ("master_log_pos: %lu", (ulong) mi->get_master_log_pos()));
      rli->relay_log.harvest_bytes_written(&rli->log_space_total);

      if (event_type == GTID_LOG_EVENT)
      {
        global_sid_lock->rdlock();
        int ret= rli->add_logged_gtid(gtid.sidno, gtid.gno);
        if (!ret)
          rli->set_last_retrieved_gtid(gtid);
        global_sid_lock->unlock();
        if (ret != 0)
          goto err;
      }
    }

......

err:
  if (unlock_data_lock)
    mysql_mutex_unlock(&mi->data_lock);
  DBUG_PRINT("info", ("error: %d", error));
  if (error)
    mi->report(ERROR_LEVEL, error, ER(error),
               (error == ER_SLAVE_RELAY_LOG_WRITE_FAILURE)?
               "could not queue event from master" :
               error_msg.ptr());
  DBUG_RETURN(error);

if  add_logged_gtid failed, the log_lock may not be released accord the current logic, I am still checking when add_logged_gtid will failure, I am not whether it is a really bug, or it's designed.

Thanks.

How to repeat:
N/A

Suggested fix:
diff -rc --exclude='*.orig' --exclude=sql_yacc.cc --exclude=sql_yacc.h mysql-5.6.22/sql/rpl_slave.cc mysql-5.6.22-temp/sql/rpl_slave.cc
*** mysql-5.6.22/sql/rpl_slave.cc	2014-11-21 13:39:51.000000000 +0800
--- mysql-5.6.22-temp/sql/rpl_slave.cc	2015-02-05 10:40:49.000000000 +0800
***************
*** 7038,7044 ****
--- 7038,7047 ----
            rli->set_last_retrieved_gtid(gtid);
          global_sid_lock->unlock();
          if (ret != 0)
+ 	{
+ 	  mysql_mutex_unlock(log_lock);
            goto err;
+ 	}
        }
      }
      else

It is evident from the source code analysis that log_lock is taken before the conditional block and released after it. However, there is a jump to the err tag, which does not have
any unlock operation done.

That means that under some scenarios, that mutex lock will not be unlocked, which may lead to serious server hanging and deadlocks.

Fully verified.

Thanks for your feedback. This has been fixed in an upcoming release and the following has been added to the changelog:
If an error was encountered while adding a GTID to the received GTID set, the log lock was not being correctly released. This could cause a deadlock.

commit b547bd853506fe51bb05febf16cf24b6f051864a
Author: Luis Soares <luis.soares@oracle.com>
Date:   Fri Feb 27 10:20:03 2015 +0000

    BUG#20492319: LOG LOCK MAY NOT BE UNLOCKED IF ADD_LOGGED_GTID FAILED.
    
    When a Gtid_log_event is received from the master,
    and after being queued in the relay_log, its GTID
    is saved in the received GTID set. This happens
    inside queue_event and is done while holding a lock,
    the relay_log lock. The function that is called to
    save the GTID can return an error. That is fine
    and is being tested properly. However, in the event
    of an error there will be a jump to the end of the
    queue_event function, but the relay_log mutex ends
    up NOT being released.
    
    This patch adds the missing mysql_mutex_unlock call.

Thanks to Fangxin for the bug report and the suggested fix.