MySQL Bugs: #68250: Current semi-synchronous replication may cause deadlock

Bug #68250	Current semi-synchronous replication may cause deadlock
Submitted:	2 Feb 2013 0:35	Modified:	23 Apr 2013 13:10
Reporter:	Yoshinori Matsunobu (OCA)	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.6.9	OS:	Any
Assigned to:	Luis Soares	CPU Architecture:	Any

Description:
Currently semi-synchronous replication works as below.

BINLOG DUMP:
mysql_binlog_send()
 while loop
  parse binlog event
  my_net_write()
  after_send_event()
   gets read_pos from slave
   sends cond broadcast
  net_flush()
  read_log_event()
   if eof:
    wait_for_update_bin_log()

my_net_write() does not guarantee that a packet is delivered to slave, so after_send_event() (repl_semi_after_send_event() called from semisync plugin) may not read the latest read_pos from slave. If the binlog dump thread enters wait_for_update_bin_log() and has not received any events after that, semisync plugin itself has no opportunity to get the latest binlog position information, so semisync writer thread has to wait until semisync timeout.

This problem is more likely to happen in 5.6 since group commit is supported. All clients have to wait until semisync timeout passes.

How to repeat:
I could repeat this problem by running concurrent inserts from 100 threads with semisync plugin and sync-binlog=1.

One writer waited at ReplSemiSyncMaster::commitTrx() until rpl_semi_sync_master_timeout had passed. 

pthread_cond_timedwait@@GLIBC_2.3.2,inline_mysql_cond_timedwait,ReplSemiSyncMaster::cond_timewait:wait_time=0x589c36e0),ReplSemiSyncMaster::commitTrx,Trans_delegate::after_commit,ha_commit_low,MYSQL_BIN_LOG::process_commit_stage_queue,MYSQL_BIN_LOG::ordered_commit,MYSQL_BIN_LOG::commit,ha_commit_trans,trans_commit_stmt,mysql_execute_command,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread,clone

All other clients were waiting for group commit to complete.

binlog sender waited at wait_for_update_bin_log().

pthread_cond_timedwait@@GLIBC_2.3.2,inline_mysql_cond_timedwait,MYSQL_BIN_LOG::wait_for_update_bin_log:out>,,mysql_binlog_send,com_binlog_dump,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread

I verified with debugger that reply_file_pos_ (semisync plugin's internal variable) was older than slave's read pos.

When manually updating master (to generate new binlog event), semisync replication restarted. This is because after_send_event() was called and reply_file_pos_ was updated.

Suggested fix:
I think after_send_event() hook should be called after net_flush() as well. When I tested calling the hook after net_flush(), this problem went away and performance actually improved a bit.

We are able to reproduce it deterministically! Working on a patch.

The described deadlock scenario is strange, due to we already flushed the
net before get read_pos from slave in after_send_event(...). See below.
Invoke link: Binlog_transmit_delegate::after_send_event(...) ->
repl_semi_after_send_event(...) -> ReplSemiSyncMaster::readSlaveReply(...)

int ReplSemiSyncMaster::readSlaveReply(NET *net, ...)
{
  ......
  /* We flush to make sure that the current event is sent to the network,
   * instead of being buffered in the TCP/IP stack.
   */
  if (net_flush(net))
  {
    sql_print_error("Semi-sync master failed on net_flush() "
                    "before waiting for slave reply");
    goto l_end;
  }
  ......
  log_file_pos = uint8korr(packet + REPLY_BINLOG_POS_OFFSET);
  ......
  result = reportReplyBinlog(server_id, log_file_name, log_file_pos);
  ......
}

The above test (running concurrent inserts from 100 threads by using
mysqlslap) should cause deadlock scenario described in Bug#68251
instead of this one.

I am also curious why the reporter's patch can gain performance.
Could you please post your configuration, such as the time of semi-sync timeout and so on?

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

net_flush() Will not cause the deadlock scenario described in the
description.