Bug #68251 3-way deadlock on semisync replication and binlog rotate
Submitted: 2 Feb 2013 1:09 Modified: 28 May 2013 6:58
Reporter: Yoshinori Matsunobu (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.6.9 OS:Any
Assigned to: CPU Architecture:Any

[2 Feb 2013 1:09] Yoshinori Matsunobu
Description:
The below deadlock scenario may happen.

1) a client doing commit:
 holds LOCK_commit mutex
 waits position update from semisync slave (waiting for thread 3)

2) a client rotating binlog:
 holds LOCK_log mutex
 waits LOCK_commit mutex (waiting for thread 1)

3) binlog_dump thread (mysql_binlog_send):
 waits LOCK_log (waiting for thread 2)
 can not send anything to slave

No update can be done until rpl_semi_sync_master_timeout passes.

How to repeat:
Set longer rpl_semi_sync_master_timeout and run concurrent write benchmarks and wait until binlog rotates.

Here are stack traces.

a client blocked at commit:
#0  0x000000374460a687 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f212c4b763f in inline_mysql_cond_timedwait (src_line=<optimized out>, src_file=<optimized out>, abstime=<optimized out>, mutex=0x7f212c6bbd90, that=0x7f212c6bbd58) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/include/mysql/psi/mysql_thread.h:1199
#2  ReplSemiSyncMaster::cond_timewait (this=0x7f212c6bbd40, wait_time=0x587bb6e0) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/plugin/semisync/semisync_master.cc:468
#3  0x00007f212c4b87dd in ReplSemiSyncMaster::commitTrx (this=0x7f212c6bbd40, trx_wait_binlog_name=0x132e591 "binary-slave-5609.001750", trx_wait_binlog_pos=209764144) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/plugin/semisync/semisync_master.cc:713
#4  0x0000000000666852 in Trans_delegate::after_commit (this=0x13106e0, thd=<optimized out>, all=<optimized out>) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/rpl_handler.cc:233
#5  0x000000000058ab7a in ha_commit_low (thd=0x7f2128897120, all=false) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/handler.cc:1500
#6  0x000000000089d4de in MYSQL_BIN_LOG::process_commit_stage_queue (this=0x132e540, thd=0x7f21283d2ad0, first=0x7f2128897120, flush_error=0) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/binlog.cc:6034
#7  0x00000000008a7ba5 in MYSQL_BIN_LOG::ordered_commit (this=0x132e540, thd=0x7f21283d2ad0, all=<optimized out>, skip_commit=<optimized out>) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/binlog.cc:6360
#8  0x00000000008a7f5f in MYSQL_BIN_LOG::commit (this=0x132e540, thd=0x7f21283d2ad0, all=false) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/binlog.cc:5863
#9  0x000000000058b572 in ha_commit_trans (thd=0x7f21283d2ad0, all=2) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/handler.cc:1420
...

rotating binlog thread:
#0  0x000000374460c888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003744608911 in _L_mutex_lock_1133 () from /lib64/libpthread.so.0
#2  0x000000374460886a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000008a515f in inline_mysql_mutex_lock (src_line=<optimized out>, src_file=<optimized out>, that=<optimized out>) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/include/mysql/psi/mysql_thread.h:688
#4  MYSQL_BIN_LOG::new_file_impl (this=0x132e540, need_lock_log=false, extra_description_event=0x0) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/binlog.cc:4305
#5  0x00000000008a6770 in MYSQL_BIN_LOG::rotate (this=0x132e540, force_rotate=<optimized out>, check_purge=0x1) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/binlog.cc:4841
#6  0x00000000008a7add in MYSQL_BIN_LOG::ordered_commit (this=0x132e540, thd=0x7f21288a93f0, all=<optimized out>, skip_commit=<optimized out>) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/binlog.cc:6397
#7  0x00000000008a7f5f in MYSQL_BIN_LOG::commit (this=0x132e540, thd=0x7f21288a93f0, all=false) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/binlog.cc:5863
#8  0x000000000058b572 in ha_commit_trans (thd=0x7f21288a93f0, all=2) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/handler.cc:1420
#9  0x000000000077aa69 in trans_commit_stmt (thd=0x132ea00) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/transaction.cc:373
....

binlog sender thread:
#0  0x000000374460c888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x00000037446088a5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
#2  0x0000003744608333 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000884998 in inline_mysql_mutex_lock (src_line=<optimized out>, src_file=<optimized out>, that=<optimized out>) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/include/mysql/psi/mysql_thread.h:688
#4  Log_event::read_log_event (file=0x58574470, packet=0x7f21283f47a0, log_lock=0x374460f838, checksum_alg_arg=1 '\001') at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/log_event.cc:1187
#5  0x00000000008b20a5 in mysql_binlog_send (thd=0x7f21283f4210, log_ident=0x7f21102bd290 "binary-slave-5609.001750", pos=25572522, slave_gtid_executed=0x0) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/rpl_master.cc:1081
#6  0x00000000008b3a0f in com_binlog_dump (thd=0x7f21283f4210, packet=0x7f21280ed8b1 "\230\274\200\f", packet_length=<optimized out>) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/sql/rpl_master.cc:695
[2 Feb 2013 1:51] MySQL Verification Team
Thanks for the bug report, I repeated it easily, see attachment file for some more information.
[2 Feb 2013 1:59] MySQL Verification Team
some info

Attachment: bug68251_some_info.txt (text/plain), 20.19 KiB.

[2 Feb 2013 2:34] MySQL Verification Team
Affects the current bzr 5.6.11 plugin/server too:

| NULL | Binlog Dump |  699 | Finished reading one binlog; switching to next binlog | NULL                                             |
| test | Sleep       |  335 |                                                       | NULL                                             |
| NULL | Query       |    0 | init                                                  | show processlist                                 |
| test | Sleep       |    4 |                                                       | NULL                                             |
| test | Query       |  244 | Waiting for semi-sync ACK from slave                  | update t1 set b='31453' where a>'-26543' limit 5 |
| test | Query       |  244 | query end                                             | delete from t1 where b='8797'                    |
+------+-------------+------+-------------------------------------------------------+--------------------------------------------------+
[7 Mar 2013 8:00] MySQL Verification Team
there is nothing specific about it, however i found implicit rollbacks by thd destructor triggers it faster

Attachment: bug16271657.c (text/plain), 8.67 KiB.

[28 May 2013 6:58] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html
[28 May 2013 6:59] Jon Stephens
Fixed in 5.6+, together with BUG#68569, see which for docs info.