Bug #44058 Possible semi-sync replication bugs
Submitted: 2 Apr 2009 23:28 Modified: 12 Nov 2009 12:36
Reporter: Mark Callaghan
Status: Closed
Category: MySQL Server: Replication Severity: S3 (Non-critical)
Version: 6.0 OS: Any
Assigned to: Zhenxing He CPU Architecture: Any
Tags: replication, semi-sync

[2 Apr 2009 23:28] Mark Callaghan
Description:
This describes some issues you might want to avoid with semi-sync replication. I speak from experience with the Google version. I have never used/studied your version.

1) provide output in SHOW PROCESSLIST to indicate a session is blocked on semi-sync ACK

2) provide a SQL method to wake sessions waiting for semi-sync ACK (KILL?)

3) obey KILL commands. When KILL is done for a session blocked on semi-sync ACK, it should notice the KILL. I think THD::enter_cond() provides that.

4) Add an option so that semi-sync is not temporarily disabled when there is a timeout waiting for an ACK. In our version it was always disabled in that case until we knew a slave had caught up. In some cases, we don't want it ever disabled so we added a my.cnf variable.

5) Avoid this deadlock:

Thread 33 is trying to rotate to a new log and is holding mysql_bin_log::LOCK_log. It is unable to proceed because there are pending prepared transactions.

Thread 32 (at least) has a prepared transaction. It is unable to progress beyond prepared to committed because it needs an acknowledgment from a semi-sync slave.
It's not getting the acknowledgment from the slaves because the Binlog Dump threads are blocked reading events because they can't lock LOCK_log which is held by thread 33.
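
The cycle described in 5) can be written down as a small wait-for graph. The following is a minimal sketch of that reasoning only, not server code: the node names, `WaitGraph`, `add_wait`, `has_cycle_through` and `reported_scenario` are all hypothetical labels chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each key is an actor that waits for every actor
// listed in its value.
using WaitGraph = std::map<std::string, std::vector<std::string>>;

// Records that `waiter` cannot make progress until `blocker` does.
void add_wait(WaitGraph &g, const std::string &waiter,
              const std::string &blocker) {
  assert(waiter != blocker);  // a direct self-wait would be a modelling error
  g[waiter].push_back(blocker);
}

// Depth-first search: can `target` be reached by following waits from `from`?
static bool reaches(const WaitGraph &g, const std::string &from,
                    const std::string &target, std::set<std::string> &seen) {
  auto it = g.find(from);
  if (it == g.end()) return false;
  for (const std::string &next : it->second) {
    if (next == target) return true;
    if (seen.insert(next).second && reaches(g, next, target, seen)) return true;
  }
  return false;
}

// True if `node` transitively waits on itself, i.e. it is deadlocked.
bool has_cycle_through(const WaitGraph &g, const std::string &node) {
  std::set<std::string> seen;
  return reaches(g, node, node, seen);
}

// The four waits from the report, using made-up node names.
WaitGraph reported_scenario() {
  WaitGraph g;
  // Thread 33 holds LOCK_log and waits for prepared transactions to finish.
  add_wait(g, "rotate_thread", "prepared_transaction");
  // Thread 32 cannot move from prepared to committed without an ACK.
  add_wait(g, "prepared_transaction", "semi_sync_ack");
  // The ACK only arrives after a Binlog Dump thread has sent the events.
  add_wait(g, "semi_sync_ack", "binlog_dump_thread");
  // The Binlog Dump thread needs LOCK_log, which thread 33 holds.
  add_wait(g, "binlog_dump_thread", "rotate_thread");
  return g;
}
```

Calling `has_cycle_through(reported_scenario(), "rotate_thread")` reports a cycle; removing any one of the four edges breaks it, which is why changing where the ACK wait happens is enough to avoid the deadlock.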

How to repeat:
NA
[3 Apr 2009 10:50] Susanne Ebrecht
Many thanks for writing a bug report. Your ideas are great and I will forward this to development now.

Because the absence of these features can lead to deadlocks (as happened to you), I think this is a real bug rather than a feature request.
[3 Apr 2009 11:17] Lars Thalmann
What we put into MySQL is based on the Google patch and should work in
the same way. Most likely these problems are in the MySQL
semi-synchronous component too. We need to fix them.

Thanks for reporting this Mark!
[4 Apr 2009 20:13] Sveta Smirnova
See bug #40935 also.
[10 Apr 2009 8:02] Zhenxing He
I'll try to handle 1), 3), and 5) in this bug report. 2) requires modifying the SQL parser, which I don't think is possible from a plugin; KILL QUERY can be used as a workaround, so I won't handle it here. 4) is a feature request, which I'd like to handle in a separate patch.

In order to accomplish 1) and 3), the following interfaces are needed:

/**
   Current thread entering a condition

   This function should be called before putting the current thread to
   wait on a condition. @a mutex must be held before calling this
   function. After being woken up, @c current_thd_exit_cond should be
   called.
*/
const char* current_thd_enter_cond(pthread_cond_t *cond,
                                   pthread_mutex_t *mutex, const char *msg);

/**
   Current thread leaving a condition

   This function should be called after being woken up from a condition.
*/
void current_thd_exit_cond(const char *msg);
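
A minimal sketch of how an enter/exit pair like this lets KILL wake a waiter and lets a process list show why the session is blocked. It is not the server implementation: `Session`, `session_enter_cond`, `session_exit_cond`, `session_kill`, `wait_for_ack` and the state string are hypothetical stand-ins, and the sketch assumes the waited-on mutex and condition outlive every session that registers them.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <ctime>
#include <pthread.h>

// Hypothetical stand-in for per-session server state; not the real THD class.
struct Session {
  pthread_mutex_t guard = PTHREAD_MUTEX_INITIALIZER;  // protects the 3 fields below
  pthread_mutex_t *wait_mutex = nullptr;  // mutex paired with the condition
  pthread_cond_t *wait_cond = nullptr;    // condition being waited on, if any
  const char *state = "";                 // text a process list could display
  std::atomic<bool> killed{false};
};

// Registers what the session is about to wait on. The caller must hold `mutex`.
const char *session_enter_cond(Session &s, pthread_cond_t *cond,
                               pthread_mutex_t *mutex, const char *msg) {
  pthread_mutex_lock(&s.guard);
  const char *old_msg = s.state;
  s.wait_mutex = mutex;
  s.wait_cond = cond;
  s.state = msg;
  pthread_mutex_unlock(&s.guard);
  return old_msg;
}

// Clears the registration, restores the old state text, releases the mutex.
void session_exit_cond(Session &s, const char *old_msg) {
  pthread_mutex_lock(&s.guard);
  pthread_mutex_t *mutex = s.wait_mutex;
  s.wait_mutex = nullptr;
  s.wait_cond = nullptr;
  s.state = old_msg;
  pthread_mutex_unlock(&s.guard);
  if (mutex) pthread_mutex_unlock(mutex);
}

// Models KILL: set the flag, then wake the session if it has registered a wait.
void session_kill(Session &s) {
  s.killed.store(true);
  pthread_mutex_lock(&s.guard);
  pthread_mutex_t *mutex = s.wait_mutex;
  pthread_cond_t *cond = s.wait_cond;
  pthread_mutex_unlock(&s.guard);  // never hold guard while taking `mutex`
  if (cond) {
    pthread_mutex_lock(mutex);  // closes the window between check and wait
    pthread_cond_broadcast(cond);
    pthread_mutex_unlock(mutex);
  }
}

enum class WaitResult { Acked, Killed, TimedOut };

// Waits until *acked becomes true, the session is killed, or the timeout ends.
// `acked` is caller-owned and must only be changed while holding `mutex`.
WaitResult wait_for_ack(Session &s, pthread_mutex_t *mutex,
                        pthread_cond_t *cond, const bool *acked,
                        long timeout_ms) {
  assert(timeout_ms >= 0);
  timespec deadline;
  clock_gettime(CLOCK_REALTIME, &deadline);
  deadline.tv_sec += timeout_ms / 1000;
  deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
  if (deadline.tv_nsec >= 1000000000L) {
    deadline.tv_sec += 1;
    deadline.tv_nsec -= 1000000000L;
  }
  pthread_mutex_lock(mutex);
  const char *old_msg = session_enter_cond(
      s, cond, mutex, "Waiting for semi-sync ACK from slave");
  WaitResult result = WaitResult::Acked;
  while (!*acked) {
    if (s.killed.load()) { result = WaitResult::Killed; break; }
    if (pthread_cond_timedwait(cond, mutex, &deadline) == ETIMEDOUT && !*acked) {
      result = WaitResult::TimedOut;
      break;
    }
  }
  session_exit_cond(s, old_msg);  // also unlocks `mutex`
  return result;
}
```

The design point is that the killer never has to know which condition a session sleeps on: the waiter publishes it on entry, rechecks the kill flag on every wakeup, and withdraws it on exit.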
[10 Apr 2009 8:49] Zhenxing He
It seems our version does not suffer from problem 5), because we wait for the ACK after calling tc_log->unlog, which decrements prepared_xids.
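
The ordering argument can be illustrated with a toy counter. This is a sketch under the assumption that log rotation can only proceed once the prepared-transaction count reaches zero; `Step`, `prepared_xids_during_ack_wait` and `rotation_can_proceed` are hypothetical names, not server code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical commit-path steps for a single session.
enum class Step { Prepare, Unlog, WaitForAck };

// Replays the steps in order and returns how many prepared transactions are
// still counted at the moment the session starts waiting for the ACK.
// Returns -1 if the sequence never waits for an ACK.
int prepared_xids_during_ack_wait(const std::vector<Step> &order) {
  int prepared_xids = 0;
  for (Step step : order) {
    switch (step) {
      case Step::Prepare:
        ++prepared_xids;
        break;
      case Step::Unlog:
        --prepared_xids;
        assert(prepared_xids >= 0);  // unlog without a prior prepare is invalid
        break;
      case Step::WaitForAck:
        return prepared_xids;
    }
  }
  return -1;
}

// Assumption: rotation is only blocked while prepared_xids is above zero.
bool rotation_can_proceed(const std::vector<Step> &order) {
  return prepared_xids_during_ack_wait(order) == 0;
}
```

With the order `Prepare, Unlog, WaitForAck` the count is already zero when the session parks, so a rotating thread is not held up by it; with `Prepare, WaitForAck, Unlog` the count is still one and rotation stays blocked for as long as the ACK is outstanding.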
[14 Apr 2009 4:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71969

2841 He Zhenxing	2009-04-14
      BUG#44058 Possible semi-sync replication bugs
      
      Added interfaces that allow plugins to set the current thread's
      status when it is about to sleep waiting for a condition, so
      that KILL can wake it up.
[14 Apr 2009 5:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/71970
[14 Apr 2009 6:24] Zhenxing He
The fix has two parts, one for the server and the other for the components, so please review both commits above.
[16 Apr 2009 10:58] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72231

2841 He Zhenxing	2009-04-16
      BUG#44058 Possible semi-sync replication bugs
      
      Added interfaces that allow plugins to set the current thread's
      status when it is about to sleep waiting for a condition, so
      that KILL can wake it up.
[16 Apr 2009 10:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72233
[16 Apr 2009 10:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72235

2841 He Zhenxing	2009-04-16
      BUG#44058 Possible semi-sync replication bugs
      
      Added interfaces that allow plugins to set the current thread's
      status when it is about to sleep waiting for a condition, so
      that KILL can wake it up.
[29 Apr 2009 9:23] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72989

2849 He Zhenxing	2009-04-29
      BUG#44058 Possible semi-sync replication bugs
      
      Added interfaces that allow plugins to set the current thread's
      status when it is about to sleep waiting for a condition, so
      that KILL can wake it up.
[29 Apr 2009 9:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/72990
[6 May 2009 6:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/73454

2850 He Zhenxing	2009-05-06
      BUG#44058 Possible semi-sync replication bugs
      
      Fix previous patch.
      
      Move thd_enter_cond/thd_exit_cond from plugin.h to replication.h
[8 May 2009 7:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/73639

2846 He Zhenxing	2009-05-08
      BUG#44058 Possible semi-sync replication bugs
      
      Fix previous patch.
      
      Move thd_enter_cond/thd_exit_cond from plugin.h to replication.h
[13 May 2009 3:31] Bugs System
Pushed into 6.0.12-alpha (revid:alik@sun.com-20090513032549-rxa73jbxd1qv09xc) (version source revid:zhenxing.he@sun.com-20090508074921-hwkkwjt4hw9qioqx) (merge vers: 6.0.12-alpha) (pib:6)
[13 May 2009 14:14] Jon Stephens
Documented bugfix in the 6.0.12 changelog as follows:

        When using semi-synchronous replication:

            · KILL statements were not always obeyed for a session blocked
              by a semi-synchronous ACK signal.

            · SHOW PROCESSLIST did not provide any indication that a
              session was blocked by the ACK signal.
[19 Jun 2009 7:54] Bugs System
Pushed into 5.4.4-alpha (revid:zhenxing.he@sun.com-20090619074435-4mlfkqqol4nzpq10) (version source revid:zhenxing.he@sun.com-20090619074435-4mlfkqqol4nzpq10) (merge vers: 5.4.4-alpha) (pib:11)
[26 Sep 2009 4:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/84703

3108 He Zhenxing	2009-09-26
      Backporting WL#4398 WL#1720
      Backporting BUG#44058 BUG#42244 BUG#45672 BUG#45673
      Backporting BUG#45819 BUG#45973 BUG#39012
[27 Oct 2009 9:49] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091027094604-9p7kplu1vd2cvcju) (version source revid:zhenxing.he@sun.com-20091026140226-uhnqejkyqx1aeilc) (merge vers: 6.0.14-alpha) (pib:13)
[28 Oct 2009 6:30] Jon Stephens
Already documented for 6.0.12; re-closing.
[12 Nov 2009 8:18] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091110093229-0bh5hix780cyeicl) (version source revid:alik@sun.com-20091027095744-rf45u3x3q5d1f5y0) (merge vers: 5.5.0-beta) (pib:13)
[12 Nov 2009 12:36] Jon Stephens
Also documented bugfix in the 5.5.0 changelog; closed.
[18 Dec 2009 15:43] Paul DuBois
Removed 5.5.0 changelog entry. In 5.5, semisync replication first appears in 5.5.0, so this bug affects no 5.5.x releases.