Bug #24860 | Incorrect SLAVE_TRANSACTION_RETRIES code can result in slave stuck | ||
---|---|---|---|
Submitted: | 6 Dec 2006 18:17 | Modified: | 28 Nov 2007 19:03 |
Reporter: | Rafal Somla | | |
Status: | Closed | | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.1.12 | OS: | Any |
Assigned to: | Mats Kindahl | CPU Architecture: | Any |
[6 Dec 2006 18:17]
Rafal Somla
[8 Dec 2006 10:33]
Guilhem Bichot
Rafal, "Due to incorrect implementation of SLAVE_TRANSACTION_RETRIES"

Hey, it was implemented (guess by who) before row-based, so no wonder it may break with row-based: row-based introduced a new type of group (table maps + rows).

You're right about groups. In fact, the existing code below is an attempt to solve this problem when the group is an SBR transaction: we reset the counter only when we are not in a transaction. So if

    BEGIN;
    INSERT; # fails with transient error

we will go back to BEGIN, execute BEGIN ok, but not reset the counter. Only when we reach the COMMIT will we reset the counter. Which is correct.

    else if (!((thd->options & OPTION_BEGIN) && opt_using_transactions))
    {
      /*
        Only reset the retry counter if the event succeeded or
        failed with a non-transient error. On a successful event,
        the execution will proceed as usual; in the case of a
        non-transient error, the slave will stop with an error.
      */
      rli->trans_retries= 0; // restart from fresh
    }
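To make the gap concrete, here is a minimal sketch of the rule quoted above, not the server code itself. `SlaveState`, `old_rule_resets` and `apply_ok` are hypothetical names invented for illustration; the sketch assumes that a row-based autocommit group (Table_map + rows) runs without the `OPTION_BEGIN` flag set, which is what this thread implies.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for the replication thread state.
struct SlaveState {
  bool in_transaction = false;     // models (thd->options & OPTION_BEGIN)
  bool using_transactions = true;  // models opt_using_transactions
  int trans_retries = 0;           // models rli->trans_retries
};

// The condition from the quoted code: reset only when not in a transaction.
bool old_rule_resets(const SlaveState &s) {
  return !(s.in_transaction && s.using_transactions);
}

// Apply a successfully executed event and report whether the old rule
// resets the retry counter afterwards. Only BEGIN and COMMIT change the
// transaction flag in this model; every other event leaves it untouched.
bool apply_ok(SlaveState &s, const std::string &ev) {
  if (ev == "BEGIN") s.in_transaction = true;
  if (ev == "COMMIT") s.in_transaction = false;
  const bool reset = old_rule_resets(s);
  if (reset) s.trans_retries = 0;
  return reset;
}
```

In this model, BEGIN and INSERT leave the counter alone and COMMIT resets it, as intended. A Table_map executed in autocommit mode resets the counter immediately, even though its group is not finished.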
[8 Dec 2006 17:13]
Guilhem Bichot
I *think* that if we change the condition in the "else" in my previous post to "else if (rli->group_relay_log_pos == rli->event_relay_log_pos)", it would work. It would do the resetting only if the event we just processed successfully ended a group. In a transaction, no event except COMMIT ends a group. In autocommit mode, Intvar, Rand and Table_map events don't end a group.
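A rough sketch of how that comparison behaves, assuming the group position is advanced only when a group ends while the event position is advanced after every event. `RelayLogInfo` here is a hypothetical three-field struct and `advance` is an invented helper, not the server's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the relay log info; field names follow the thread.
struct RelayLogInfo {
  std::uint64_t group_relay_log_pos = 0;  // start of the current group
  std::uint64_t event_relay_log_pos = 0;  // position after the last event
  int trans_retries = 0;
};

// Record a successfully executed event of `len` bytes. `ends_group` says
// whether the event closes its group (e.g. COMMIT, or the last rows event).
void advance(RelayLogInfo &rli, std::uint64_t len, bool ends_group) {
  assert(len > 0);  // a zero-length event would make the positions coincide
  rli.event_relay_log_pos += len;
  if (ends_group) rli.group_relay_log_pos = rli.event_relay_log_pos;

  // Proposed condition: reset only when the event just ended a group.
  if (rli.group_relay_log_pos == rli.event_relay_log_pos)
    rli.trans_retries = 0;
}
```

With this check, an event in the middle of a group leaves `trans_retries` untouched because the two positions differ; they only coincide again once the group is complete.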
[20 Oct 2007 18:16]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/35989

ChangeSet@1.2579, 2007-10-20 20:16:12+02:00, mats@kindahl-laptop.dnsalias.net +4 -0

BUG#24860 (Incorrect SLAVE_TRANSACTION_RETRIES code can result in slave stuck):

If a temporary error occurred inside a group on an event that was not the first event of the group, the slave could get stuck because the retry counter was reset whenever an event was executed successfully.

This patch only resets the retry counter when an entire group has been successfully executed, or has failed with a non-transient error.
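The difference between the two behaviours can be shown with a small simulation. This is a sketch under stated assumptions, not the committed patch: `Policy`, `Outcome` and `run_group` are hypothetical names, a failing event is assumed to fail on every attempt, and the exact retry-limit comparison is a guess.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Policy { ResetPerEvent, ResetPerGroup };
enum class Outcome { Applied, StoppedWithError, StillRetrying };

// Replay one event group. fails[i] is true if event i always hits a
// temporary error. max_steps caps the simulation so that an endless
// retry loop shows up as StillRetrying instead of hanging.
Outcome run_group(const std::vector<bool> &fails, Policy policy,
                  int max_retries, int max_steps) {
  assert(max_retries >= 0);
  int retries = 0;
  std::size_t i = 0;  // index of the next event to execute
  for (int step = 0; step < max_steps && i < fails.size(); ++step) {
    if (fails[i]) {
      if (retries >= max_retries) return Outcome::StoppedWithError;
      ++retries;
      i = 0;  // roll back and restart from the beginning of the group
      continue;
    }
    ++i;  // event executed successfully
    const bool group_done = (i == fails.size());
    if (policy == Policy::ResetPerEvent || group_done) retries = 0;
  }
  return i == fails.size() ? Outcome::Applied : Outcome::StillRetrying;
}
```

For a two-event group whose second event keeps failing, `ResetPerEvent` never exhausts the retries, because the first event succeeds and clears the counter on every pass. `ResetPerGroup` lets the counter reach the limit, so the slave stops with an error.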
[27 Nov 2007 10:51]
Bugs System
Pushed into 5.1.23-rc
[27 Nov 2007 10:53]
Bugs System
Pushed into 6.0.4-alpha
[28 Nov 2007 19:03]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at http://dev.mysql.com/doc/en/installing-source.html

Documented bugfix in 5.1.23 and 6.0.4 changelogs as:

If a temporary error occurred inside an event group on an event that was not the first event of the group, the slave could get caught in an endless loop because the retry counter was reset whenever an event was executed successfully.