| Bug #16228 | RBR: Slave SQL thread retries infinitely | ||
|---|---|---|---|
| Submitted: | 5 Jan 2006 15:32 | Modified: | 21 Nov 2006 20:08 |
| Reporter: | Mats Kindahl | | |
| Status: | Closed | | |
| Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
| Version: | | OS: | Any (ALL) |
| Assigned to: | Mats Kindahl | CPU Architecture: | Any |
[8 Mar 2006 7:57]
Mats Kindahl
This bug can cause the slave to retry application of a transaction an infinite number of times when it has transient errors, i.e., effectively ignoring the value of SLAVE_TRANSACTION_RETRIES. An example of a transient error is that one of the tables changed by the transaction is used by another transaction.
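
For illustration only, here is a minimal Python sketch of what honoring a retry limit means. It is not the server's code: `TransientError`, `apply_with_retries`, and `make_flaky` are hypothetical names, and the sketch assumes a transient error is simply retried until the budget given by SLAVE_TRANSACTION_RETRIES runs out.

```python
class TransientError(Exception):
    """Stands in for a retryable error such as a deadlock or a lock wait timeout."""


def apply_with_retries(apply_transaction, max_retries):
    """Apply a transaction, retrying on transient errors at most max_retries times.

    Returns the number of retries that were needed. Once the retry budget is
    exhausted the last transient error is re-raised, which models the SQL
    thread stopping instead of retrying forever.
    """
    retries = 0
    while True:
        try:
            apply_transaction()
            return retries
        except TransientError:
            if retries >= max_retries:
                raise
            retries += 1


def make_flaky(failures):
    """Build a fake transaction that fails `failures` times and then succeeds."""
    state = {"left": failures, "calls": 0}

    def apply_transaction():
        state["calls"] += 1
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("row locked by another transaction")

    return apply_transaction, state
```

With a fake transaction that fails three times, a limit of 10 lets it succeed after three retries, while a limit of 1 surfaces the error after the first attempt plus one retry.
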
[4 Apr 2006 21:56]
Lars Thalmann
This has been pushed. Fix will be part of Release 5.1.8.
[11 Apr 2006 13:34]
MC Brown
Added to changelog for 5.1.8. Slave servers would retry the execution of a SQL statement an infinite number of times, ignoring the value of SLAVE_TRANSACTION_RETRIES, when using the NDB engine. (Bug #16228)
[6 Oct 2006 21:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/13284 ChangeSet@1.2295, 2006-10-07 00:08:52+03:00, aelkin@dsl-hkigw8-feb9fb00-191.dhcp.inet.fi +3 -0 BUG#20697 A transaction on the slave SQL thread got blocked against a lock held by a local transaction on the slave. Since slave-transaction-retries=10 was the default, the replicated transaction was replayed, and that failed because 5.0's policy of rolling back a timed-out transaction changed in 5.0.13. It was decided to backport the existing method, already working in 5.1 and implemented in bug #16228, for handling the symmetrical deadlock problem. Note that this solution can be practically suboptimal only with a high rate of timed-out replicated transactions. Upon the release of the latter
[6 Oct 2006 21:21]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/13285 ChangeSet@1.2295, 2006-10-07 00:21:16+03:00, aelkin@dsl-hkigw8-feb9fb00-191.dhcp.inet.fi +3 -0 BUG#20697 slave fails to rollback replicated transaction hang over innodb_lock_wait_timeout A transaction on the slave SQL thread got blocked against a lock held by a local transaction on the slave. Since slave-transaction-retries=10 was the default, the replicated transaction was replayed, and that failed because 5.0's policy of rolling back a timed-out transaction changed in 5.0.13. It was decided to backport the existing method, already working in 5.1 and implemented in bug #16228, for handling the symmetrical deadlock problem. Note that this solution can be practically suboptimal only with a high rate of timed-out replicated transactions. Upon the release of the latter
[7 Oct 2006 19:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/13303 ChangeSet@1.2295, 2006-10-07 22:02:43+03:00, aelkin@dsl-hkigw8-feb9fb00-191.dhcp.inet.fi +3 -0 BUG#20697 slave fails to rollback replicated transaction hang over innodb_lock_wait_timeou A transaction on the slave SQL thread got blocked against a lock held by a local transaction on the slave's mysqld. Since the default is slave-transaction-retries=10, the replicated transaction was replayed. That failed because of a new policy, started in 5.0.13, not to roll back a timed-out transaction. Effectively, the first round of a timed-out transaction became committed by the replay's first "BEGIN". It was decided to backport the existing method, already working in 5.1 and implemented in bug #16228, for handling the symmetrical deadlock problem. That patch introduced end_trans execution whenever a replicated transaction deadlocks or times out. Note that this solution can be practically suboptimal only with a high rate of timed-out replicated transactions, since in light of the changed timeout behavior we could still replay only the last statement.
[10 Oct 2006 11:03]
Andrei Elkin
The last changeset was intended for bug#20697, a semi-duplicate.
[11 Oct 2006 7:16]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/13446 ChangeSet@1.2295, 2006-10-11 10:16:37+03:00, aelkin@dsl-hkigw8-feb9fb00-191.dhcp.inet.fi +3 -0 BUG#20697 slave fails to rollback replicated transaction hang over innodb_lock_wait_timeou A transaction on the slave SQL thread got blocked against a lock held by a local transaction on the slave's mysqld. Since the default is slave-transaction-retries=10, the replicated transaction was replayed. That failed because of a new policy, started in 5.0.13, not to roll back a timed-out transaction. Effectively, the first round of a timed-out transaction becomes committed by the replay's first "BEGIN". It was decided to backport the existing method, already working in 5.1 and implemented in bug #16228, for handling the symmetrical deadlock problem. That patch introduced end_trans execution whenever a replicated transaction deadlocks or times out. Note that this solution can be practically suboptimal only with a high rate of timed-out replicated transactions, since in light of the changed timeout behavior we could still replay only the last statement.
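
The failure mode described in this changeset can be illustrated with a toy Python model. It is a sketch under stated assumptions, not the server's implementation: `ToySlave` and `replay` are hypothetical names, and the model assumes only that a timeout leaves the transaction open and that a new BEGIN implicitly commits whatever is still pending.

```python
class ToySlave:
    """Toy model of a slave session; hypothetical, not MySQL code."""

    def __init__(self):
        self.committed = []  # rows that have been made durable
        self.pending = []    # rows written by the currently open transaction

    def begin(self):
        # Starting a new transaction implicitly commits any open one.
        self.commit()

    def write(self, row):
        self.pending.append(row)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending = []

    def rollback(self):
        self.pending = []


def replay(slave, rows, fail_at, rollback_before_retry):
    """Apply rows as one transaction whose first attempt times out at rows[fail_at]."""
    slave.begin()
    for row in rows[:fail_at]:
        slave.write(row)
    # The timeout happens here: the failing statement is undone,
    # but the transaction itself stays open with its earlier writes.
    if rollback_before_retry:
        slave.rollback()  # models ending the transaction before the replay
    # Second attempt, replayed from the start of the transaction.
    slave.begin()
    for row in rows:
        slave.write(row)
    slave.commit()
    return slave.committed
```

In this model, replaying without the rollback commits the partial first round and then applies the whole transaction again, so rows from the first round appear twice. Rolling back first leaves each row applied once.
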
[20 Nov 2006 15:02]
Lars Thalmann
Pushed into 5.0.32 (pushed earlier into 5.1.8.)
[21 Nov 2006 20:08]
Paul DuBois
Noted in 5.0.32 changelog.

Description:
The slave SQL thread should stop if it cannot apply a row within the number of retries indicated by the server variable SLAVE_TRANSACTION_RETRIES.

How to repeat:

    --connection master
    CREATE TABLE `t1` (
      `nid` int(11) NOT NULL default '0',
      `nom` char(4) default NULL,
      `prenom` char(4) default NULL,
      PRIMARY KEY USING HASH (`nid`))
      ENGINE=innodb DEFAULT CHARSET=latin1;
    INSERT INTO t1 VALUES(1,"XYZ1","ABC1");

    # cause a lock on that row on the slave
    --sync_slave_with_master
    --connection slave
    BEGIN;
    UPDATE t1 SET `nom`="LOCK" WHERE `nid`=1;

    # set number of retries low so we fail the retries
    set GLOBAL slave_transaction_retries=1;

    # now do a change to this row on the master
    # will deadlock on the slave because of lock above
    --connection master
    UPDATE t1 SET `nom`="DEAD" WHERE `nid`=1;

    # wait for deadlock to be detected
    # sleep longer than deadlock detection timeout in config
    # we do this 2 times, once with few retries to verify that we
    # get a failure with the set sleep, and once with the _same_
    # sleep, but with more retries to get it to succeed
    --sleep 5

    # replication should have stopped, since max retries were not enough
    # verify with show slave status
    --connection slave
    --replace_result $MASTER_MYPORT MASTER_PORT
    --replace_column 1 <Slave_IO_State> 7 <Read_Master_Log_Pos> 8 <Relay_Log_File> 9 <Relay_Log_Pos> 16 <Replicate_Ignore_Table> 22 <Exec_Master_Log_Pos> 23 <Relay_Log_Space> 33 <Seconds_Behind_Master>
    SHOW SLAVE STATUS;

    # now set max retries high enough to succeed, and start slave again
    set GLOBAL slave_transaction_retries=10;
    START SLAVE;

    # wait for deadlock to be detected and retried
    # should be the same sleep as above for test to be valid
    --sleep 5

    # commit transaction to release lock on row and let replication succeed
    select * from t1 order by nid;
    COMMIT;

    # verify that the row succeeded to be applied on the slave
    --connection master
    --sync_slave_with_master
    --connection slave
    select * from t1 order by nid;

    # cleanup
    --connection master
    DROP TABLE t1;