Bug #16228 RBR: Slave SQL thread retries infinitely
Submitted: 5 Jan 2006 15:32 Modified: 21 Nov 2006 20:08
Reporter: Mats Kindahl
Status: Closed
Category: MySQL Server: Replication Severity: S3 (Non-critical)
Version: OS: Any (ALL)
Assigned to: Mats Kindahl

[5 Jan 2006 15:32] Mats Kindahl
Description:
The slave SQL thread should stop if it cannot apply a row within the number of retries indicated by the server variable SLAVE_TRANSACTION_RETRIES.

How to repeat:
--connection master
CREATE TABLE `t1` ( `nid` int(11) NOT NULL default '0',
                `nom` char(4) default NULL,
             `prenom` char(4) default NULL,
           PRIMARY KEY USING HASH (`nid`))
   ENGINE=innodb DEFAULT CHARSET=latin1;
INSERT INTO t1 VALUES(1,"XYZ1","ABC1");

# cause a lock on that row on the slave
--sync_slave_with_master
--connection slave
BEGIN;
UPDATE t1 SET `nom`="LOCK" WHERE `nid`=1;

# set number of retries low so we fail the retries
set GLOBAL slave_transaction_retries=1;

# now do a change to this row on the master
# will deadlock on the slave because of lock above
--connection master
UPDATE t1 SET `nom`="DEAD" WHERE `nid`=1;

# wait for deadlock to be detected
# sleep longer than the deadlock detection timeout in the config
# we do this 2 times: once with few retries to verify that we
# get a failure with the set sleep, and once with the _same_
# sleep, but with more retries to get it to succeed
--sleep 5

# replication should have stopped, since max retries were not enough
# verify with show slave status
--connection slave
--replace_result $MASTER_MYPORT MASTER_PORT
--replace_column 1 <Slave_IO_State> 7 <Read_Master_Log_Pos> 8 <Relay_Log_File> 9 <Relay_Log_Pos> 16 <Replicate_Ignore_Table> 22 <Exec_Master_Log_Pos> 23 <Relay_Log_Space> 33 <Seconds_Behind_Master>
SHOW SLAVE STATUS;

# now set max retries high enough to succeed, and start slave again
set GLOBAL slave_transaction_retries=10;
START SLAVE;

# wait for deadlock to be detected and retried
# should be the same sleep as above for test to be valid
--sleep 5

# commit transaction to release lock on row and let replication succeed
select * from t1 order by nid;
COMMIT;

# verify that the row was successfully applied on the slave
--connection master
--sync_slave_with_master
--connection slave
select * from t1 order by nid;

# cleanup
--connection master
DROP TABLE t1;
[8 Mar 2006 7:57] Mats Kindahl
This bug can cause the slave to retry a transaction indefinitely when it encounters transient errors, effectively ignoring the value of SLAVE_TRANSACTION_RETRIES. An example of a transient error is that one of the tables changed by the transaction is in use by another transaction.
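The intended behavior can be illustrated with a minimal sketch of a bounded retry loop. This is not the server's code: `TransientError`, `apply_with_retries`, and `max_retries` are hypothetical names used only for illustration, with `max_retries` playing the role of SLAVE_TRANSACTION_RETRIES.

```python
class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure, e.g. a lock wait timeout."""


def apply_with_retries(apply_transaction, max_retries):
    """Apply a transaction, retrying transient failures at most max_retries times.

    Returns the number of retries used on success. Once the retry budget is
    exhausted, the last TransientError is re-raised so the caller can stop.
    """
    retries = 0
    while True:
        try:
            apply_transaction()
            return retries
        except TransientError:
            if retries >= max_retries:
                raise  # budget exhausted: stop instead of looping forever
            retries += 1
```

The bug corresponds to the `retries >= max_retries` check never taking effect, so the loop keeps replaying the transaction for as long as the transient error persists.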
[4 Apr 2006 21:56] Lars Thalmann
This has been pushed.  Fix will be part of Release 5.1.8.
[11 Apr 2006 13:34] MC Brown
Added to changelog for 5.1.8.

Slave servers would retry the execution of an SQL statement an infinite number of times, ignoring the value of SLAVE_TRANSACTION_RETRIES, when using the NDB engine. (Bug #16228)
[6 Oct 2006 21:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13284

ChangeSet@1.2295, 2006-10-07 00:08:52+03:00, aelkin@dsl-hkigw8-feb9fb00-191.dhcp.inet.fi +3 -0
  BUG#20697 
  
  A transaction on the slave SQL thread got blocked by a lock held by a local transaction
  on the slave. Because slave-transaction-retries defaults to 10, the replicated transaction
  was replayed, and the replay failed because 5.0's policy of rolling back a timed-out
  transaction has changed since 5.0.13.
  
  It was decided to backport the existing method used in 5.1, implemented in
  bug #16228, for handling the symmetrical deadlock problem.
  Note that this solution is likely to be suboptimal in practice only with a high rate of
  replicated transactions timing out.
  Upon the release of the latter
[6 Oct 2006 21:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13285

ChangeSet@1.2295, 2006-10-07 00:21:16+03:00, aelkin@dsl-hkigw8-feb9fb00-191.dhcp.inet.fi +3 -0
  BUG#20697 slave fails to rollback replicated transaction hang over innodb_lock_wait_timeout
  
  A transaction on the slave SQL thread got blocked by a lock held by a local transaction
  on the slave. Because slave-transaction-retries defaults to 10, the replicated transaction
  was replayed, and the replay failed because 5.0's policy of rolling back a timed-out
  transaction has changed since 5.0.13.
  
  It was decided to backport the existing method used in 5.1, implemented in
  bug #16228, for handling the symmetrical deadlock problem.
  Note that this solution is likely to be suboptimal in practice only with a high rate of
  replicated transactions timing out.
  Upon the release of the latter
[7 Oct 2006 19:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13303

ChangeSet@1.2295, 2006-10-07 22:02:43+03:00, aelkin@dsl-hkigw8-feb9fb00-191.dhcp.inet.fi +3 -0
  BUG#20697 slave fails to rollback replicated transaction hang over innodb_lock_wait_timeou
  
  A transaction on the slave SQL thread got blocked by a lock held by a local transaction
  on the slave's mysqld. Because slave-transaction-retries defaults to 10, the replicated
  transaction was replayed. The replay failed because of the policy, new as of 5.0.13, of
  not rolling back a timed-out transaction. In effect, the first round of the timed-out
  transaction was committed by the first "BEGIN" of the replay.
  
  It was decided to backport the existing method used in 5.1, implemented in
  bug #16228, for handling the symmetrical deadlock problem. That patch introduced end_trans
  execution whenever a replicated transaction deadlocks or times out.
  
  Note that this solution is likely to be suboptimal in practice only with a high rate of
  replicated transactions timing out: given the changed timeout behavior, we could still
  replay only the last statement.
[10 Oct 2006 11:03] Andrei Elkin
The last changeset was intended for bug #20697, a semi-duplicate.
[11 Oct 2006 7:16] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13446

ChangeSet@1.2295, 2006-10-11 10:16:37+03:00, aelkin@dsl-hkigw8-feb9fb00-191.dhcp.inet.fi +3 -0
  BUG#20697 slave fails to rollback replicated transaction hang over innodb_lock_wait_timeou
  
  A transaction on the slave SQL thread got blocked by a lock held by a local transaction
  on the slave's mysqld. Because slave-transaction-retries defaults to 10, the replicated
  transaction was replayed. The replay failed because of the policy, new as of 5.0.13, of
  not rolling back a timed-out transaction. In effect, the first round of the timed-out
  transaction is committed by the first "BEGIN" of the replay.
  
  It was decided to backport the existing method used in 5.1, implemented in
  bug #16228, for handling the symmetrical deadlock problem. That patch introduced end_trans
  execution whenever a replicated transaction deadlocks or times out.
  
  Note that this solution is likely to be suboptimal in practice only with a high rate of
  replicated transactions timing out: given the changed timeout behavior, we could still
  replay only the last statement.
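The failure mode described in this changeset message can be illustrated with a toy model. It is a sketch under stated assumptions, not server code: `ToySession` and `run_then_replay` are hypothetical names, and the only behavior modeled is that a "BEGIN" implicitly commits whatever transaction is still open.

```python
class ToySession:
    """Toy session in which BEGIN implicitly commits pending work (hypothetical model)."""

    def __init__(self):
        self.committed = []
        self.pending = []

    def begin(self):
        # Starting a new transaction implicitly commits the open one.
        self.committed.extend(self.pending)
        self.pending = []

    def execute(self, stmt):
        self.pending.append(stmt)

    def rollback(self):
        self.pending = []

    def commit(self):
        self.committed.extend(self.pending)
        self.pending = []


def run_then_replay(statements, fail_at, rollback_first):
    """Run a transaction that times out at index fail_at, then replay it in full."""
    session = ToySession()
    session.begin()
    for stmt in statements[:fail_at]:
        session.execute(stmt)
    # The statement at fail_at times out; the partial transaction stays open.
    if rollback_first:
        session.rollback()  # discard the first round before replaying
    session.begin()
    for stmt in statements:
        session.execute(stmt)
    session.commit()
    return session.committed
```

Without the rollback, the statements from the first round are committed by the replay's "BEGIN" and then applied a second time; with it, the transaction is applied exactly once.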
[20 Nov 2006 15:02] Lars Thalmann
Pushed into 5.0.32 (pushed earlier into 5.1.8.)
[21 Nov 2006 20:08] Paul Dubois
Noted in 5.0.32 changelog.