MySQL Bugs: #16228: RBR: Slave SQL thread retries infinitely

Bug #16228	RBR: Slave SQL thread retries infinitely
Submitted:	5 Jan 2006 15:32	Modified:	21 Nov 2006 20:08
Reporter:	Mats Kindahl
Status:	Closed
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:		OS:	Any (ALL)
Assigned to:	Mats Kindahl	CPU Architecture:	Any

Description:
The slave SQL thread should stop if it cannot apply a row within the number of retries indicated by the server variable SLAVE_TRANSACTION_RETRIES.

How to repeat:
--connection master
CREATE TABLE `t1` ( `nid` int(11) NOT NULL default '0',
                `nom` char(4) default NULL,
             `prenom` char(4) default NULL,
           PRIMARY KEY USING HASH (`nid`))
   ENGINE=innodb DEFAULT CHARSET=latin1;
INSERT INTO t1 VALUES(1,"XYZ1","ABC1");

# cause a lock on that row on the slave
--sync_slave_with_master
--connection slave
BEGIN;
UPDATE t1 SET `nom`="LOCK" WHERE `nid`=1;

# set number of retries low so we fail the retries
set GLOBAL slave_transaction_retries=1;

# now do a change to this row on the master
# will deadlock on the slave because of lock above
--connection master
UPDATE t1 SET `nom`="DEAD" WHERE `nid`=1;

# wait for deadlock to be detected
# sleep longer than the deadlock detection timeout in the config
# we do this twice: once with few retries to verify that we
# get a failure with the set sleep, and once with the _same_
# sleep, but with more retries so that it succeeds
--sleep 5

# replication should have stopped, since max retries were not enough
# verify with show slave status
--connection slave
--replace_result $MASTER_MYPORT MASTER_PORT
--replace_column 1 <Slave_IO_State> 7 <Read_Master_Log_Pos> 8 <Relay_Log_File> 9 <Relay_Log_Pos> 16 <Replicate_Ignore_Table> 22 <Exec_Master_Log_Pos> 23 <Relay_Log_Space> 33 <Seconds_Behind_Master>
SHOW SLAVE STATUS;

# now set max retries high enough to succeed, and start slave again
set GLOBAL slave_transaction_retries=10;
START SLAVE;

# wait for deadlock to be detected and retried
# should be the same sleep as above for test to be valid
--sleep 5

# commit transaction to release lock on row and let replication succeed
select * from t1 order by nid;
COMMIT;

# verify that the row was successfully applied on the slave
--connection master
--sync_slave_with_master
--connection slave
select * from t1 order by nid;

# cleanup
--connection master
DROP TABLE t1;

This bug can cause the slave to retry application of a transaction an infinite number of times when it hits transient errors, i.e., it effectively ignores the value of SLAVE_TRANSACTION_RETRIES. An example of a transient error is a conflict where one of the tables changed by the transaction is in use by another transaction.
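The intended behavior can be modeled as a bounded retry loop. The following is only a minimal sketch, not the server's code: `TransientError`, `ReplicationStopped`, `apply_with_retries`, and `make_locked_row` are hypothetical names, `max_retries` stands in for SLAVE_TRANSACTION_RETRIES, and the exact way the server counts retries may differ.

```python
class TransientError(Exception):
    """Stands in for a deadlock or lock wait timeout on the slave (hypothetical)."""


class ReplicationStopped(Exception):
    """Raised when the retry budget is exhausted and the SQL thread must stop."""


def apply_with_retries(apply_transaction, max_retries):
    """Apply a transaction, retrying at most max_retries times on transient errors.

    max_retries plays the role of SLAVE_TRANSACTION_RETRIES in this sketch.
    Returns the number of retries that were needed.
    """
    retries = 0
    while True:
        try:
            apply_transaction()
            return retries
        except TransientError as err:
            # The bug amounts to never taking this branch: the loop keeps going.
            if retries >= max_retries:
                raise ReplicationStopped(
                    f"giving up after {retries} retries"
                ) from err
            retries += 1


def make_locked_row(failures):
    """Return a callable that fails `failures` times before succeeding."""
    state = {"left": failures}

    def apply_transaction():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("lock wait timeout")

    return apply_transaction
```

In this model, a row that stays locked for three attempts stops replication when `max_retries` is 1 and is applied when `max_retries` is 10, which mirrors the two phases of the test case above.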

This has been pushed.  Fix will be part of Release 5.1.8.

Added to changelog for 5.1.8.

Slave servers would retry the execution of an SQL statement an infinite number of times, ignoring the value of SLAVE_TRANSACTION_RETRIES, when using the NDB engine. (Bug #16228)

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13284

ChangeSet@1.2295, 2006-10-07 00:08:52+03:00, aelkin@dsl-hkigw8-feb9fb00-191.dhcp.inet.fi +3 -0
  BUG#20697 
  
  A transaction on the slave SQL thread was blocked by a lock held by a local
  transaction on the slave. Because slave-transaction-retries defaults to 10, the
  replicated transaction was replayed. The replay failed because the 5.0 policy of
  rolling back a timed-out transaction changed in 5.0.13.
  
  It was decided to backport the existing method from 5.1, implemented in
  bug #16228 for handling the symmetrical deadlock problem.
  Note that this solution is suboptimal in practice only with a high rate of
  timed-out replicated transactions.
  Upon the release of the latter

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13285

ChangeSet@1.2295, 2006-10-07 00:21:16+03:00, aelkin@dsl-hkigw8-feb9fb00-191.dhcp.inet.fi +3 -0
  BUG#20697 slave fails to rollback replicated transaction hang over innodb_lock_wait_timeout
  
  A transaction on the slave SQL thread was blocked by a lock held by a local
  transaction on the slave. Because slave-transaction-retries defaults to 10, the
  replicated transaction was replayed. The replay failed because the 5.0 policy of
  rolling back a timed-out transaction changed in 5.0.13.
  
  It was decided to backport the existing method from 5.1, implemented in
  bug #16228 for handling the symmetrical deadlock problem.
  Note that this solution is suboptimal in practice only with a high rate of
  timed-out replicated transactions.
  Upon the release of the latter

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13303

ChangeSet@1.2295, 2006-10-07 22:02:43+03:00, aelkin@dsl-hkigw8-feb9fb00-191.dhcp.inet.fi +3 -0
  BUG#20697 slave fails to rollback replicated transaction hang over innodb_lock_wait_timeou
  
  A transaction on the slave SQL thread was blocked by a lock held by a local
  transaction in the slave's mysqld. Because slave-transaction-retries defaults to 10,
  the replicated transaction was replayed. The replay failed because of the policy,
  new in 5.0.13, of not rolling back a timed-out transaction. Effectively, the first
  round of the timed-out transaction was committed by the first "BEGIN" of the replay.
  
  It was decided to backport the existing method from 5.1, implemented in
  bug #16228 for handling the symmetrical deadlock problem. That patch introduced
  end_trans execution whenever a replicated transaction deadlocks or times out.
  
  Note that this solution is suboptimal in practice only with a high rate of timed-out
  replicated transactions: given the changed timeout behavior, it would still be
  possible to replay only the last statement.

The last changeset was intended for bug #20697, a semi-duplicate.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/13446

ChangeSet@1.2295, 2006-10-11 10:16:37+03:00, aelkin@dsl-hkigw8-feb9fb00-191.dhcp.inet.fi +3 -0
  BUG#20697 slave fails to rollback replicated transaction hang over innodb_lock_wait_timeou
  
  A transaction on the slave SQL thread was blocked by a lock held by a local
  transaction in the slave's mysqld. Because slave-transaction-retries defaults to 10,
  the replicated transaction was replayed. The replay failed because of the policy,
  new in 5.0.13, of not rolling back a timed-out transaction. Effectively, the first
  round of the timed-out transaction is committed by the first "BEGIN" of the replay.
  
  It was decided to backport the existing method from 5.1, implemented in
  bug #16228 for handling the symmetrical deadlock problem. That patch introduced
  end_trans execution whenever a replicated transaction deadlocks or times out.
  
  Note that this solution is suboptimal in practice only with a high rate of timed-out
  replicated transactions: given the changed timeout behavior, it would still be
  possible to replay only the last statement.
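
The failure described in the changeset comment can be illustrated with a toy model. This is a sketch under stated assumptions, not the patch itself: `ToySession` and `replay` are hypothetical, end_trans is modeled as a plain rollback, and the only server behavior relied on is that BEGIN implicitly commits an open transaction.

```python
class ToySession:
    """Toy stand-in for the slave SQL thread's session (hypothetical)."""

    def __init__(self):
        self.committed = []
        self.pending = []
        self.in_txn = False

    def begin(self):
        # BEGIN implicitly commits a transaction that is still open.
        if self.in_txn:
            self.commit()
        self.in_txn = True

    def execute(self, change):
        self.pending.append(change)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending = []
        self.in_txn = False

    def rollback(self):
        self.pending = []
        self.in_txn = False


def replay(session, statements, fail_at, rollback_first):
    """Run a first round that times out before statement index fail_at,
    then replay the whole transaction from its BEGIN."""
    session.begin()
    for stmt in statements[:fail_at]:
        session.execute(stmt)
    # The timeout happens here; the open transaction is not rolled back
    # automatically in this model.
    if rollback_first:
        session.rollback()  # assumption: models the effect of end_trans
    session.begin()
    for stmt in statements:
        session.execute(stmt)
    session.commit()
    return session.committed
```

Without the rollback, the statements from the first round are committed by the replay's BEGIN and then applied a second time; with it, the transaction is applied exactly once.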

Pushed into 5.0.32 (pushed earlier into 5.1.8).

Noted in 5.0.32 changelog.