Bug #44581 Slave stops when transaction with non-transactional table gets lock wait timeout
Submitted: 30 Apr 2009 16:44 Modified: 9 Jul 2009 13:13
Reporter: Victor Kirkebo Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.1.34 and 5.4.0 OS:Any
Assigned to: Alfranio Correia CPU Architecture:Any
Tags: lock wait timeout, replication
Triage: Triaged: D2 (Serious)

[30 Apr 2009 16:44] Victor Kirkebo
Description:
The bug was first described in http://bugs.mysql.com/bug.php?id=40116 (26 Oct 2008):

I've tested 5.1.34 and 5.4.0 with MBR and SBR (no error with RBR).

Two transactions involving a mix of transactional and non-transactional tables are executed in parallel. One of the transactions gets a "Lock wait timeout exceeded" error on the master but the slave reports no error and this difference in errors causes the slave to stop replication.

How to repeat:
This description below is taken from http://bugs.mysql.com/bug.php?id=40116 (26 Oct 2008):

Note:
The same error appears when substituting the update trigger for a delete trigger under connection #1 (and with delete statement under connection #2) below. 

Connection #1:
--------------

drop database if exists rpl_failure_test;
create database rpl_failure_test;
use rpl_failure_test;

create table t ( i int, primary key(i), f char(8) ) engine = innodb;
create table l ( d datetime, f char(32) ) engine = myisam;

delimiter |
create trigger tr after update on t for each row begin insert into l values ( no
w(), concat( 'updated t: ', old.f, ' -> ', new.f ) ); end |
delimiter ;

insert into t values (4,'black'), (2,'red'), (3,'yelow'), (1,'cyan');

commit;

set autocommit = 0;

update t set f = 'yellow' where i = 3;

Connection #2:
--------------

use rpl_failure_test;
update t set f = 'magenta' where f = 'red';

# Note: if the 2nd connection also has autocommit=0, 
# the problem happens after an explicit commit or rollback of the transaction.

#end of test case

After wait timeout is exceeded on connection #2, the following records appear in the slave.err log:

[ERROR] Slave: Query caused different errors on master and slave
. Error on master: 'Lock wait timeout exceeded; try restarting transaction' (1205), Error
on slave: 'no error' (0). Default database: 'rpl_failure_test'. Query:
 'update t set f = 'magenta' where f = 'red'', Error_code: 0
[ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the
slave SQL thread with "SLAVE START". We stopped at log '
master-bin.000001' position 2531

The last record in slave.log is the update.

Specific requirements:

The first connection should use PK as a criteria in WHERE clause (although PK itself does
not have to be updated); the second connection does not use PK, however it should update a
specific row -- in my experiments only updates of rows with PK value less than the one
updated by the first connection caused slave abort -- e.g. in the test case above, if the
1st connection updates the row with PK=3, than the second connection has to update rows
'cyan' or 'magenta' to produce slave abort, while updates on 'black' or 'yellow' rows do
not cause the problem.
[12 Jun 2009 0:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/76149

2936 Alfranio Correia	2009-06-12
      BUG#44581 Slave stops when transaction with non-transactional table gets lock wait timeout
      
      In STMT and MIXED modes, when both non-transactional and transactional tables
      are changed on behalf of a transaction, upon commit/rollback its statements must
      be written to the binlog. On the master, a failed statement that updates both
      non-transactional and transactional tables are annotated with the error number
      and also written to the binlog. On the slave, while applying such statements,
      it is expected the same failure.
      
      Unfortunately, statements that fail due to concurrency issues (e.g. deadlocks,
      timeouts) are logged in the same way causing the slave to stop as the statements
      are applied sequentially by the SQL Thread. To fix this bug, we automatically
      ignore concurrency failures on the slave.
[12 Jun 2009 0:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/76152

2937 Alfranio Correia	2009-06-12
      BUG#44581  Slave stops when transaction with non-transactional table gets lock wait timeout
      
      Test case.
[1 Jul 2009 6:03] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/77611

2988 Alfranio Correia	2009-07-01
      BUG#44581 Slave stops when transaction with non-transactional table gets lock wait timeout
      
      In STMT and MIXED modes, a statement that changes both non-transactional and
      transactional tables must be written to the binary log whenever there are
      changes to non-transactional tables. This means that the statement gets into the
      binary log even when the changes to the transactional tables fail. In particular
      , in the presence of a failure such statement is annotated with the error number
      and wrapped in a begin/rollback. On the slave, while applying the statement, it
      is expected the same failure and the rollback prevents the transactional changes
      to be persisted.
      
      Unfortunately, statements that fail due to concurrency issues (e.g. deadlocks,
      timeouts) are logged in the same way causing the slave to stop as the statements
      are applied sequentially by the SQL Thread. To fix this bug, we automatically
      ignore concurrency failures on the slave. Specifically, the following failures
      are ignored: ER_LOCK_WAIT_TIMEOUT, ER_LOCK_DEADLOCK and ER_XA_RBDEADLOCK.
     @ sql/slave.cc
        Moved the validation to the log_event.cc as any other error checking is done there.
     @ sql/slave.h
        Moved the validation to the log_event.cc as any other error checking is done there.
[3 Jul 2009 14:20] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/77909

2988 Alfranio Correia	2009-07-03
      BUG#44581 Slave stops when transaction with non-transactional table gets lock wait timeout
      
      In STMT and MIXED modes, a statement that changes both non-transactional and
      transactional tables must be written to the binary log whenever there are
      changes to non-transactional tables. This means that the statement gets into the
      binary log even when the changes to the transactional tables fail. In particular
      , in the presence of a failure such statement is annotated with the error number
      and wrapped in a begin/rollback. On the slave, while applying the statement, it
      is expected the same failure and the rollback prevents the transactional changes
      to be persisted.
      
      Unfortunately, statements that fail due to concurrency issues (e.g. deadlocks,
      timeouts) are logged in the same way causing the slave to stop as the statements
      are applied sequentially by the SQL Thread. To fix this bug, we automatically
      ignore concurrency failures on the slave. Specifically, the following failures
      are ignored: ER_LOCK_WAIT_TIMEOUT, ER_LOCK_DEADLOCK and ER_XA_RBDEADLOCK.
     @ sql/slave.cc
        Moved the validation to the log_event.cc as any other error checking is done there.
     @ sql/slave.h
        Moved the validation to the log_event.cc as any other error checking is done there.
[6 Jul 2009 7:43] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/77973

2999 Alfranio Correia	2009-07-06
      BUG#44581 Slave stops when transaction with non-transactional table gets lock wait
      timeout
            
      In STMT and MIXED modes, a statement that changes both non-transactional and
      transactional tables must be written to the binary log whenever there are
      changes to non-transactional tables. This means that the statement gets into the
      binary log even when the changes to the transactional tables fail. In particular
      , in the presence of a failure such statement is annotated with the error number
      and wrapped in a begin/rollback. On the slave, while applying the statement, it
      is expected the same failure and the rollback prevents the transactional changes
      to be persisted.
            
      Unfortunately, statements that fail due to concurrency issues (e.g. deadlocks,
      timeouts) are logged in the same way causing the slave to stop as the statements
      are applied sequentially by the SQL Thread. To fix this bug, we automatically
      ignore concurrency failures on the slave. Specifically, the following failures
      are ignored: ER_LOCK_WAIT_TIMEOUT, ER_LOCK_DEADLOCK and ER_XA_RBDEADLOCK.
[6 Jul 2009 8:25] Alfranio Correia
Patch pushed to mysql-5.1-bugteam and mysql-pe.
[8 Jul 2009 13:30] Bugs System
Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source revid:joro@sun.com-20090706100738-7hegomqy717uocbf) (merge vers: 5.1.37) (pib:11)
[9 Jul 2009 7:36] Bugs System
Pushed into 5.1.37 (revid:joro@sun.com-20090708131116-kyz8iotbum8w9yic) (version source revid:joro@sun.com-20090706100738-7hegomqy717uocbf) (merge vers: 5.1.37) (pib:11)
[9 Jul 2009 13:13] Jon Stephens
Documented bugfix in the 5.1.37 changelog as follows:
       
       When using STATEMENT or MIXED binary logging format, a statement that
       changes both non-transactional and transactional tables must be written
       to the binary log whenever there are changes to non-transactional
       tables. This means that the statement goes into the binary log even when
       the changes to the transactional tables fail. In particular, in the
       event of a failure such statement is annotated with the error number and
       wrapped inside a pair of BEGIN and ROLLBACK statements.
       
       On the slave, while applying the statement, it is expected that the same
       failure and the rollback prevent the transactional changes from
       persisting. However, statements that fail due to concurrency issues such
       as deadlocks and timeouts are logged in the same way, causing the slave
       to stop since the statements are applied sequentially by the SQL thread.
       
       To address this issue, we ignore concurrency failures on the slave.
       Specifically, the following errors are now ignored:
       ER_LOCK_WAIT_TIMEOUT, ER_LOCK_DEADLOCK, and ER_XA_RBDEADLOCK.
[10 Jul 2009 11:20] Bugs System
Pushed into 5.4.4-alpha (revid:anozdrin@bk-internal.mysql.com-20090710111017-bnh2cau84ug1hvei) (version source revid:joro@sun.com-20090706084517-1uc44ajjxggtq8jp) (merge vers: 5.4.4-alpha) (pib:11)
[26 Aug 2009 13:46] Bugs System
Pushed into 5.1.37-ndb-7.0.8 (revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)
[26 Aug 2009 13:46] Bugs System
Pushed into 5.1.37-ndb-6.3.27 (revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc) (version source revid:jonas@mysql.com-20090826105955-bkj027t47gfbamnc) (merge vers: 5.1.37-ndb-6.3.27) (pib:11)
[26 Aug 2009 13:48] Bugs System
Pushed into 5.1.37-ndb-6.2.19 (revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4) (version source revid:jonas@mysql.com-20090825194404-37rtosk049t9koc4) (merge vers: 5.1.37-ndb-6.2.19) (pib:11)
[27 Aug 2009 16:32] Bugs System
Pushed into 5.1.35-ndb-7.1.0 (revid:magnus.blaudd@sun.com-20090827163030-6o3kk6r2oua159hr) (version source revid:jonas@mysql.com-20090826132541-yablppc59e3yb54l) (merge vers: 5.1.37-ndb-7.0.8) (pib:11)