Bug #23831 deadlock not noticed
Submitted: 1 Nov 2006 11:25 Modified: 27 Nov 2006 19:35
Reporter: Andrei Elkin
Status: Closed
Category:Server: RBR Severity:S3 (Non-critical)
Version:5.1.13 OS:
Assigned to: Andrei Elkin Target Version:

[1 Nov 2006 11:25] Andrei Elkin
Description:
The replication thread does not log a deadlock when one happens and the replicated
transaction is chosen as the victim. Consequently the replication thread does not roll back,
which leads to cases similar to bug#16228 and bug#20697.

I am setting P1 because this case stands in the way of bug#20697, whose test case
cannot pass in RBR mode.

How to repeat:
# run into a deadlock between an msta replicated from the master and a local transaction on the slave

connection master;
drop table if exists t1,t2;
create table t1 (a int) engine=innodb; insert into t1 values (1);
create table t2 (a int) engine=innodb; insert into t2 values (1);

connection slave;
drop table if exists tl;
create table tl (a int) engine=innodb;
begin; 
  insert into tl values (1),(2),(3);
  select * from t2 for update;

connection master;
begin;
  update t1 set a=2;
  insert into t2 set a=2;
commit;

connection slave;
  update t1 set a=0;# deadlock must happen
# but instead of deadlock warning, err log has
061101 12:11:26 [ERROR] Slave: Error in Write_rows event: row application failed,
Error_code: 121
061101 12:11:26 [ERROR] Slave: Error in Write_rows event: error during transaction
execution on table test.t2, Error_code: 121

The correct code must be the deadlock error, which is 149.

Suggested fix:
In replace_record(), insert a check for the deadlock error just after
while ((error= table->file->ha_write_row())
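
A minimal, self-contained sketch of the suggested check follows. It is not the server code: FakeHandler, dup_key() and replace_record_sketch() are hypothetical stand-ins, and the "unchecked" path rests on the assumption that any failed write is treated as a duplicate-key conflict, which would explain why 121 shows up in the log instead of 149.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Handler-level error codes, written out as plain constants for the sketch.
const int HA_ERR_FOUND_DUPP_KEY = 121;
const int HA_ERR_LOCK_DEADLOCK = 149;

// Hypothetical stand-in for the storage engine handler. write_row() returns
// the scripted results in order; 0 means the row was written.
struct FakeHandler {
  std::vector<int> results;
  std::size_t next;
  explicit FakeHandler(std::vector<int> r) : results(std::move(r)), next(0) {}
  int write_row() { return next < results.size() ? results[next++] : 0; }
  // Index of the conflicting key, or -1 if the error is not a duplicate key.
  int dup_key(int error) const {
    return error == HA_ERR_FOUND_DUPP_KEY ? 0 : -1;
  }
};

// Sketch of the write loop. With check_deadlock == false, a deadlock falls
// through to the duplicate-key lookup and is reported as 121 (assumed
// pre-fix behaviour). With check_deadlock == true, it is returned as 149.
int replace_record_sketch(FakeHandler &h, bool check_deadlock) {
  int error;
  while ((error = h.write_row())) {
    if (check_deadlock && error == HA_ERR_LOCK_DEADLOCK)
      return error;  // propagate so the caller can roll back
    if (h.dup_key(error) < 0)
      return HA_ERR_FOUND_DUPP_KEY;  // no duplicate key could be identified
    // Real code would resolve the conflicting row here; the sketch only
    // loops and retries the write.
  }
  return 0;
}
```

With a scripted deadlock on the first write, the unchecked variant returns 121 and the checked variant returns 149, which matches the codes discussed in this report.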
[1 Nov 2006 22:31] Sveta Smirnova
Thank you for the report.

Using current 5.0 and 5.1 BK sources I get error "061101 21:49:19 [ERROR] Slave: Error
'Deadlock found when trying to get lock; try restarting transaction' on query. Default
database: 'bug23831'. Query: 'insert into t2 set a=2', Error_code: 1213"

Is this behaviour correct?
[2 Nov 2006 11:42] Sveta Smirnova
Thank you for the report.

Verified as described. Only row-based replication is affected.
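
For readers comparing the numbers in this thread: 121 and 149 are handler-level codes, while 1213 is the server-level code for the same deadlock condition. The sketch below only illustrates that correspondence. to_server_error() is a hypothetical helper, not a server function, and it covers just the three codes relevant here.

```cpp
// Handler-level codes (HA_ERR_*) and server-level codes (ER_*), written out
// as plain constants for illustration.
const int HA_ERR_FOUND_DUPP_KEY = 121;
const int HA_ERR_LOCK_WAIT_TIMEOUT = 146;
const int HA_ERR_LOCK_DEADLOCK = 149;

const int ER_DUP_ENTRY = 1062;
const int ER_LOCK_WAIT_TIMEOUT = 1205;
const int ER_LOCK_DEADLOCK = 1213;

// Hypothetical helper: translate a handler code into the server code that
// appears in an error message. Returns 0 for codes outside this sketch.
int to_server_error(int ha_error) {
  switch (ha_error) {
    case HA_ERR_FOUND_DUPP_KEY:    return ER_DUP_ENTRY;
    case HA_ERR_LOCK_WAIT_TIMEOUT: return ER_LOCK_WAIT_TIMEOUT;
    case HA_ERR_LOCK_DEADLOCK:     return ER_LOCK_DEADLOCK;
    default:                       return 0;
  }
}
```

Under this mapping, the 149 requested in the report and the 1213 in the statement-based log describe the same deadlock, whereas 121 describes a duplicate key.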
[2 Nov 2006 11:53] Lars Thalmann
Two things need to be fixed:
1. Proper error message in case of deadlock,
2. Proper rollback in case of deadlock
[2 Nov 2006 16:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/14754

ChangeSet@1.2324, 2006-11-02 17:51:32+02:00, aelkin@dsl-hkigw8-feaef900-46.dhcp.inet.fi +3 -0
  Bug#16228/Bug#20697 - related.
  Bug#23831  deadlock not noticed
  
  RBR bug: when a replicated msta (multi-statement transaction) deadlocks
  against a local transaction at a write row event, the event handler did not
  return the correct error code.
  That led to problems replaying the replicated msta.
  
  The correct code is now written to the error log and stored so that the error
  handling routine can detect a deadlock (DL) or timeout (TO) and execute the
  necessary tasks the same way as in SBR. In particular, a timed-out transaction
  is still rolled back - see the related bugs.
[3 Nov 2006 13:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/14801

ChangeSet@1.2324, 2006-11-03 14:26:40+02:00, aelkin@dsl-hkigw8-feaef900-46.dhcp.inet.fi +3 -0
  Bug#16228/Bug#20697 - related.
  Bug#23831  deadlock not noticed
  
  RBR bug: when a replicated msta (multi-statement transaction) deadlocks
  with a local transaction at a write row event, or times out, the event handler
  did not return the correct error code.
  The wrong error code stops the slave SQL thread instead of letting it proceed
  with rollback and replay.
  
  The correct code is now written to the error log and stored so that the error
  handling routine can conduct the rollback and replay of the transaction. The
  handling for RBR events remains the same as for SBR events.
  In particular, a timed-out transaction is still rolled back - see the related bugs.
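
The rollback-and-replay behaviour described in the commit message can be sketched roughly as below. This is an illustration under stated assumptions, not the patch itself: apply_with_retry(), is_temporary_error() and ApplyResult are hypothetical names, and the sketch inspects the server-level codes 1213 and 1205, although the thread does not say which layer's code the real routine examines.

```cpp
#include <functional>

// Server-level codes for the two conditions named in the commit message.
const int ER_LOCK_WAIT_TIMEOUT = 1205;
const int ER_LOCK_DEADLOCK = 1213;

// A deadlock or lock wait timeout is temporary: replaying the transaction
// may succeed. Any other error is treated as permanent in this sketch.
bool is_temporary_error(int err) {
  return err == ER_LOCK_DEADLOCK || err == ER_LOCK_WAIT_TIMEOUT;
}

struct ApplyResult {
  int error;     // 0 on success, otherwise the last error seen
  int attempts;  // how many times the transaction was applied
};

// Hypothetical driver: apply the transaction, roll back on failure, and
// replay only when the failure was temporary and retries remain.
ApplyResult apply_with_retry(const std::function<int()> &apply_transaction,
                             const std::function<void()> &rollback,
                             int max_retries) {
  ApplyResult r = {0, 0};
  for (;;) {
    r.error = apply_transaction();
    ++r.attempts;
    if (r.error == 0)
      return r;  // applied cleanly
    rollback();  // undo the partial transaction before deciding what to do
    if (!is_temporary_error(r.error) || r.attempts > max_retries)
      return r;  // permanent error or retries exhausted: caller stops
  }
}
```

The point of the sketch is the dependency the commit message describes: if the event handler reports 121 rather than the deadlock or timeout code, is_temporary_error() is false, so the loop gives up after one attempt instead of replaying the transaction.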
[20 Nov 2006 16:37] Lars Thalmann
Pushed into 5.1.14.
[27 Nov 2006 19:35] Paul DuBois
Noted in 5.1.14 changelog.

With row-based binary logging, replicated multiple-statement
transaction deadlocks did not return the correct error code, causing
the slave SQL thread to stop rather than roll back and re-execute.