MySQL Bugs: #23831: deadlock not noticed

Bug #23831	deadlock not noticed
Submitted:	1 Nov 2006 10:25	Modified:	27 Nov 2006 18:35
Reporter:	Andrei Elkin
Status:	Closed
Category:	MySQL Server: Row Based Replication ( RBR )	Severity:	S3 (Non-critical)
Version:	5.1.13	OS:
Assigned to:	Andrei Elkin	CPU Architecture:	Any

Description:
The replication (slave SQL) thread does not log a deadlock when one happens and the replicated
transaction is chosen as the victim. Consequently, the replication thread does not roll back,
which leads to cases similar to Bug#16228 and Bug#20697.

I am setting P1 because this case blocks Bug#20697, whose test case
cannot pass in RBR mode.

How to repeat:
# Provoke a deadlock between a replicated multi-statement transaction (msta)
# from the master and a local transaction on the slave.

connection master;
drop table if exists t1,t2;
create table t1 (a int) engine=innodb; insert into t1 values (1);
create table t2 (a int) engine=innodb; insert into t2 values (1);

connection slave;
drop table if exists tl;
create table tl (a int) engine=innodb;
begin; 
  insert into tl values (1),(2),(3);
  select * from t2 for update;

connection master;
begin;
  update t1 set a=2;
  insert into t2 set a=2;
commit;

connection slave;
  update t1 set a=0;# deadlock must happen
# but instead of a deadlock error, the error log shows:
061101 12:11:26 [ERROR] Slave: Error in Write_rows event: row application failed, Error_code: 121
061101 12:11:26 [ERROR] Slave: Error in Write_rows event: error during transaction execution on table test.t2, Error_code: 121

The correct error code should be the deadlock error, 149 (HA_ERR_LOCK_DEADLOCK).

Suggested fix:
In replace_record(), insert a check for the deadlock error just after
the loop header

while ((error= table->file->ha_write_row()))

Thank you for the report.

Using current 5.0 and 5.1 BK sources, I get the error "061101 21:49:19 [ERROR] Slave: Error 'Deadlock found when trying to get lock; try restarting transaction' on query. Default database: 'bug23831'. Query: 'insert into t2 set a=2', Error_code: 1213"

Is this behaviour correct?

Thank you for the report.

Verified as described. Only row-based replication is affected.

Two things need to be fixed:
1. A proper error message in case of deadlock;
2. A proper rollback in case of deadlock.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/14754

ChangeSet@1.2324, 2006-11-02 17:51:32+02:00, aelkin@dsl-hkigw8-feaef900-46.dhcp.inet.fi +3 -0
  Bug#16228/Bug#20697 - related.
  Bug#23831  deadlock not noticed
  
  RBR bug: when a replicated msta (multi-statement transaction) deadlocked
  against a local transaction at a Write_rows event, the event handler did not
  return the correct error code.
  That led to problems replaying the replicated msta.
  
  The correct code is now written to the error log and stored for the
  error-handling routine, which detects a deadlock (DL) or timeout (TO) and
  performs the necessary tasks the same way as in SBR. In particular, a
  timed-out transaction is still rolled back; see the related bugs.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/14801

ChangeSet@1.2324, 2006-11-03 14:26:40+02:00, aelkin@dsl-hkigw8-feaef900-46.dhcp.inet.fi +3 -0
  Bug#16228/Bug#20697 - related.
  Bug#23831  deadlock not noticed
  
  RBR bug: when a replicated msta (multi-statement transaction) deadlocked
  with a local transaction at a Write_rows event, or timed out, the event
  handler did not return the correct error code.
  The wrong error code stopped the slave SQL thread instead of letting it
  proceed with rollback and replay.
  
  The correct code is now written to the error log and stored for the
  error-handling routine to roll back and replay the transaction. Handling
  of RBR events remains the same as for SBR events.
  In particular, a timed-out transaction is still rolled back; see the related bugs.

Pushed into 5.1.14.

Noted in the 5.1.14 changelog.

With row-based binary logging, replicated multiple-statement
transaction deadlocks did not return the correct error code, causing
the slave SQL thread to stop rather than roll back and re-execute.