Bug #36524 incident of deadlock on slave is overdramatized
Submitted: 6 May 2008 9:46 Modified: 30 Mar 2011 7:42
Reporter: Andrei Elkin Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.0,5.1 OS:Any
Assigned to: Andrei Elkin CPU Architecture:Any

[6 May 2008 9:46] Andrei Elkin
Description:
If slave sql thread deadlocks and is choosen as the victim there will be
unnecessary error lines in slave.err like

080506 12:27:53 [ERROR] Slave SQL: Error 'Deadlock found when trying to get lock
; try restarting transaction' on query.

As the matter of fact the slave does not stop neither display such an error
in its show slave status and eventually if slave_transaction_retries satisfied
will commit.

The error above would make sense if slave could not commit indeed.

How to repeat:
./mtr --start rpl_deadlock_innodb

execute a deadlock simulating code (consult with rpl_deadlock_innodb how to do that) so that the victim should be the slave sql's (i.e master) transaction,
rollback the slave's side local transaction,
observer that the master transaction has committed on the slave and
observe the error line.

Suggested fix:
Produce warnings for the error file while slave still has a chance to commit the master transaction.
[6 May 2008 18:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46423

ChangeSet@1.2612, 2008-05-06 21:52:05+03:00, aelkin@mysql1000.dsl.inet.fi +3 -0
  Bug #36524 incident of deadlock on slave is overdramatized
  
  In cases of temporary errors of a replication event execution (deadlock, timeout)
  the error log gained an overreacted error message whereas just a warning would be fine.
  
  Fixed with checking if an error is temporary at the end of execution events that prone to
  temporary errors (Query_log_event, Rows_log_event).
[7 May 2008 12:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46451

ChangeSet@1.2612, 2008-05-07 15:49:18+03:00, aelkin@mysql1000.dsl.inet.fi +3 -0
  Bug #36524 incident of deadlock on slave is overdramatized
  
  In cases of temporary errors in replication event execution (deadlock, timeout)
  the error log gained an overreacting error message whereas just a warning would be fine.
  
  Fixed with checking of the temporary status of the error 
  inside Slave_reporting_capability::report() to demote the error to the warning level if needed.
  The warning gets converted into the error whenever number of successive attempts to re-apply
  the event(s) gets equal to the global server variable slave_trans_retries.
[17 Jun 2009 19:22] Andrei Elkin
Setting back to Patch-pending after getting back to the issue.
A separate mail explaining the last commit idea was sent out to
commits@ and the reviewers.
[29 Jun 2009 10:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/77444

2801 Andrei Elkin	2009-06-29
      Bug #36524 incident of deadlock on slave is overdramatized
      
      In cases of temporary errors in replication event execution (deadlock, timeout)
      the error log gained an overreacting error message whereas just a warning would be fine.
      
      Fixed with checking of the temporary status of the error 
      inside Slave_reporting_capability::report() to demote the error to the warning level if
      needed.
      The warning gets converted into the error whenever number of successive attempts to
      re-apply the event(s) gets equal to the global server variable slave_trans_retries.
      
      The patch addresses some BUG#25597 issues also to indicate the demotion can not
      create any loose in logics as long as the developer sticks to stardard tools of 
      reporting and displaying errors:
      report() mimics my_error() and display() does so to send_error().
      
      Two test file results advertise a new pattern of report()/display().
[9 Jul 2009 20:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78315

2801 Andrei Elkin	2009-07-09
      Bug #36524 incident of deadlock on slave is overdramatized
            
      In cases of temporary errors in replication event execution (deadlock,
      timeout) the error log gained an overreacting error message whereas
      just a warning would be fine.
            
      Fixed with checking of the temporary status of the error nside
      Slave_reporting_capability::report() to demote the error to the
      warning level if needed.
      The warning gets converted into the error whenever number of
      successive attempts to re-apply the event(s) gets equal to the global
      server variable slave_trans_retries.
      The patch addresses some BUG#25597 issues also to indicate the
      demotion can not create any loose in logics as long as the developer
      sticks to stardard tools of reporting and displaying errors: report()
      mimics my_error() and display() does so to send_error().
      Some number of test result files got affected and in some cases the new
      results revealed issues in the old versions.
[28 Feb 2010 19:33] Andrei Elkin
Mats, hello.

We were discussing fixes last time in July. I guess that was
a bit of miscommunication between us wrt to
the last commit http://lists.mysql.com/commits/78315.

Your last mail reply was entirely positive, you may check with

  From: Mats Kindahl <mats@sun.com>
  Subject: Re: bzr commit into mysql-5.4-bugfixing branch (aelkin:2801) Bug#36524
  To: Andrei Elkin <Andrei.Elkin@Sun.COM>
  Cc: Andrei Elkin <aelkin@mysql.com>, commits@lists.mysql.com
  Date: Fri, 10 Jul 2009 08:08:08 +0200

In particular the commit fixed passing error code into has_temporary_error()
that you rightfully flagged up:

>>> +  if (level == ERROR_LEVEL && has_temporary_error(thd))
> 
>> has_temporary_error() refers to thd->stmt_da->sql_errno(), which may not be the
>> same as err_code above. I thought we agreed that you should either pass the
>> error code to has_temporary_error(), or read it from thd->stmt_da to ensure
So I am asking to review ...

All in all, the last fixes are supposed to be final, and I am asking to
settle the matter.

cheers,

Andrei
[30 Mar 2011 7:42] Jon Stephens
Documented fix as follows in the 5.6.3 changelog:

      Deadlocks in the slave SQL thread could cause unnecessary -Deadlock
      found when trying to get lock; try restarting transaction- errors to
      be logged on the slave.
      
      Now in such cases, only a warning is logged unless
      slave_transaction_retries has been exceeded.

Closed.