Bug #36524 incident of deadlock on slave is overdramatized
Submitted: 6 May 2008 11:46 Modified: 18 Sep 10:45
Reporter: Andrei Elkin
Status: In progress
Category:Server: Replication Severity:S3 (Non-critical)
Version:5.0,5.1 OS:Any
Assigned to: Andrei Elkin Target Version:
Triage: Triaged: D4 (Minor)

[6 May 2008 11:46] Andrei Elkin
Description:
If slave sql thread deadlocks and is choosen as the victim there will be
unnecessary error lines in slave.err like

080506 12:27:53 [ERROR] Slave SQL: Error 'Deadlock found when trying to get lock
; try restarting transaction' on query.

As the matter of fact the slave does not stop neither display such an error
in its show slave status and eventually if slave_transaction_retries satisfied
will commit.

The error above would make sense if slave could not commit indeed.

How to repeat:
./mtr --start rpl_deadlock_innodb

execute a deadlock simulating code (consult with rpl_deadlock_innodb how to do that) so
that the victim should be the slave sql's (i.e master) transaction,
rollback the slave's side local transaction,
observer that the master transaction has committed on the slave and
observe the error line.

Suggested fix:
Produce warnings for the error file while slave still has a chance to commit the master
transaction.
[6 May 2008 20:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46423

ChangeSet@1.2612, 2008-05-06 21:52:05+03:00, aelkin@mysql1000.dsl.inet.fi +3 -0
  Bug #36524 incident of deadlock on slave is overdramatized
  
  In cases of temporary errors of a replication event execution (deadlock, timeout)
  the error log gained an overreacted error message whereas just a warning would be fine.
  
  Fixed with checking if an error is temporary at the end of execution events that prone
to
  temporary errors (Query_log_event, Rows_log_event).
[7 May 2008 14:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/46451

ChangeSet@1.2612, 2008-05-07 15:49:18+03:00, aelkin@mysql1000.dsl.inet.fi +3 -0
  Bug #36524 incident of deadlock on slave is overdramatized
  
  In cases of temporary errors in replication event execution (deadlock, timeout)
  the error log gained an overreacting error message whereas just a warning would be fine.
  
  Fixed with checking of the temporary status of the error 
  inside Slave_reporting_capability::report() to demote the error to the warning level if
needed.
  The warning gets converted into the error whenever number of successive attempts to
re-apply
  the event(s) gets equal to the global server variable slave_trans_retries.
[17 Jun 21:22] Andrei Elkin
Setting back to Patch-pending after getting back to the issue.
A separate mail explaining the last commit idea was sent out to
commits@ and the reviewers.
[29 Jun 12:47] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/77444

2801 Andrei Elkin	2009-06-29
      Bug #36524 incident of deadlock on slave is overdramatized
      
      In cases of temporary errors in replication event execution (deadlock, timeout)
      the error log gained an overreacting error message whereas just a warning would be
fine.
      
      Fixed with checking of the temporary status of the error 
      inside Slave_reporting_capability::report() to demote the error to the warning level
if
      needed.
      The warning gets converted into the error whenever number of successive attempts to
      re-apply the event(s) gets equal to the global server variable slave_trans_retries.
      
      The patch addresses some BUG#25597 issues also to indicate the demotion can not
      create any loose in logics as long as the developer sticks to stardard tools of 
      reporting and displaying errors:
      report() mimics my_error() and display() does so to send_error().
      
      Two test file results advertise a new pattern of report()/display().
[9 Jul 22:38] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/78315

2801 Andrei Elkin	2009-07-09
      Bug #36524 incident of deadlock on slave is overdramatized
            
      In cases of temporary errors in replication event execution (deadlock,
      timeout) the error log gained an overreacting error message whereas
      just a warning would be fine.
            
      Fixed with checking of the temporary status of the error nside
      Slave_reporting_capability::report() to demote the error to the
      warning level if needed.
      The warning gets converted into the error whenever number of
      successive attempts to re-apply the event(s) gets equal to the global
      server variable slave_trans_retries.
      The patch addresses some BUG#25597 issues also to indicate the
      demotion can not create any loose in logics as long as the developer
      sticks to stardard tools of reporting and displaying errors: report()
      mimics my_error() and display() does so to send_error().
      Some number of test result files got affected and in some cases the new
      results revealed issues in the old versions.