Bug #38826 Race in MYSQL_LOG::purge_logs is impossible to debug in production
Submitted: 15 Aug 2008 17:13 Modified: 28 Jan 2009 12:41
Reporter: Mark Callaghan Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:5.0,5.1 OS:Any
Assigned to: Luis Soares CPU Architecture:Any
Tags: Logs, purge, relay, replication

[15 Aug 2008 17:13] Mark Callaghan
Description:
MYSQL_LOG::purge_logs deletes relay log files and then edits the relay log index file. If the slave dies between the two, the relay log index file references relay logs that don't exist.

This error is difficult to debug as there is nothing in the db *.err file.

Code from ::purge_logs
  /*
    If we get killed -9 here, the sysadmin would have to edit
    the log index file after restart - otherwise, this should be safe
  */
  error= update_log_index(&log_info, need_update_threads);

How to repeat:
Kill the slave when it is in purge_logs() but has yet to edit the index file.

Suggested fix:
1) write to the db *.err log indicating when purge_logs() is removing files and when it has updated the index file

or

2) update the index file first and then remove the files
[18 Aug 2008 13:43] Geert Vanderkelen
Shane suggested to put a sleep() right before the update_log_index()-call. This shows the problem nicely, also for binary logs (see below): verified.

Probably it's best to do both suggestions there: put in the error log when logs are rotated AND do the updating of the index file first, then remove.

---

Having logs:

-rw-rw----  1 gvanderkelen gvanderkelen      151 Aug 18 15:28 master-log.000001
-rw-rw----  1 gvanderkelen gvanderkelen      107 Aug 18 15:28 master-log.000002
-rw-rw----  1 gvanderkelen gvanderkelen      107 Aug 18 15:29 master-log.000003

And the binary log index:

./master-log.000001      
./master-log.000002      
./master-log.000003 

With the patched version, which sleeps for 60 seconds before updating the index file, do this:

 mysql> purge master logs to 'master-log.000003';
 (hangs for 60 seconds)

Kill the mysqld process:
 shell> kill -9 `cat /path/to/mysqld.pid`

-rw-rw----  1 gvanderkelen gvanderkelen      107 Aug 18 15:29 master-log.000003
-rw-rw----  1 gvanderkelen gvanderkelen      107 Aug 18 15:31 master-log.000004

And the content of the index file:

./master-log.000001      
./master-log.000002      
./master-log.000003      
./master-log.000004
[29 Sep 2008 20:50] Justin Tolmer
Patch with a fix for the issue

Attachment: 38826.diff (text/plain), 5.51 KiB.

[2 Nov 2008 16:28] Alfranio Tavares Correia Junior
The patch provided by google is ok. I have two comments though:

1 - The same problem may happen in MYSQL_LOG::purge_logs_before_date.

2 - If a crash happens right after updating the index file, there will be dangling logs that will never be erased. I think it is worth having a recovery
routine that handles this sort of case.
[2 Nov 2008 20:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57643

2704 Alfranio Correia	2008-11-02
      Fixed race condition while purging logs. Now, we update the index file first and then remove the files.
      BUG#38826.
[2 Nov 2008 20:19] Alfranio Tavares Correia Junior
Note the patch above is for version 5.0.X.
[2 Nov 2008 20:22] Alfranio Tavares Correia Junior
I suggest a merge between this patch and the patch provided in BUB#39325.
[2 Nov 2008 20:26] Alfranio Tavares Correia Junior
Regarding the recovery routine suggested above, maybe we should include this in the worklog proposed to improve the patch in BUG#40337.
[4 Nov 2008 0:13] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57733

2709 Luis Soares	2008-11-04
      This is a combined patch for: BUG#38826 and BUG#39325.
      It merges the previous two proposed patches into a single one.
[4 Nov 2008 0:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57734

2690 Luis Soares	2008-11-04
      This is a combined patch for: BUG#38826 and BUG#39325.
      It merges the previous two proposed patches into a single one.
      
      Merge with 5.1-rpl team tree.
[4 Nov 2008 12:21] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/57781

2690 Luis Soares	2008-11-04
      This is a combined patch for: BUG#38826 and BUG#39325.
      It merges the previous two proposed patches into a single one.
            
      Merge with 5.1-rpl team tree.
      
      The difference from previous patch is that it calls 
      "ha_binlog_index_purge_file" before purging the relay log file.
[4 Dec 2008 1:01] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60558

2731 Luis Soares	2008-12-04
      BUG#38826 Race in MYSQL_LOG::purge_logs is impossible to debug in production
      BUG#39325 Server crash inside MYSQL_LOG::purge_first_log halts replicaiton
      
      The patch reverses the order of the purging and updating events for log and relay-log.info/index files respectively.
      This solves the problem of having holes caused by crashes happening between updating info/index files and purging logs.
      
      NOTE: This is a combined patch for BUG#38826 and BUG#39325. This patch is based on bugteam tree and takes into account reviewers suggestions.
[4 Dec 2008 14:08] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/60608

2720 Luis Soares	2008-12-04 [merge]
      BUG#38826 Race in MYSQL_LOG::purge_logs is impossible to debug in production
      BUG#39325 Server crash inside MYSQL_LOG::purge_first_log halts replicaiton
      
      The patch reverses the order of the purging and updating events for log and
      relay-log.info/index files respectively.
      
      This solves the problem of having holes caused by crashes happening between updating
      info/index files and purging logs.
      
      This patch also contains an aditional test case for testing the crashing before purge logs.
            
      NOTE1: This is a combined patch for BUG#38826 and BUG#39325. This patch is based on
      bugteam tree and takes into account reviewers suggestions.
      
      NOTE2: Merge from 5.0-bugteam
[10 Dec 2008 10:41] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/61188

2792 Luis Soares	2008-12-10 [merge]
      BUG#38826 Race in MYSQL_LOG::purge_logs is impossible to debug in production
      BUG#39325 Server crash inside MYSQL_LOG::purge_first_log halts replicaiton
            
      The patch reverses the order of the purging and updating events for log and relay-log.info/index files respectively.
            
      This solves the problem of having holes caused by crashes happening between updating info/index files and purging logs.
            
      This patch also contains an aditional test case for testing the crashing before purge logs.
                  
      NOTE1: This is a combined patch for BUG#38826 and BUG#39325. This patch is based on bugteam tree and takes into account reviewers suggestions.
            
      NOTE2: Merge from 5.1-bugteam
[11 Dec 2008 13:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/61355

2734 Luis Soares	2008-12-11 [merge]
      BUG#38826
      
      Merge from 5.0-bugteam. Additional fix for unused ret variable warning.
[11 Dec 2008 15:45] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/61380

2802 Luis Soares	2008-12-11 [merge]
      BUG#38826: Race in MYSQL_LOG::purge_logs is impossible to debug in production
      
      Merge from 5.1 bugteam (fixes parentheses and unused variables warnings, removes 
      rpl_crash_before_purge_logs test - needs further discussion on how to do it
      deterministically.).
[6 Jan 2009 13:57] Bugs System
Pushed into 5.0.76 (revid:joro@sun.com-20090105160414-8q9j4bi1klkfwiup) (version source revid:azundris@mysql.com-20081230114734-nmsc37ak330zlygn) (merge vers: 5.0.76) (pib:6)
[7 Jan 2009 17:27] Jon Stephens
Documented bugfix as follows in the 5.0.76 changelog:

        When rotating relay log files, the slave deletes relay log files
        and then edits the relay log index file. Formerly, if the slave
        shut down unexpectedly between these two events, the relay log
        index file could then reference relay logs that no longer
        existed. Depending on the circumstances, this could when
        restarting the slave cause either a race condition or for
        replication to fail.

Set bug status to Patch Pending, waiting for push to 5.1 tree.

See also Bug #39325.
[15 Jan 2009 6:38] Bugs System
Pushed into 5.1.31 (revid:joro@sun.com-20090115053147-tx1oapthnzgvs1ro) (version source revid:azundris@mysql.com-20081230114838-cn52tu180wcrvh0h) (merge vers: 5.1.31) (pib:6)
[19 Jan 2009 11:25] Bugs System
Pushed into 5.1.31-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090119095303-uwwvxiibtr38djii) (version source revid:tomas.ulin@sun.com-20090115073240-1wanl85vlvw2she1) (merge vers: 5.1.31-ndb-6.2.17) (pib:6)
[19 Jan 2009 13:03] Bugs System
Pushed into 5.1.31-ndb-6.3.21 (revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (version source revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (merge vers: 5.1.31-ndb-6.3.21) (pib:6)
[19 Jan 2009 16:09] Bugs System
Pushed into 5.1.31-ndb-6.4.1 (revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (version source revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (merge vers: 5.1.31-ndb-6.4.1) (pib:6)
[20 Jan 2009 18:56] Bugs System
Pushed into 6.0.10-alpha (revid:joro@sun.com-20090119171328-2hemf2ndc1dxl0et) (version source revid:azundris@mysql.com-20081230114916-c290n83z25wkt6e4) (merge vers: 6.0.9-alpha) (pib:6)
[28 Jan 2009 12:41] Jon Stephens
Fix also documented in the 5.1.31 and 6.0.10 changelogs. Closed.