Bug #38826 | Race in MYSQL_LOG::purge_logs is impossible to debug in production | ||
---|---|---|---|
Submitted: | 15 Aug 2008 17:13 | Modified: | 28 Jan 2009 12:41 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S3 (Non-critical) |
Version: | 5.0,5.1 | OS: | Any |
Assigned to: | Luis Soares | CPU Architecture: | Any |
Tags: | Logs, purge, relay, replication |
[15 Aug 2008 17:13]
Mark Callaghan
[18 Aug 2008 13:43]
Geert Vanderkelen
Shane suggested to put a sleep() right before the update_log_index()-call. This shows the problem nicely, also for binary logs (see below): verified. Probably it's best to do both suggestions there: put in the error log when logs are rotated AND do the updating of the index file first, then remove. --- Having logs: -rw-rw---- 1 gvanderkelen gvanderkelen 151 Aug 18 15:28 master-log.000001 -rw-rw---- 1 gvanderkelen gvanderkelen 107 Aug 18 15:28 master-log.000002 -rw-rw---- 1 gvanderkelen gvanderkelen 107 Aug 18 15:29 master-log.000003 And the binary log index: ./master-log.000001 ./master-log.000002 ./master-log.000003 With the patched version, which sleeps for 60 seconds before updating the index file, do this: mysql> purge master logs to 'master-log.000003'; (hangs for 60 seconds) Kill the mysqld process: shell> kill -9 `cat /path/to/mysqld.pid` -rw-rw---- 1 gvanderkelen gvanderkelen 107 Aug 18 15:29 master-log.000003 -rw-rw---- 1 gvanderkelen gvanderkelen 107 Aug 18 15:31 master-log.000004 And the content of the index file: ./master-log.000001 ./master-log.000002 ./master-log.000003 ./master-log.000004
[29 Sep 2008 20:50]
Justin Tolmer
Patch with a fix for the issue
Attachment: 38826.diff (text/plain), 5.51 KiB.
[2 Nov 2008 16:28]
Alfranio Tavares Correia Junior
The patch provided by google is ok. I have two comments though: 1 - The same problem may happen in MYSQL_LOG::purge_logs_before_date. 2 - If a crash happens right after updating the index file, there will be dangling logs that will never be erased. I think it is worth having a recovery routine that handles this sort of case.
[2 Nov 2008 20:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/57643 2704 Alfranio Correia 2008-11-02 Fixed race condition while purging logs. Now, we update the index file first and then remove the files. BUG#38826.
[2 Nov 2008 20:19]
Alfranio Tavares Correia Junior
Note the patch above is for version 5.0.X.
[2 Nov 2008 20:22]
Alfranio Tavares Correia Junior
I suggest a merge between this patch and the patch provided in BUB#39325.
[2 Nov 2008 20:26]
Alfranio Tavares Correia Junior
Regarding the recovery routine suggested above, maybe we should include this in the worklog proposed to improve the patch in BUG#40337.
[4 Nov 2008 0:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/57733 2709 Luis Soares 2008-11-04 This is a combined patch for: BUG#38826 and BUG#39325. It merges the previous two proposed patches into a single one.
[4 Nov 2008 0:17]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/57734 2690 Luis Soares 2008-11-04 This is a combined patch for: BUG#38826 and BUG#39325. It merges the previous two proposed patches into a single one. Merge with 5.1-rpl team tree.
[4 Nov 2008 12:21]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/57781 2690 Luis Soares 2008-11-04 This is a combined patch for: BUG#38826 and BUG#39325. It merges the previous two proposed patches into a single one. Merge with 5.1-rpl team tree. The difference from previous patch is that it calls "ha_binlog_index_purge_file" before purging the relay log file.
[4 Dec 2008 1:01]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/60558 2731 Luis Soares 2008-12-04 BUG#38826 Race in MYSQL_LOG::purge_logs is impossible to debug in production BUG#39325 Server crash inside MYSQL_LOG::purge_first_log halts replicaiton The patch reverses the order of the purging and updating events for log and relay-log.info/index files respectively. This solves the problem of having holes caused by crashes happening between updating info/index files and purging logs. NOTE: This is a combined patch for BUG#38826 and BUG#39325. This patch is based on bugteam tree and takes into account reviewers suggestions.
[4 Dec 2008 14:08]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/60608 2720 Luis Soares 2008-12-04 [merge] BUG#38826 Race in MYSQL_LOG::purge_logs is impossible to debug in production BUG#39325 Server crash inside MYSQL_LOG::purge_first_log halts replicaiton The patch reverses the order of the purging and updating events for log and relay-log.info/index files respectively. This solves the problem of having holes caused by crashes happening between updating info/index files and purging logs. This patch also contains an aditional test case for testing the crashing before purge logs. NOTE1: This is a combined patch for BUG#38826 and BUG#39325. This patch is based on bugteam tree and takes into account reviewers suggestions. NOTE2: Merge from 5.0-bugteam
[10 Dec 2008 10:41]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/61188 2792 Luis Soares 2008-12-10 [merge] BUG#38826 Race in MYSQL_LOG::purge_logs is impossible to debug in production BUG#39325 Server crash inside MYSQL_LOG::purge_first_log halts replicaiton The patch reverses the order of the purging and updating events for log and relay-log.info/index files respectively. This solves the problem of having holes caused by crashes happening between updating info/index files and purging logs. This patch also contains an aditional test case for testing the crashing before purge logs. NOTE1: This is a combined patch for BUG#38826 and BUG#39325. This patch is based on bugteam tree and takes into account reviewers suggestions. NOTE2: Merge from 5.1-bugteam
[11 Dec 2008 13:30]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/61355 2734 Luis Soares 2008-12-11 [merge] BUG#38826 Merge from 5.0-bugteam. Additional fix for unused ret variable warning.
[11 Dec 2008 15:45]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/61380 2802 Luis Soares 2008-12-11 [merge] BUG#38826: Race in MYSQL_LOG::purge_logs is impossible to debug in production Merge from 5.1 bugteam (fixes parentheses and unused variables warnings, removes rpl_crash_before_purge_logs test - needs further discussion on how to do it deterministically.).
[6 Jan 2009 13:57]
Bugs System
Pushed into 5.0.76 (revid:joro@sun.com-20090105160414-8q9j4bi1klkfwiup) (version source revid:azundris@mysql.com-20081230114734-nmsc37ak330zlygn) (merge vers: 5.0.76) (pib:6)
[7 Jan 2009 17:27]
Jon Stephens
Documented bugfix as follows in the 5.0.76 changelog: When rotating relay log files, the slave deletes relay log files and then edits the relay log index file. Formerly, if the slave shut down unexpectedly between these two events, the relay log index file could then reference relay logs that no longer existed. Depending on the circumstances, this could when restarting the slave cause either a race condition or for replication to fail. Set bug status to Patch Pending, waiting for push to 5.1 tree. See also Bug #39325.
[15 Jan 2009 6:38]
Bugs System
Pushed into 5.1.31 (revid:joro@sun.com-20090115053147-tx1oapthnzgvs1ro) (version source revid:azundris@mysql.com-20081230114838-cn52tu180wcrvh0h) (merge vers: 5.1.31) (pib:6)
[19 Jan 2009 11:25]
Bugs System
Pushed into 5.1.31-ndb-6.2.17 (revid:tomas.ulin@sun.com-20090119095303-uwwvxiibtr38djii) (version source revid:tomas.ulin@sun.com-20090115073240-1wanl85vlvw2she1) (merge vers: 5.1.31-ndb-6.2.17) (pib:6)
[19 Jan 2009 13:03]
Bugs System
Pushed into 5.1.31-ndb-6.3.21 (revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (version source revid:tomas.ulin@sun.com-20090119104956-guxz190n2kh31fxl) (merge vers: 5.1.31-ndb-6.3.21) (pib:6)
[19 Jan 2009 16:09]
Bugs System
Pushed into 5.1.31-ndb-6.4.1 (revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (version source revid:tomas.ulin@sun.com-20090119144033-4aylstx5czzz88i5) (merge vers: 5.1.31-ndb-6.4.1) (pib:6)
[20 Jan 2009 18:56]
Bugs System
Pushed into 6.0.10-alpha (revid:joro@sun.com-20090119171328-2hemf2ndc1dxl0et) (version source revid:azundris@mysql.com-20081230114916-c290n83z25wkt6e4) (merge vers: 6.0.9-alpha) (pib:6)
[28 Jan 2009 12:41]
Jon Stephens
Fix also documented in the 5.1.31 and 6.0.10 changelogs. Closed.