Bug #28238 | expire_logs_days and PURGE MASTER LOGS fail when index not up to date | ||
---|---|---|---|
Submitted: | 4 May 2007 13:20 | Modified: | 5 Aug 2010 14:05 |
Reporter: | Baron Schwartz (Basic Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S4 (Feature request) |
Version: | 5.0.38 | OS: | Linux (Gentoo) |
Assigned to: | Mats Kindahl | CPU Architecture: | Any |
Tags: | binlog, expire_logs_days, master, purge, qc, replication |
[4 May 2007 13:20]
Baron Schwartz
[4 May 2007 13:30]
Baron Schwartz
Whoops, I messed up the report a little bit. After running FLUSH LOGS on the master (usa), I ran PURGE MASTER LOGS before going back and looking at the disk again. Just in case that's not clear. In fact, I only copied a small fraction of the different kinds of commands I really tried running, but none of them purged any logs. I just didn't want the bug report to fill up with too much text.
[4 May 2007 13:52]
Mark Leith
Hi Baron, Initially, I can't repeat this, even with a slave connected (on 5.0.40): /* Master data directory */ medusa:/usr/local/mysql root# ls -l data total 58296 -rw-rw---- 1 mysql wheel 5242880 May 4 14:38 ib_logfile0 -rw-rw---- 1 mysql wheel 5242880 May 2 22:27 ib_logfile1 -rw-rw---- 1 mysql wheel 18874368 May 4 14:29 ibdata1 -rw-rw---- 1 mysql wheel 102541 May 4 14:44 medusa-bin.000001 -rw-rw---- 1 mysql wheel 102517 May 4 14:44 medusa-bin.000002 -rw-rw---- 1 mysql wheel 102517 May 4 14:44 medusa-bin.000003 -rw-rw---- 1 mysql wheel 102517 May 4 14:44 medusa-bin.000004 -rw-rw---- 1 mysql wheel 51338 May 4 14:44 medusa-bin.000005 -rw-rw---- 1 mysql wheel 200 May 4 14:44 medusa-bin.index -rw-rw---- 1 mysql wheel 6 May 4 14:38 medusa.pid drwxr-x--- 53 mysql wheel 1802 May 2 22:26 mysql drwxr-x--- 12 mysql wheel 408 May 3 16:42 test /* Slave data directory */ medusa:/usr/local/mysql root# ls -l data2 total 57480 -rw-r----- 1 mysql wheel 5242880 May 4 14:39 ib_logfile0 -rw-r----- 1 mysql wheel 5242880 May 4 14:31 ib_logfile1 -rw-r----- 1 mysql wheel 18874368 May 4 14:38 ibdata1 -rw-rw---- 1 mysql wheel 68 May 4 14:45 master.info -rw-rw---- 1 mysql wheel 51476 May 4 14:45 medusa-relay-bin.000010 -rw-rw---- 1 mysql wheel 47 May 4 14:44 medusa-relay-bin.index -rw-rw---- 1 mysql wheel 6 May 4 14:39 medusa.pid drwxr-x--- 53 mysql wheel 1802 May 4 14:31 mysql -rw-rw---- 1 mysql wheel 79 May 4 14:45 relay-log.info drwxr-x--- 12 mysql wheel 408 May 4 14:31 test medusa:/usr/local/mysql root# more master-my.cnf [mysqld] basedir = /usr/local/mysql/ datadir = /usr/local/mysql/data server-id = 1 log-bin max_binlog_size = 100K expire_logs_days = 2 medusa:/usr/local/mysql root# more slave-my.cnf [mysqld] port = 3307 socket = /tmp/mysql-slave.sock basedir = /usr/local/mysql datadir = /usr/local/mysql/data2 server-id = 2 /* Switch dates */ medusa:/usr/local/mysql root# date Fri May 4 14:45:55 BST 2007 medusa:/usr/local/mysql root# date Tue May 8 14:46:17 BST 2007 /* Generate more binlogs */ medusa:/usr/local/mysql root# while [ 1 ] ; do mysql -u root test -e 'insert into binlog_test (i,j) values (1,1)' ; done ^C /* Check data directories */ medusa:/usr/local/mysql root# ls -l data total 57888 -rw-rw---- 1 mysql wheel 5242880 May 4 14:38 ib_logfile0 -rw-rw---- 1 mysql wheel 5242880 May 2 22:27 ib_logfile1 -rw-rw---- 1 mysql wheel 18874368 May 4 14:29 ibdata1 -rw-rw---- 1 mysql wheel 102517 May 8 14:46 medusa-bin.000005 -rw-rw---- 1 mysql wheel 102517 May 8 14:46 medusa-bin.000006 -rw-rw---- 1 mysql wheel 55538 May 8 14:47 medusa-bin.000007 -rw-rw---- 1 mysql wheel 120 May 8 14:46 medusa-bin.index -rw-rw---- 1 mysql wheel 6 May 4 14:38 medusa.pid drwxr-x--- 53 mysql wheel 1802 May 2 22:26 mysql drwxr-x--- 12 mysql wheel 408 May 3 16:42 test medusa:/usr/local/mysql root# ls -l data2 total 57488 -rw-r----- 1 mysql wheel 5242880 May 4 14:39 ib_logfile0 -rw-r----- 1 mysql wheel 5242880 May 4 14:31 ib_logfile1 -rw-r----- 1 mysql wheel 18874368 May 4 14:38 ibdata1 -rw-rw---- 1 mysql wheel 68 May 8 14:47 master.info -rw-rw---- 1 mysql wheel 55676 May 8 14:47 medusa-relay-bin.000014 -rw-rw---- 1 mysql wheel 47 May 8 14:46 medusa-relay-bin.index -rw-rw---- 1 mysql wheel 6 May 4 14:39 medusa.pid drwxr-x--- 53 mysql wheel 1802 May 4 14:31 mysql -rw-rw---- 1 mysql wheel 79 May 8 14:47 relay-log.info drwxr-x--- 12 mysql wheel 408 May 4 14:31 test So it seems that the master's binary logs are *still* getting purged, even with a slave connected. Could you provide the my.cnf files for the servers in question please? Cheers, Mark
[4 May 2007 14:16]
Baron Schwartz
I just added fresno's and usa's my.cnf files to the bug (I also added nepal's and called it fresno's, sorry... I'm not doing so well this morning). I can't see significant differences between them. We don't have expire_logs_days set anymore in these cnf files. The testing I did with that a week or so ago didn't work, so I disabled it. Today's testing is only on the effect of PURGE MASTER LOGS. Thanks for being patient with me as I proceed to make a completely unintelligible bug report even worse :-/
[14 May 2007 13:45]
Baron Schwartz
I think I have found the bug. Whether this is a bug is now open to question. I hooked up another server to fresno and was able to confirm that PURGE MASTER LOGS worked even when there is a slave server reading from it. So, my earlier theory is wrong, and Paul DuBois and Mark Leith's inability to repro makes sense. I also went through all the config differences between usa and fresno; there is only really one (query cache size). Changing fresno's query cache size to match usa's didn't make PURGE MASTER LOGS stop working on fresno. I was maintaining the log files on 'usa', the master, with a typical find|rm cron job. The usa-bin.index file was not being updated as a result, and was out of sync with the filesystem; its first entry is ./usa-bin.000001. This is also shown by SHOW MASTER LOGS showing files usa-bin.000001 through current, with zero size for the ones the cron job removed. When I run PURGE MASTER LOGS on usa, I get this output: mysql> purge master logs before date_sub(current_date, interval 7 day); Query OK, 0 rows affected (0.04 sec) In the log file, though: 070511 9:26:16 [ERROR] Failed to open log (file './usa-bin.000001', errno 2) Apparently it is looking through the contents of usa-bin.index, trying to remove a non-existent file, and stopping. I would propose that it should ignore files that don't exist, and update usa-bin.index to match what's on the filesystem. But I don't know all the ramifications of this. At least it should return an error to the client. I would also suggest that the manual for PURGE LOGS and the note on expire_logs_days be updated to document this behavior in versions where it's not fixed. It might be helpful to include a pointer to SHOW MASTER LOGS. And on SHOW MASTER LOGS, it would be good to note that if any files appear to have zero size, it is likely the log-bin.index file is out of date with the filesystem and purging will fail.
[14 May 2007 13:50]
Baron Schwartz
Actually, it looks like the error in the log file: 070511 9:26:16 [ERROR] Failed to open log (file './usa-bin.000001', errno 2) Is probably from another time. My mistake. But I hand-corrected usa-bin.index and was able to confirm that PURGE MASTER LOGS *does* work, as I said, so I still think I've found the bug. (There should probably be an error in the mysqld.err file when a file in usa-bin.index can't be found for purging, regardless).
[18 May 2007 20:09]
Mark Leith
Hi Baron, Indeed, there are issues with these statements when having binary logs manually removed, and not updating the binlog index file, which is why we always recommend to use PURGE MASTER LOGS or expire_logs_days exclusively. For this reason, I am actually going to verify this - but verify it as a feature request. I think the feature request should be: Update the binlog index file when PURGE MASTER LOGS or expire_logs_days encounters binary log references for non-existant binary log files. Would you agree with this?
[18 May 2007 20:12]
Baron Schwartz
Certainly. I would even say if it's not a high priority to change the behavior, just documenting it would be a big help.
[31 May 2007 6:54]
Valeriy Kravchuk
This is a reasonable feature/documentation request.
[26 Aug 2008 12:17]
James Green
This has lasted over a year now and we have run into it. Having just spent all morning tracking it down, could someone provide a progress update please?
[29 Dec 2008 16:28]
Colin Guthrie
Is there any way to recover this state once you get there? I've also run into the problem. In my case I had no option but to manually remove some old binary logs (the slow query log grew to such a size that it saturated the partition - so much so that you could not even connect to the mysql server to issue a "flush logs" command obviously removing the slow query log on disk didn't free the space as the running mysql still had the file handle open... only by removing an old log could I free up space to allow me to connect and thus flush the logs and thus free up the slow query file handle/disk space! phew!) I need to find a way to reset mysql's internal log file state (show master logs) with what is available on disk and preferably have it update mysql-bin.index again (althouhg why it appears to keep an internal state and a separate file is currently beyond me :) I'm using a 5.0.27 build just now due to bugs in the later ones relating to UDF handling - it's fixed in current 5.0.x releases but I've not had a chance to update this production environment yet.
[2 Oct 2009 9:35]
Domas Mituzas
I suggest re-triaging and re-classifying this as bug, as: 1. It can happen without manual intervention (race condition / fs issue / crash ) 2. There is no reporting, neither via errors, warnings or logs 3. Binlog accumulation because of this can lead to system failures (full file system and all the hell afterwards)
[5 Aug 2010 14:05]
Hartmut Holzgraefe
This seems to have been fixed in both current 5.0 and 5.1 releases, instead of failing to purge absent log files a warning is generated for every missing file now: | Warning | 1477 | Being purged log ./hp-laptop-bin.000001 was not found | and the purge continues. No idea which version this was fixed in though (a quick search for duplicate bugs and for related changelog entries did not come up with any results)
[5 Aug 2010 14:13]
Mark Leith
Looks like it was fixed with this commit: http://lists.mysql.com/commits/44122 ChangeSet@1.2598, 2008-03-17 18:03:47+02:00, aelkin@mysql1000.(none) +5 -0 Bug #18199 PURGE BINARY LOGS fails silently with missing logs; Bug #18453 Warning/error message if there is a mismatch between binary logs and index file