Bug #15019 binlog purge fails if more binlogs open then O/S limits allow
Submitted: 17 Nov 2005 14:56 Modified: 24 Oct 2007 14:09
Reporter: Jonathan Miller Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Cluster: Replication Severity:S2 (Serious)
Version:5.1.2 OS:Linux (Linux)
Assigned to: Magnus Blåudd CPU Architecture:Any

[17 Nov 2005 14:56] Jonathan Miller
Description:
Example:

You have over 1300 binlogs and start a purge operation. The operation tries to open more files then is allowed by the OS, and the purge shows to have completed, but actually fails about 3/4 of the way through.

With the following in the error log:
051116 16:19:38 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Can't find file: '.
/cluster_replication/binlog_index.frm' (errno: 24)
051116 16:19:38 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Can't find file: '.
/cluster_replication/binlog_index.frm' (errno: 24)
051116 16:24:51 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Lock wait timeout e
xceeded; try restarting transaction
051116 16:24:51 [ERROR] /home/ndbdev/jmiller/builds/libexec/mysqld: Sort aborted

How to repeat:
1) Set you binlog size low (20K or so)
2) Start several application causing binlogs to grow and new ones to be created.
3) Once you have > 1000 binlogs start a purge operation.
4) you will get back error 24 (OS error code  24:  Too many open files)
[2 Dec 2005 8:56] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/internals/32950
[19 May 2006 21:29] Jonathan Miller
Tomas this should be a replication issue. What is your take?
[23 Oct 2007 21:12] Jonathan Miller
Hi Magnus,

1) max_binlog_size
max_binlog_size

If a write to the binary log causes the current log file size to exceed the value of this variable, the server rotates the binary logs (closes the current file and opens the next one). You cannot set this variable to more than 1GB or to less than 4096 bytes. The default value is 1GB.

A transaction is written in one chunk to the binary log, so it is never split between several binary logs. Therefore, if you have big transactions, you might see binary logs larger than max_binlog_size. 

2) Sure, anything that makes the bin logs grow and rotate to new ones.

3) PURGE {MASTER | BINARY} LOGS TO 'log_name'
   PURGE {MASTER | BINARY} LOGS BEFORE 'date'
   
http://dev.mysql.com/doc/refman/5.1/en/purge-master-logs.html

Yes, a script could start the purge.

4) The mysqld (master) error log.

Best wishes,
/Jeb
[24 Oct 2007 14:09] Magnus Blåudd
--source include/have_log_bin.inc

select @@max_binlog_size;
set @@GLOBAL.max_binlog_size=4096; # Minimum value
select @@max_binlog_size;

CREATE TABLE t1 (a int, b varchar(25), c int(63), d varchar(1), e char(33));

--disable_query_log
let $i= 5000;
while ($i){
  INSERT INTO t1 VALUES
  ( 1, "ehlo", 1, "k", "column_e"),
  ( 2, "ehlo", 1, "k", "column_e"),
  ( 3, "ehlo", 1, "k", "column_e"),
  ( 4, "ehlo", 1, "k", "column_e"),
  ( 5, "ehlo", 1, "k", "column_e"),
  ( 6, "ehlo", 1, "k", "column_e"),
  ( 7, "ehlo", 1, "k", "column_e"),
  ( 8, "ehlo", 1, "k", "column_e"),
  ( 9, "ehlo", 1, "k", "column_e"),
  ( 10, "ehlo", 1, "k", "column_e"),
  ( 11, "ehlo", 1, "k", "column_e"),
  ( 12, "ehlo", 1, "k", "column_e"),
  ( 13, "ehlo", 1, "k", "column_e"),
  ( 14, "ehlo", 1, "k", "column_e"),
  ( 15, "ehlo", 1, "k", "column_e"),
  ( 16, "ehlo", 1, "k", "column_e"),
  ( 17, "ehlo", 1, "k", "column_e"),
  ( 18, "ehlo", 1, "k", "column_e"),
  ( 19, "ehlo", 1, "k", "column_e"),
  ( 20, "ehlo", 1, "k", "column_e"),
  ( 21, "ehlo", 1, "k", "column_e"),
  ( 22, "ehlo", 1, "k", "column_e"),
  ( 23, "ehlo", 1, "k", "column_e"),
  ( 24, "ehlo", 1, "k", "column_e"),
  ( 25, "ehlo", 1, "k", "column_e"),
  ( 26, "ehlo", 1, "k", "column_e"),
  ( 27, "ehlo", 1, "k", "column_e"),
  ( 28, "ehlo", 1, "k", "column_e"),
  ( 29, "ehlo", 1, "k", "column_e"),
  ( 30, "ehlo", 1, "k", "column_e"),
  ( 31, "ehlo", 1, "k", "column_e"),
  ( 32, "ehlo", 1, "k", "column_e"),
  ( 33, "ehlo", 1, "k", "column_e"),
  ( 34, "ehlo", 1, "k", "column_e"),
  ( 35, "ehlo", 1, "k", "column_e"),
  ( 36, "ehlo", 1, "k", "column_e"),
  ( 37, "ehlo", 1, "k", "column_e"),
  ( 38, "ehlo", 1, "k", "column_e"),
  ( 39, "ehlo", 1, "k", "column_e"),
  ( 40, "ehlo", 1, "k", "column_e"),
  ( 41, "ehlo", 1, "k", "column_e"),
  ( 42, "ehlo", 1, "k", "column_e"),
  ( 43, "ehlo", 1, "k", "column_e"),
  ( 44, "ehlo", 1, "k", "column_e"),
  ( 45, "ehlo", 1, "k", "column_e"),
  ( 46, "ehlo", 1, "k", "column_e"),
  ( 47, "ehlo", 1, "k", "column_e"),
  ( 48, "ehlo", 1, "k", "column_e"),
  ( 49, "ehlo", 1, "k", "column_e"),
  ( 50, "ehlo", 1, "k", "column_e");

  dec $i;
}
--enable_query_log
SELECT count(*) FROM t1;

SHOW BINARY LOGS;

# Purge 1666 binry logs at the same time,
# that is more than the ulimit for "open files"
# which is set to 1024 on my machine
PURGE BINARY LOGS TO 'master-bin.001667';

# All log except the one mentioned in PURGE above
# should now be gone
SHOW BINARY LOGS;

# There should be no "[ERROR] Can't find file x (errno: 24)"
# in var/log/master.err

DROP TABLE t1;