Bug #5096 MySQL 4.0.20 Relay log rotation problem (with workaround)
Submitted: 18 Aug 2004 17:20 Modified: 25 Aug 2004 11:48
Reporter: Joe Kislo Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:mysql-4.0.20-standard (Official MySQL-st OS:Linux (Linux 2.4.25)
Assigned to: CPU Architecture:Any

[18 Aug 2004 17:20] Joe Kislo
Description:
	We recently upgraded from 3.23.50(binary), to 4.0.20 (binary).  We have a single master, and multiple slaves. After the upgrade, the slaves started creating these relay files for replication.  These files, started building up and were never removed.  This problem occoured on both slaves.  In the MySQL manual it clearly says these relay files  should be expunged as soon as they are not needed:

Relay logs have the same format as binary logs, so you can use mysqlbinlog to read them. A relay log is automatically deleted by the SQL thread as soon as it has executed all its events and no longer needs it). There is no explicit mechanism for deleting relay logs, because the SQL thread takes care of doing so. However, from MySQL 4.0.14, FLUSH LOGS rotates relay logs, which will influence when the SQL thread deletes them. 

However ours were not getting deleted, ever.  This was further confused by a feature only available
in MySQL 4.1.1 that allows you to toggle the deleting of relay logs (the feature in 4.1.1 is just the option to toggle it off, not the automatic deletion feature).  

Our Slave was up to date with the master, both confirmed by show slave status, and separately confirming changes on the master were being replicated down.  Here's a show slave status from one of the slaves:

mysql> show slave status\G
*************************** 1. row ***************************
          Master_Host: xx.xx.xx.xx
          Master_User: xxxxx
          Master_Port: 3306
        Connect_retry: 60
      Master_Log_File: mysql-bin.009
  Read_Master_Log_Pos: 270011564
       Relay_Log_File: xxxxxx-1-relay-bin.018
        Relay_Log_Pos: 87921
Relay_Master_Log_File: mysql-bin.009
     Slave_IO_Running: No
    Slave_SQL_Running: No
      Replicate_do_db: 
  Replicate_ignore_db: 
           Last_errno: 0
           Last_error: 
         Skip_counter: 0
  Exec_master_log_pos: 270011564
      Relay_log_space: 1615315549
1 row in set (0.00 sec)

On the disk, there are relay logs 001-018.  None of them get deleted, ever.  Flush logs, creates a new relay log, but does not delete the old ones.  You'll note the total relay log space is all of the files combined, 1.6G.

After posting on the mysql list and not hearing back, I assumed it must be some wierd case, which I have identified below:

How to repeat:
	Apparently the existance of multiple relay file is the key here.  Somehow I ended up getting multiple relay files created when I did the conversion, I'm not sure how I did this (EG: 001 and 002 were 4 bytes).  Apparently if MySQL's SQL thread is reading happily on relay file 003, and 002 and 001 exist, when it creates 004 it will not  delete 003 (or 002 or 001).  I determine this by the workaround below:

Suggested fix:
	I did a flush logs on the replication server and shutdown the slave.  I moved away all the relay logs except the newest one, and updated the relay.index file to only reference the newest one.  I started mysql, and it deleted the newest relay log and created a new one.  Flush logs now works, and it deletes relay files as documented.

This is either:
 1) A software bug (in that it should delete all the unneeded relay logs)
 2) A documentation bug (if this is considered a 'feature', then we should put something into the manual, because this is not anywhere on google, the mailing list or documentation.  And it's REALLY confusing)

Thanks,

-Joe

>Submitter-Id:	<submitter ID>
>Originator: Joe Kislo
>Organization:  Athenium Inc.
>MySQL support: none
>Synopsis:	Was a very serious problem (running out of disk), now the problem is undercontrol w/ workaround
>Severity:	non-critical
>Priority:	medium
>Category:	mysql
>Class:		sw-bug | doc-bug
>Release:	mysql-4.0.20-standard (Official MySQL-standard binary)

>C compiler:    2.95.3
>C++ compiler:  2.95.3
>Environment:
	<machine, os, target, libraries (multiple lines)>
System: Linux xxxx 2.4.25-athenium-686-smp #3 SMP Fri Mar 19 18:29:28 EST 2004 i686 unknown
Architecture: i686

Some paths:  /usr/bin/perl /usr/bin/make /usr/bin/gcc /usr/bin/cc
GCC: Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.4/specs
gcc version 2.95.4 20011002 (Debian prerelease)
Compilation info: CC='gcc'  CFLAGS='-O2 -mcpu=pentiumpro'  CXX='gcc'  CXXFLAGS='-O2 -mcpu=pentiumpro -felide-constructors'  LDFLAGS=''  ASFLAGS=''
LIBC: 
lrwxrwxrwx    1 root     root           13 Feb 23 14:21 /lib/libc.so.6 -> libc-2.2.5.so
-rwxr-xr-x    1 root     root      1153784 Apr  8  2003 /lib/libc-2.2.5.so
-rw-r--r--    1 root     root      2391002 Apr  8  2003 /usr/lib/libc.a
-rw-r--r--    1 root     root          178 Apr  8  2003 /usr/lib/libc.so
Configure command: ./configure '--prefix=/usr/local/mysql' '--localstatedir=/usr/local/mysql/data' '--libexecdir=/usr/local/mysql/bin' '--with-comment=Official MySQL-standard binary' '--with-extra-charsets=complex' '--with-server-suffix=-standard' '--enable-thread-safe-client' '--enable-local-infile' '--enable-assembler' '--disable-shared' '--with-client-ldflags=-all-static' '--with-mysqld-ldflags=-all-static' '--with-embedded-server' '--with-innodb' 'CFLAGS=-O2 -mcpu=pentiumpro' 'CXXFLAGS=-O2 -mcpu=pentiumpro -felide-constructors' 'CXX=gcc'
[25 Aug 2004 11:48] Guilhem Bichot
Hi Joe,
You are right when you write:
"Apparently the existance of multiple relay file is the key here.  Somehow
I ended up getting multiple relay files created when I did the conversion,
I'm not sure how I did this (EG: 001 and 002 were 4 bytes).  Apparently if
MySQL's SQL thread is reading happily on relay file 003, and 002 and 001
exist, when it creates 004 it will not  delete 003 (or 002 or 001)."
It is a characteristic of MySQL 4.0, which we removed in MySQL 4.1 (finding it confusing like you did). In MySQL 4.1, relay logs are always deleted unless:
- you forbid this, by doing SET GLOBAL RELAY_LOG_PURGE=0; (and you can then re-allow it by setting to 1 - the setting will take effect the next time the slave SQL thread moves to a new relay log).
- you use CHANGE MASTER TO RELAY_LOG_FILE=...; command (because in that case, it usually means the user does not have access to the master's binlogs anymore, so the relay logs are critical files which should be deleted only later). 
Until the upgrade, the only workaround is the one you did.