MySQL Bugs: #67149: Slave writing a ton of nearly-empty relay logs and never being able to catch up

Bug #67149	Slave writing a ton of nearly-empty relay logs and never being able to catch up
Submitted:	9 Oct 2012 17:01	Modified:	24 Jan 2013 17:39
Reporter:	Michael Finch	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S1 (Critical)
Version:	mysql55-5.5.27-2	OS:	Linux (centos6.3)
Assigned to:	Matthew Lord	CPU Architecture:	Any
Tags:	catch up, empty, relay-log, replication, slow

Description:
Hi,

We have been running into issues where slaves (which normally replicate just fine) start writing >100 relay-logs - most of which are 4.0k in size - and are never able to get through all of them. Here's an example of the log directory of one of these slaves:

bin-log.000001  
bin-log.000002  
bin-log.index   
mysqld.log      
relay-log.000003
relay-log.000004
relay-log.000005
relay-log.000006
relay-log.000007
relay-log.000008
.....
cut for space
....
relay-log.000367
relay-log.000368
relay-log.index
slow.log

Here is the size/creation time of the relay-logs (this pattern repeats for all of them):

-rw-rw----  1 mysql mysql        191 Oct  8 10:55 relay-log.000003
-rw-rw----  1 mysql mysql 1073768722 Oct  8 11:07 relay-log.000004
-rw-rw----  1 mysql mysql        328 Oct  8 11:07 relay-log.000005
-rw-rw----  1 mysql mysql        191 Oct  8 11:07 relay-log.000006
-rw-rw----  1 mysql mysql 1073741881 Oct  8 11:20 relay-log.000007
-rw-rw----  1 mysql mysql        832 Oct  8 11:20 relay-log.000008
-rw-rw----  1 mysql mysql        191 Oct  8 11:20 relay-log.000009
-rw-rw----  1 mysql mysql 1073967803 Oct  8 11:33 relay-log.000010
-rw-rw----  1 mysql mysql        328 Oct  8 11:33 relay-log.000011
-rw-rw----  1 mysql mysql        191 Oct  8 11:33 relay-log.000012

Here are the contents of relay-log.000003, which has been there for a long time:

/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#121008 10:51:43 server id 202  end_log_pos 107         Start: binlog v 4, server v 5.5.27-log created 121008 10:51:43
BINLOG '
rxJzUA/KAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 107
#121003  3:19:20 server id 4  end_log_pos 1073741965    Rotate to bin-log.001392  pos: 4
# at 148
#121008 10:51:43 server id 202  end_log_pos 191         Rotate to relay-log.000004  pos: 4
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

Here is the output of show slave status\G:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: master.test.com
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: bin-log.001514
          Read_Master_Log_Pos: 834819956
               Relay_Log_File: relay-log.000004
                Relay_Log_Pos: 683731071
        Relay_Master_Log_File: bin-log.001392
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 683730927
              Relay_Log_Space: 131028696502
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 533889
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 4
1 row in set (0.00 sec)

Lastly, show processlist does not show anything running (even if I keep spamming it).

How to repeat:
Sometimes we can produce this on all of a master's slaves by running a very long (>3 hour) update or insert. Sometimes, as in the description above, this will just happen to one of the master's slaves apparently out of nowhere. Other times, we see this when we start replication on a slave that is maybe 20 or more bin-logs behind master when we start replication

When we run into the issue due to a long query, what we do is stop the slave, point it to the next bin-log position on master, and then manually run the query directly on each slave, which fixes the problem in that case. For the scenario above, though, doing this doesn't accomplish anything...the slave just starts writing a ton of binlogs again

Suggested fix:
Slave should be able to catch up to master. It seems strange that the slave writes a repeatable pattern of 4k relay-log, 4k relay-log, 1.1gb relay-log.

Thank you for the report.

> Slave should be able to catch up to master. It seems strange that the slave writes a repeatable pattern of 4k relay-log, 4k relay-log, 1.1gb relay-log.

Is it possible that you send large packet from master? Like a query of 1G size or multiple-queries transaction of same size.

Please also provide your slave configuration file.

> Is it possible that you send large packet from master? Like a query of 1G size or multiple-queries transaction of same size.

I don't think so because sometimes only 1 slave of many experiences this.

All slaves have this configuration (except server-id):

##  _______________________________________________________________________ 
## / Rackspace MySQL 5.5 Terse Configuration File                          \
## |                                                                       |
## | This is a base configuration file containing the most frequently used |
## | settings with reasonably defined default values for configuring and   |
## | tuning MySQL. Note that these settings can likely be further tuned in |
## | order to get optimum performance from MySQL based upon the database   |
## | configuration and hardware platform.                                  |
## |                                                                       |
## | While the settings provided are likely sufficient for most            |
## | situations, an exhaustive list of settings (with descriptions) can be |
## | found at:                                                             |
## | http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html   |
## |                                                                       |
## | Take care to only add/remove/change a setting if you are comfortable  |
## | doing so! For Rackspace customers, if you have any questions or       |
## | concerns, please contact the MySQL Database Services Team. Be aware   |
## | that some work performed by this team can involve additional billable |
## \ fees.                                                                 /
##  ----------------------------------------------------------------------- 
##         \   ^__^
##          \  (oo)\_______
##             (__)\       )\/\
##                 ||----w |
##                 ||     ||

[mysql]
no-auto-rehash
# MNT based
socket                          = /mnt/mysql/mysql.sock

# STANDARD based
#socket                          = /var/lib/mysql/mysql.sock

[mysqld]
## General
#datadir                         = /var/lib/mysql
#tmpdir                          = /var/lib/mysqltmp
#socket                          = /var/lib/mysql/mysql.sock # BE Override

skip-name-resolve
sql-mode                        = NO_ENGINE_SUBSTITUTION
#event-scheduler                = 1

########################################## START OF OVERRIDES ######################################################
# MNT based
datadir                          = /mnt/mysql
tmpdir                           = /mnt/mysql/tmp
socket                           = /mnt/mysql/mysql.sock
general-log                      = 0
log-error                        = /mnt/mysql/log/mysqld.log
slow-query-log-file              = /mnt/mysql/log/slow.log
log-bin                          = /mnt/mysql/log/bin-log
relay-log                        = /mnt/mysql/log/relay-log

# STANDARD based
#datadir                         = /var/lib/mysql
#tmpdir                          = /var/lib/mysqltmp
#socket                          = /var/lib/mysql/mysql.sock
#general-log                     = 0
#log-error                       = /var/log/mysqld.log
#slow-query-log-file             = /var/lib/mysqllogs/slow-log
#log-bin                         = /var/lib/mysqllogs/bin-log
#relay-log                       = /var/lib/mysqllogs/relay-log

##### REPLICATION
server-id                       = 7
# put the next line on the slave to allow it to be master to other slaves
log-slave-updates               = 1

## tables that should be ignore by slave
#### removed

# performance tweaks
max-connect-errors              = 10000
max_connections                 = 300
open-files-limit                = 131072
max-allowed-packet              = 200M
table_cache = 256
default-storage-engine          = MyISAM
#read_buffer_size = 1M
#read_rnd_buffer_size = 256K
#myisam_sort_buffer_size = 300M
#join_buffer_size=500M
# in memory operation
#max_heap_table_size = 1G
# tmp table size limit
#tmp_table_size = 5G
#thread_cache_size = 8
query-cache-type                = 1
query-cache-size                = 16M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 16

# innodb tweaks
innodb-open-files               = 16384
innodb-lock-wait-timeout        = 500

# innodb-buffer-pool-size increase number for locks
innodb-buffer-pool-size         = 256M
innodb-additional-mem-pool-size = 20M
innodb-log-buffer-size          = 4M
innodb-thread-concurrency       = 16

# put each innodb data in its own file per table
# not that transactions and dict are still in shared data file
# http://dev.mysql.com/doc/refman/5.0/en/multiple-tablespaces.html
innodb-file-per-table           = 1

slow-query-log = 1
long-query-time = 3

########################################## END OF OVERRIDES ######################################################

## Cache
thread-cache-size               = 64
table-open-cache                = 4096
table-definition-cache          = 4096
#query-cache-size                = 32M
#query-cache-limit               = 1M

## Per-thread Buffers
sort-buffer-size                = 1M
read-buffer-size                = 1M
read-rnd-buffer-size            = 1M
join-buffer-size                = 1M

## Temp Tables
tmp-table-size                  = 32M 
max-heap-table-size             = 64M 

## Networking
back-log                        = 100
#max-connections                = 200
#max-connect-errors              = 10000
#max-allowed-packet              = 16M
interactive-timeout             = 3600
#wait-timeout                    = 600

### Storage Engines
#default-storage-engine         = InnoDB
#innodb                          = FORCE
default-storage-engine          = MyISAM

## MyISAM
key-buffer-size                 = 64M 
myisam-sort-buffer-size         = 128M

## InnoDB
#innodb-buffer-pool-size        = 128M
#innodb-log-file-size           = 100M
#innodb-log-buffer-size         = 8M
#innodb-file-per-table          = 1
#innodb-open-files              = 300

## Replication
#server-id                       = 1
#log-bin                        = /var/lib/mysqllogs/bin-log
#relay-log                      = /var/lib/mysqllogs/relay-log
relay-log-space-limit           = 16G
expire-logs-days                = 7
#read-only                      = 1
#sync-binlog                    = 1
#log-slave-updates              = 1
#binlog-format                  = STATEMENT
#auto-increment-offset          = 1
#auto-increment-increment       = 2

## Logging
log-output                      = FILE
#slow-query-log                  = 1
#slow-query-log-file             = /var/lib/mysqllogs/slow-log # BE override
log-slow-slave-statements
#long-query-time                 = 2

[mysqld_safe]
#log-error                       = /var/log/mysqld.log # BE Overrides
#open-files-limit                = 65535

We're sorry, but the bug system is not the appropriate forum for asking help on using MySQL products. Your problem is not the clearly the result of a verifiable/repeatable bug.

The symptom you describe would be caused by the slave trying to repeatedly read a binary log event from the master and failing. Tracking this down could easily be done via the proper methods, but that does not include a bug report.

Support on using our products is available both free in our forums at http://forums.mysql.com/ and for a reasonable fee direct from our skilled support engineers at http://www.mysql.com/support/

Thank you for your interest in MySQL.