| Bug #67149 | Slave writing a ton of nearly-empty relay logs and never being able to catch up | ||
|---|---|---|---|
| Submitted: | 9 Oct 2012 17:01 | Modified: | 24 Jan 2013 17:39 | 
| Reporter: | Michael Finch | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S1 (Critical) | 
| Version: | mysql55-5.5.27-2 | OS: | Linux (centos6.3) | 
| Assigned to: | Matthew Lord | CPU Architecture: | Any | 
| Tags: | catch up, empty, relay-log, replication, slow | ||
   [11 Oct 2012 18:48]
   Sveta Smirnova        
  Thank you for the report. > Slave should be able to catch up to master. It seems strange that the slave writes a repeatable pattern of 4k relay-log, 4k relay-log, 1.1gb relay-log. Is it possible that you send large packet from master? Like a query of 1G size or multiple-queries transaction of same size. Please also provide your slave configuration file.
   [12 Oct 2012 18:58]
   Michael Finch        
  > Is it possible that you send large packet from master? Like a query of 1G size or multiple-queries transaction of same size. I don't think so because sometimes only 1 slave of many experiences this. All slaves have this configuration (except server-id): ## _______________________________________________________________________ ## / Rackspace MySQL 5.5 Terse Configuration File \ ## | | ## | This is a base configuration file containing the most frequently used | ## | settings with reasonably defined default values for configuring and | ## | tuning MySQL. Note that these settings can likely be further tuned in | ## | order to get optimum performance from MySQL based upon the database | ## | configuration and hardware platform. | ## | | ## | While the settings provided are likely sufficient for most | ## | situations, an exhaustive list of settings (with descriptions) can be | ## | found at: | ## | http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html | ## | | ## | Take care to only add/remove/change a setting if you are comfortable | ## | doing so! For Rackspace customers, if you have any questions or | ## | concerns, please contact the MySQL Database Services Team. Be aware | ## | that some work performed by this team can involve additional billable | ## \ fees. / ## ----------------------------------------------------------------------- ## \ ^__^ ## \ (oo)\_______ ## (__)\ )\/\ ## ||----w | ## || || [mysql] no-auto-rehash # MNT based socket = /mnt/mysql/mysql.sock # STANDARD based #socket = /var/lib/mysql/mysql.sock [mysqld] ## General #datadir = /var/lib/mysql #tmpdir = /var/lib/mysqltmp #socket = /var/lib/mysql/mysql.sock # BE Override skip-name-resolve sql-mode = NO_ENGINE_SUBSTITUTION #event-scheduler = 1 ########################################## START OF OVERRIDES ###################################################### # MNT based datadir = /mnt/mysql tmpdir = /mnt/mysql/tmp socket = /mnt/mysql/mysql.sock general-log = 0 log-error = /mnt/mysql/log/mysqld.log slow-query-log-file = /mnt/mysql/log/slow.log log-bin = /mnt/mysql/log/bin-log relay-log = /mnt/mysql/log/relay-log # STANDARD based #datadir = /var/lib/mysql #tmpdir = /var/lib/mysqltmp #socket = /var/lib/mysql/mysql.sock #general-log = 0 #log-error = /var/log/mysqld.log #slow-query-log-file = /var/lib/mysqllogs/slow-log #log-bin = /var/lib/mysqllogs/bin-log #relay-log = /var/lib/mysqllogs/relay-log ##### REPLICATION server-id = 7 # put the next line on the slave to allow it to be master to other slaves log-slave-updates = 1 ## tables that should be ignore by slave #### removed # performance tweaks max-connect-errors = 10000 max_connections = 300 open-files-limit = 131072 max-allowed-packet = 200M table_cache = 256 default-storage-engine = MyISAM #read_buffer_size = 1M #read_rnd_buffer_size = 256K #myisam_sort_buffer_size = 300M #join_buffer_size=500M # in memory operation #max_heap_table_size = 1G # tmp table size limit #tmp_table_size = 5G #thread_cache_size = 8 query-cache-type = 1 query-cache-size = 16M # Try number of CPU's*2 for thread_concurrency thread_concurrency = 16 # innodb tweaks innodb-open-files = 16384 innodb-lock-wait-timeout = 500 # innodb-buffer-pool-size increase number for locks innodb-buffer-pool-size = 256M innodb-additional-mem-pool-size = 20M innodb-log-buffer-size = 4M innodb-thread-concurrency = 16 # put each innodb data in its own file per table # not that transactions and dict are still in shared data file # http://dev.mysql.com/doc/refman/5.0/en/multiple-tablespaces.html innodb-file-per-table = 1 slow-query-log = 1 long-query-time = 3 ########################################## END OF OVERRIDES ###################################################### ## Cache thread-cache-size = 64 table-open-cache = 4096 table-definition-cache = 4096 #query-cache-size = 32M #query-cache-limit = 1M ## Per-thread Buffers sort-buffer-size = 1M read-buffer-size = 1M read-rnd-buffer-size = 1M join-buffer-size = 1M ## Temp Tables tmp-table-size = 32M max-heap-table-size = 64M ## Networking back-log = 100 #max-connections = 200 #max-connect-errors = 10000 #max-allowed-packet = 16M interactive-timeout = 3600 #wait-timeout = 600 ### Storage Engines #default-storage-engine = InnoDB #innodb = FORCE default-storage-engine = MyISAM ## MyISAM key-buffer-size = 64M myisam-sort-buffer-size = 128M ## InnoDB #innodb-buffer-pool-size = 128M #innodb-log-file-size = 100M #innodb-log-buffer-size = 8M #innodb-file-per-table = 1 #innodb-open-files = 300 ## Replication #server-id = 1 #log-bin = /var/lib/mysqllogs/bin-log #relay-log = /var/lib/mysqllogs/relay-log relay-log-space-limit = 16G expire-logs-days = 7 #read-only = 1 #sync-binlog = 1 #log-slave-updates = 1 #binlog-format = STATEMENT #auto-increment-offset = 1 #auto-increment-increment = 2 ## Logging log-output = FILE #slow-query-log = 1 #slow-query-log-file = /var/lib/mysqllogs/slow-log # BE override log-slow-slave-statements #long-query-time = 2 [mysqld_safe] #log-error = /var/log/mysqld.log # BE Overrides #open-files-limit = 65535
   [24 Jan 2013 17:39]
   Matthew Lord        
  We're sorry, but the bug system is not the appropriate forum for asking help on using MySQL products. Your problem is not the clearly the result of a verifiable/repeatable bug. The symptom you describe would be caused by the slave trying to repeatedly read a binary log event from the master and failing. Tracking this down could easily be done via the proper methods, but that does not include a bug report. Support on using our products is available both free in our forums at http://forums.mysql.com/ and for a reasonable fee direct from our skilled support engineers at http://www.mysql.com/support/ Thank you for your interest in MySQL.


Description: Hi, We have been running into issues where slaves (which normally replicate just fine) start writing >100 relay-logs - most of which are 4.0k in size - and are never able to get through all of them. Here's an example of the log directory of one of these slaves: bin-log.000001 bin-log.000002 bin-log.index mysqld.log relay-log.000003 relay-log.000004 relay-log.000005 relay-log.000006 relay-log.000007 relay-log.000008 ..... cut for space .... relay-log.000367 relay-log.000368 relay-log.index slow.log Here is the size/creation time of the relay-logs (this pattern repeats for all of them): -rw-rw---- 1 mysql mysql 191 Oct 8 10:55 relay-log.000003 -rw-rw---- 1 mysql mysql 1073768722 Oct 8 11:07 relay-log.000004 -rw-rw---- 1 mysql mysql 328 Oct 8 11:07 relay-log.000005 -rw-rw---- 1 mysql mysql 191 Oct 8 11:07 relay-log.000006 -rw-rw---- 1 mysql mysql 1073741881 Oct 8 11:20 relay-log.000007 -rw-rw---- 1 mysql mysql 832 Oct 8 11:20 relay-log.000008 -rw-rw---- 1 mysql mysql 191 Oct 8 11:20 relay-log.000009 -rw-rw---- 1 mysql mysql 1073967803 Oct 8 11:33 relay-log.000010 -rw-rw---- 1 mysql mysql 328 Oct 8 11:33 relay-log.000011 -rw-rw---- 1 mysql mysql 191 Oct 8 11:33 relay-log.000012 Here are the contents of relay-log.000003, which has been there for a long time: /*!40019 SET @@session.max_insert_delayed_threads=0*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4 #121008 10:51:43 server id 202 end_log_pos 107 Start: binlog v 4, server v 5.5.27-log created 121008 10:51:43 BINLOG ' rxJzUA/KAAAAZwAAAGsAAAAAAAQANS41LjI3LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA== '/*!*/; # at 107 #121003 3:19:20 server id 4 end_log_pos 1073741965 Rotate to bin-log.001392 pos: 4 # at 148 #121008 10:51:43 server id 202 end_log_pos 191 Rotate to relay-log.000004 pos: 4 DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; Here is the output of show slave status\G: mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: master.test.com Master_User: replication Master_Port: 3306 Connect_Retry: 60 Master_Log_File: bin-log.001514 Read_Master_Log_Pos: 834819956 Relay_Log_File: relay-log.000004 Relay_Log_Pos: 683731071 Relay_Master_Log_File: bin-log.001392 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 683730927 Relay_Log_Space: 131028696502 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 533889 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 4 1 row in set (0.00 sec) Lastly, show processlist does not show anything running (even if I keep spamming it). How to repeat: Sometimes we can produce this on all of a master's slaves by running a very long (>3 hour) update or insert. Sometimes, as in the description above, this will just happen to one of the master's slaves apparently out of nowhere. Other times, we see this when we start replication on a slave that is maybe 20 or more bin-logs behind master when we start replication When we run into the issue due to a long query, what we do is stop the slave, point it to the next bin-log position on master, and then manually run the query directly on each slave, which fixes the problem in that case. For the scenario above, though, doing this doesn't accomplish anything...the slave just starts writing a ton of binlogs again Suggested fix: Slave should be able to catch up to master. It seems strange that the slave writes a repeatable pattern of 4k relay-log, 4k relay-log, 1.1gb relay-log.