Bug #28421 | Infinite loop on slave relay logs | ||
---|---|---|---|
Submitted: | 14 May 2007 16:56 | Modified: | 12 Mar 2010 17:55 |
Reporter: | Baron Schwartz (Basic Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.0.38, 5.1.40 | OS: | Linux (Gentoo) |
Assigned to: | Libing Song | CPU Architecture: | Any |
Tags: | binlog, qc, relay log, replication, rotate |
[14 May 2007 16:56]
Baron Schwartz
[14 May 2007 16:59]
Baron Schwartz
Another possibly related detail: since all three of this master's slaves behaved the same way, it suggests it's a problem coming from the master. The only unusual thing that happened today on the master is I edited usa-bin.index and removed files that had been rm-ed from the disk, then ran PURGE MASTER LOGS on the master (see bug #28238). I can give more details on this if it seems relevant.
[17 Jun 2007 17:43]
Baron Schwartz
This has happened again, this time to fresno's slave (nepal used to be a slave of usa, but I chained it off fresno for various reasons): -rw-rw---- 1 mysql mysql 1073741938 Jun 16 04:10 nepal-relay-bin.000319 -rw-rw---- 1 mysql mysql 475958 Jun 16 04:10 nepal-relay-bin.000320 -rw-rw---- 1 mysql mysql 191 Jun 16 04:10 nepal-relay-bin.000321 -rw-rw---- 1 mysql mysql 1073741938 Jun 16 04:12 nepal-relay-bin.000322 -rw-rw---- 1 mysql mysql 475958 Jun 16 04:12 nepal-relay-bin.000323 -rw-rw---- 1 mysql mysql 191 Jun 16 04:12 nepal-relay-bin.000324 ... snip many lines, each about 2-3 minutes apart... -rw-rw---- 1 mysql mysql 191 Jun 16 06:07 nepal-relay-bin.000543 -rw-rw---- 1 mysql mysql 1073741938 Jun 16 06:08 nepal-relay-bin.000544 -rw-rw---- 1 mysql mysql 475958 Jun 16 06:08 nepal-relay-bin.000545 -rw-rw---- 1 mysql mysql 191 Jun 16 06:08 nepal-relay-bin.000546 The output of SHOW SLAVE STATUS reports the SQL thread is stopped with a duplicate key error, because the relay log has already executed events. The events are duplicated in the relay logs. The relay logs just kept getting fetched and written, as you can see above, until the disk filled up. I consider this a very serious bug. Has anyone heard of this happening to anyone else? Surely it's not just me.
[18 Jun 2007 21:41]
Baron Schwartz
More information on the most recent occurrence: on the master, SHOW MASTER STATUS returns nothing, and SHOW MASTER LOGS says binary logging is not enabled (though it clearly is). In the server's error log, 070616 4:09:00 [ERROR] Could not use fresno-bin for logging (error 17). Turning logging off for the whole duration of the MySQL server process. To turn it on again: fix the cause, shutdown the MySQL server and restart it. Error 17 is "file exists" according to perror. It almost looks like the master filled log X, got done and wrote a final "next, go to log X" into the binlog, then tried to go open log X itself and failed because it's the one it was just using. Then the slave fetched log X, hit the end, fetched log X again... In this case, "log X" is /var/lib/mysql/fresno-bin.000106. The end of the log looks okay to me, though: # at 1074217468 #070616 4:09:00 server id 40 end_log_pos 1074217512 Rotate to fresno-bin.000107 pos: 4 DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
[22 Jun 2007 9:54]
Sveta Smirnova
Thank you for the report. Please provide configuration files for master and slave.
[22 Jun 2007 11:51]
Baron Schwartz
Slave (nepal.cnf) and master (fresno.cnf) are attached. If you need to see the binlog mentioned (.....0106) I have saved it too.
[22 Jun 2007 12:08]
Sveta Smirnova
Thank you for the feedback. Please attach the binary log too.
[22 Jun 2007 13:47]
Baron Schwartz
Binlog uploaded via FTP as bug-data-28421.tar.gz.
[2 Aug 2008 22:34]
Dave Rose
I'd just like to add that I have seen this 2 times myself. We are running 5.0.41 Interestingly enough we were able to restore one of our slaves to a database snapshot that was minutes before it happened and restart the slave, but it did not happen the second time. To me it implies that this happens when the master rotates the binlogs and the slave is gathering statements for its relay-bin.log Unfortunately it was not reproducible on the same machine with the same dataset :(
[7 Oct 2008 9:14]
Susanne Ebrecht
Many thanks for writing a bug report. Unfortunately we can't repeat this. Please try newest version of MySQL (5.0.67).
[9 Nov 2009 21:51]
Adam Dixon
I have repeated this somewhat (slave continuously creating relay log files). This test shows it with very small relay logs only but likely the same root cause. Starting with master -> slave hosts replicating. # get some binlogs in the .index file master> flush logs; flush logs; rm master-bin.000002 master> flush logs; # edit master-bin.index file, remove 000002, and flush logs again, and slave relaylogs repeat over and over; (rapidly to fill disk or until stop slave etc) vi master-bin.index (delete the line with 000002 and save) master> flush logs; 3-500 per second created; master-bin.000004 contains only; [root@dixon var]# mysqlbinlog master-bin.000004 /*!40019 SET @@session.max_insert_delayed_threads=0*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4 #091110 8:10:33 server id 1 end_log_pos 106 Start: binlog v 4, server v 5.1.40-enterprise-gpl-advanced-log created 091110 8:10:33 # at 106 #091110 8:11:16 server id 1 end_log_pos 150 Rotate to master-bin.000005 pos: 4 DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; The last two relay logs contain this; [root@dixon var1]# mysqlbinlog dixon-relay-bin.001775 /*!40019 SET @@session.max_insert_delayed_threads=0*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4 #091110 8:12:28 server id 2 end_log_pos 106 Start: binlog v 4, server v 5.1.40-enterprise-gpl-advanced-log created 091110 8:12:28 # at 106 #700101 9:30:00 server id 1 end_log_pos 0 Rotate to master-bin.000005 pos: 106 # at 150 #091110 8:11:16 server id 1 end_log_pos 0 Start: binlog v 4, server v 5.1.40-enterprise-gpl-advanced-log created 091110 8:11:16 # at 252 #091110 8:12:38 server id 2 end_log_pos 301 Rotate to dixon-relay-bin.001776 pos: 4 DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; [root@dixon var1]# mysqlbinlog dixon-relay-bin.001776 /*!40019 SET @@session.max_insert_delayed_threads=0*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4 #091110 8:12:38 server id 2 end_log_pos 106 Start: binlog v 4, server v 5.1.40-enterprise-gpl-advanced-log created 091110 8:12:38 # at 106 #700101 9:30:00 server id 1 end_log_pos 0 Rotate to master-bin.000005 pos: 106 # at 150 #091110 8:11:16 server id 1 end_log_pos 0 Start: binlog v 4, server v 5.1.40-enterprise-gpl-advanced-log created 091110 8:11:16 DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
[9 Nov 2009 21:56]
Adam Dixon
# cat master-bin.index ./master-bin.000001 ./master-bin.000003 ./master-bin.000004 ./master-bin.000005 mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Master_Host: localhost Master_User: slave_user Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000005 Read_Master_Log_Pos: 106 Relay_Log_File: dixon-relay-bin.001776 Relay_Log_Pos: 252 Relay_Master_Log_File: master-bin.000005 Slave_IO_Running: No Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 106 Relay_Log_Space: 553 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: 1 row in set (0.00 sec)
[9 Nov 2009 22:01]
Adam Dixon
When the situation is reproduced, if stop slave; is issued, simply running start slave; again will continue the looping of relay logs. Reset Master is one way to 'clear' the situation...
[10 Dec 2009 12:10]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/93489 3215 Li-Bing.Song@sun.com 2009-12-10 BUG #28421 Infinite loop on slave relay logs That, manually deleteing one or more entries from 'master-bin.index', will cause master infinitely loop to send one binlog file. When starting a dump session, master opens index file and search the binlog file which is being requested by the slave. The position of the binlog file in the index file is recorded. it will be used to find the next binlog file when current binlog file has been dumped completely. As only the position is used, it will can not get the correct file if some entries has been removed manually from the index file. the master will reopen the current binlog file which has been dump completely and redump it if it can not get the next binlog file's name from index file. It obviously is a logical error. Manually changing index file is a illegal action, so after this patch, master sends a fatal error to slave and close the dump session if a new binlog file has been generated and master can not get it from the index file.
[5 Jan 2010 10:25]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/95941 3303 Li-Bing.Song@sun.com 2010-01-05 BUG #28421 Infinite loop on slave relay logs Manually deleteing one or more entries from 'master-bin.index', will cause master infinitely loop to send one binlog file. When starting a dump session, master opens index file and search the binlog file which is being requested by the slave. The position of the binlog file in the index file is recorded. it will be used to find the next binlog file when current binlog file has dumped completely. As only the position is used, it may not get the correct file if some entries has been removed manually from the index file. the master will reopen the current binlog file which has been dump completely and redump it if it can not get the next binlog file's name from index file. It obviously is a logical error. Even though it is allowed to manually change index file, but it is not recommended. so after this patch, master sends a fatal error to slave and close the dump session if a new binlog file has been generated and master can not get it from the index file.
[8 Jan 2010 5:45]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/96344 3308 Li-Bing.Song@sun.com 2010-01-08 BUG #28421 Infinite loop on slave relay logs Manually deleteing one or more entries from 'master-bin.index', will cause master infinitely loop to send one binlog file. When starting a dump session, master opens index file and search the binlog file which is being requested by the slave. The position of the binlog file in the index file is recorded. it will be used to find the next binlog file when current binlog file has dumped completely. As only the position is used, it may not get the correct file if some entries has been removed manually from the index file. the master will reopen the current binlog file which has been dump completely and redump it if it can not get the next binlog file's name from index file. It obviously is a logical error. Even though it is allowed to manually change index file, but it is not recommended. so after this patch, master sends a fatal error to slave and close the dump session if a new binlog file has been generated and master can not get it from the index file.
[8 Jan 2010 5:58]
Libing Song
Pushed to mysql-5.1-bugteam and mysql-pe
[15 Jan 2010 8:59]
Bugs System
Pushed into 5.1.43 (revid:joro@sun.com-20100115085139-qkh0i0fpohd9u9p5) (version source revid:li-bing.song@sun.com-20100108054223-p0cnli6sp1npui3k) (merge vers: 5.1.42) (pib:16)
[16 Jan 2010 1:44]
Paul DuBois
Noted in 5.1.43 changelog. Manually removing entries from the binary log index file on a replication master could cause the server to repeatedly send the same binary log file to slaves. Setting report to NDI pending push to 5.5.x+.
[5 Feb 2010 11:47]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100204063540-9czpdmpixi3iw2yb) (version source revid:alik@sun.com-20100119163614-172adculixyu26j5) (pib:16)
[5 Feb 2010 11:53]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100205113942-oqovjy0eoqbarn7i) (version source revid:alik@sun.com-20100204064210-ljwanqvrjs83s1gq) (merge vers: 6.0.14-alpha) (pib:16)
[5 Feb 2010 11:59]
Bugs System
Pushed into 5.5.2-m2 (revid:alik@sun.com-20100203172258-1n5dsotny40yufxw) (version source revid:alexey.kopytov@sun.com-20100115112653-e3a24041ag1cv6v3) (merge vers: 5.5.1-m2) (pib:16)
[5 Feb 2010 16:43]
Paul DuBois
Noted in 5.5.2, 6.0.14 changelogs. Setting report to Need Merge pending push into Celosia.
[12 Mar 2010 14:06]
Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100312135944-t0z8s1da2orvl66x) (version source revid:jonas@mysql.com-20100312115609-woou0te4a6s4ae9y) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[12 Mar 2010 14:23]
Bugs System
Pushed into 5.1.44-ndb-6.2.19 (revid:jonas@mysql.com-20100312134846-tuqhd9w3tv4xgl3d) (version source revid:jonas@mysql.com-20100312060623-mx6407w2vx76h3by) (merge vers: 5.1.44-ndb-6.2.19) (pib:16)
[12 Mar 2010 14:36]
Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100312135724-xcw8vw2lu3mijrhn) (version source revid:jonas@mysql.com-20100312103652-snkltsd197l7q2yg) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)
[12 Mar 2010 17:55]
Paul DuBois
Fixed in earlier 5.1.x, 5.5.x.