Bug #19175 Slave can't reconnect to master after network issues.
Submitted: 18 Apr 2006 19:06 Modified: 13 Feb 2008 13:24
Reporter: Gilberto Müller
Status: No Feedback
Category: MySQL Server: Replication
Severity: S3 (Non-critical)
Version: mysql-pro-5.0.20-linux-i686-glibc23
OS: Linux (Debian)
Assigned to: Assigned Account
CPU Architecture: Any

[18 Apr 2006 19:06] Gilberto Müller
Description:
After upgrading my MASTER SERVER from mysql-pro-4.1.13-pc-linux-gnu-i686 to the current release, the slave server can no longer reconnect to the master after network issues. It worked fine with 4.1.13, but with this new version it doesn't.

The SHOW SLAVE STATUS command shows: 
Slave_IO_State: Reconnecting after a failed master event read
Slave_IO_Running: No
Slave_SQL_Running: Yes

I don't know why this happens, but a simple workaround helped me:
SLAVE STOP;
SLAVE START;

:)
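The stuck state shown above (Slave_IO_Running: No while Slave_SQL_Running: Yes) can be recognized mechanically before applying the workaround. A minimal Python sketch, assuming the vertical output of SHOW SLAVE STATUS\G has already been captured as text; parse_slave_status and io_thread_stuck are hypothetical helper names, not part of any MySQL API:

```python
from typing import Dict


def parse_slave_status(text: str) -> Dict[str, str]:
    """Hypothetical helper: turn 'Field: value' lines into a dict.

    Lines without a colon (such as the '*** 1. row ***' banner) are skipped.
    Only the first colon splits, so values containing colons stay intact.
    """
    status: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def io_thread_stuck(status: Dict[str, str]) -> bool:
    """True for the state in this report: I/O thread down, SQL thread still up."""
    return (
        status.get("Slave_IO_Running") == "No"
        and status.get("Slave_SQL_Running") == "Yes"
    )
```

Fed the three fields quoted above, io_thread_stuck returns True, which is the point at which SLAVE STOP / SLAVE START was needed.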

Please let me know if more information is needed.

Best regards,

Gilberto Müller

How to repeat:
Interrupt the connection to the MASTER SERVER and then reconnect it.
[18 Apr 2006 19:25] Gilberto Müller
Specified server: replication category
[19 Apr 2006 8:45] Valeriy Kravchuk
Thank you for the problem report. What version is running on your slaves? 5.0.20? Please also send the relevant part of the slave's error log.
[19 Apr 2006 12:40] Gilberto Müller
Thank you for responding so fast!
There are many clients running as slaves, but I can confirm versions 4.0.24 and 4.0.25.
I'll check whether it happens again with other slave versions and let you know.

#####################################################################################################################################
4.0.24_Debian-10sarge1

060419  2:24:02 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  2:27:06 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  2:27:07 [Note] Slave I/O thread exiting, read up to log 'logreplication.351', position 11045634
060419  2:27:07 [Note] Error reading relay log event: slave SQL thread was killed
060419  2:27:07 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logreplication.351' at position 11045634
060419  2:27:07 [Note] Slave SQL thread initialized, starting replication in log 'logreplication.351' at position 11045634, relay log '/PATH/TO/RELAYLOG/relaylog.000236' position: 143
060419  3:27:10 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  3:30:21 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  3:30:21 [Note] Slave I/O thread exiting, read up to log 'logreplication.351', position 11045634
060419  3:30:21 [Note] Error reading relay log event: slave SQL thread was killed
060419  3:30:21 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logreplication.351' at position 11045634
060419  3:30:21 [Note] Slave SQL thread initialized, starting replication in log 'logreplication.351' at position 11045634, relay log '/PATH/TO/RELAYLOG/relaylog.000237' position: 143
060419  4:30:24 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  4:33:25 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  4:33:25 [Note] Slave I/O thread exiting, read up to log 'logreplication.351', position 11045634
060419  4:33:25 [Note] Error reading relay log event: slave SQL thread was killed
060419  4:33:25 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logreplication.351' at position 11045634
060419  4:33:25 [Note] Slave SQL thread initialized, starting replication in log 'logreplication.351' at position 11045634, relay log '/PATH/TO/RELAYLOG/relaylog.000238' position: 143
060419  5:33:25 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  5:35:48 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  5:35:48 [Note] Slave I/O thread exiting, read up to log 'logreplication.351', position 11045634
060419  5:35:48 [Note] Error reading relay log event: slave SQL thread was killed
060419  5:35:48 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logreplication.351' at position 11045634
060419  5:35:48 [Note] Slave SQL thread initialized, starting replication in log 'logreplication.351' at position 11045634, relay log '/PATH/TO/RELAYLOG/relaylog.000239' position: 143

#####################################################################################################################################
4.0.25-pro

060419  6:09:07 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  6:11:32 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  6:11:33 [Note] Slave I/O thread exiting, read up to log 'logbin.013', position 551132915
060419  6:11:33 [Note] Error reading relay log event: slave SQL thread was killed
060419  6:11:33 [Note] Slave SQL thread initialized, starting replication in log 'logbin.013' at position 551132915, relay log '/e-backup/customers/mcarthur/database/logs/relaylog.000304' position: 135
060419  6:11:33 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logbin.013' at position 551132915
060419  7:11:35 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  7:14:23 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  7:14:23 [Note] Slave I/O thread exiting, read up to log 'logbin.013', position 551132915
060419  7:14:23 [Note] Error reading relay log event: slave SQL thread was killed
060419  7:14:23 [Note] Slave SQL thread initialized, starting replication in log 'logbin.013' at position 551132915, relay log '/e-backup/customers/mcarthur/database/logs/relaylog.000305' position: 135
060419  7:14:23 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logbin.013' at position 551132915
060419  8:14:26 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  8:16:53 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  8:16:54 [Note] Slave I/O thread exiting, read up to log 'logbin.013', position 551132915
060419  8:16:54 [Note] Error reading relay log event: slave SQL thread was killed
060419  8:16:54 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logbin.013' at position 551132915
060419  8:16:54 [Note] Slave SQL thread initialized, starting replication in log 'logbin.013' at position 551132915, relay log '/e-backup/customers/mcarthur/database/logs/relaylog.000306' position: 135
060419  9:16:57 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  9:21:01 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  9:21:01 [Note] Slave I/O thread exiting, read up to log 'logbin.013', position 551132915
060419  9:21:01 [Note] Error reading relay log event: slave SQL thread was killed
060419  9:21:01 [Note] Slave SQL thread initialized, starting replication in log 'logbin.013' at position 551132915, relay log '/e-backup/customers/mcarthur/database/logs/relaylog.000307' position: 135
060419  9:21:02 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logbin.013' at position 551132915
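In both log sections above, each [ERROR] line follows the previous successful "connected to master" note by almost exactly one hour. A minimal Python sketch for measuring that gap, assuming the YYMMDD H:MM:SS timestamp prefix shown in these logs; reconnect_to_error_seconds is a hypothetical helper name:

```python
from datetime import datetime
from typing import Iterable, List, Optional


def reconnect_to_error_seconds(log_lines: Iterable[str]) -> List[float]:
    """Hypothetical helper: seconds from each 'connected to master' note
    to the next [ERROR] line.

    Assumes each log line starts with a 'YYMMDD H:MM:SS' timestamp.
    """
    gaps: List[float] = []
    connected_at: Optional[datetime] = None
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # blank, separator or heading line
        try:
            ts = datetime.strptime(parts[0] + " " + parts[1], "%y%m%d %H:%M:%S")
        except ValueError:
            continue  # line without a timestamp prefix
        if "connected to master" in line:
            connected_at = ts
        elif "[ERROR]" in line and connected_at is not None:
            gaps.append((ts - connected_at).total_seconds())
            connected_at = None
    return gaps
```

Run on each section separately, this gives 3603, 3603 and 3600 seconds for the 4.0.24 master and 3602, 3603 and 3603 seconds for the 4.0.25 master.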
[19 Apr 2006 13:29] Gilberto Müller
Correction to the information above: the 5.0.20 server is the SLAVE, and the 4.0.24 and 4.0.25 servers are the MASTERS.
The log output shown above comes from the SLAVE.
[28 Apr 2006 18:24] Valeriy Kravchuk
Please send the my.cnf contents from your "normal" 4.0.x slaves and from the 5.0.20 slave.

Can you check the master binlogs for the SQL statements executed at the time of the reconnection errors?
[19 May 2006 12:50] Gilberto Müller
Hi there, sorry for my late reply.
Do you want me to post the files here or send them to you some other way?
[19 May 2006 18:31] Valeriy Kravchuk
You can upload your my.cnf files here, marked as private if you want. If you want to send binlogs and they are larger than 200K, you can upload them to ftp://ftp.mysql.com/pub/mysql/upload (with the bug number in the file name). Please add a comment here about any uploads.
[19 Jun 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[19 Mar 2007 7:24] NOT_FOUND NOT_FOUND
Hi
I'm having a very similar problem, using a 3.23 master and a 5.0.18 slave.
After SLAVE START, the slave runs fine for about 60 minutes. Then it crashes and logs the following:
======
070318 22:40:01 [ERROR] Slave I/O thread: error reconnecting to master 'replicator@arrakis.ideeel.nl:3306': Error: ''  errno: 0  retry-time: 5  retries: 86400
070318 22:40:01 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
070318 22:40:01 [Note] Slave I/O thread exiting, read up to log 'arrakis-bin.381', position 203
070318 22:40:01 [Note] Error reading relay log event: slave SQL thread was killed
====
Restarting the slave solves the problem. This is not related to a cron job. It does not happen exactly every 60 minutes, but it usually does. I ran a script that checks for this crash, restarts the slave and logs the time:
======
Sun Mar 18 22:40:01 CET 2007
Sun Mar 18 23:40:01 CET 2007
Mon Mar 19 00:45:01 CET 2007
Mon Mar 19 01:50:01 CET 2007
Mon Mar 19 02:55:01 CET 2007
Mon Mar 19 03:55:01 CET 2007
Mon Mar 19 04:55:01 CET 2007
Mon Mar 19 06:00:01 CET 2007
Mon Mar 19 07:00:02 CET 2007
Mon Mar 19 08:05:01 CET 2007
======
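The restart times above fall either 60 or 65 minutes apart. A minimal Python sketch for computing those gaps from lines in the format printed by the date command, assuming every line carries the literal CET zone and English day and month names; gaps_in_minutes is a hypothetical helper name:

```python
from datetime import datetime
from typing import List


def gaps_in_minutes(stamps: List[str]) -> List[int]:
    """Hypothetical helper: whole minutes between consecutive timestamps
    such as 'Sun Mar 18 22:40:01 CET 2007'.

    The zone is matched as the literal text 'CET', so all stamps are
    assumed to share it; the resulting datetimes are naive.
    """
    times = [datetime.strptime(s, "%a %b %d %H:%M:%S CET %Y") for s in stamps]
    return [round((b - a).total_seconds() / 60) for a, b in zip(times, times[1:])]
```

For the ten timestamps listed, it returns [60, 65, 65, 65, 60, 60, 65, 60, 65]. The 5-minute step between 60 and 65 may only reflect how often the checking script ran, which the report does not state.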

Does anyone have an idea? Continuously restarting the slave is a bit clunky.

thanks
Joris
[19 Mar 2007 11:28] Valeriy Kravchuk
All reporters:

Please try to repeat this with a newer version of MySQL server (5.0.36/5.0.37), at least on the slave, and let us know the results.
[19 Apr 2007 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[20 Dec 2007 7:00] William Shubert
I am having the same problem. After a network interruption, the slave never reconnects; it keeps reporting "Reconnecting after a failed master event read" until I manually enter "stop slave;" and then "start slave;". After that, everything runs fine.

I'm on Ubuntu/Debian (ia64) for the slave and RHEL (ia32) for the master, with 5.0.45 on both sides (the RHEL build is downloaded from MySQL, the Debian one is from the Ubuntu apt repositories).

Please tell me exactly which files I can upload to help figure this out. I have a cron job running that sends me mail whenever the replication stops, but it's a pain in the neck to keep doing this.
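A cron job like the one described above boils down to a single check-and-restart step. A minimal Python sketch; watchdog_step and its three callbacks are hypothetical names, and connecting to the server and sending mail are left to the caller so that no particular MySQL driver or mail library is assumed. It only automates the STOP SLAVE / START SLAVE workaround reported in this thread and does not address the underlying reconnect failure:

```python
from datetime import datetime
from typing import Callable, Dict


def watchdog_step(
    fetch_status: Callable[[], Dict[str, str]],
    run_sql: Callable[[str], None],
    notify: Callable[[str], None],
) -> bool:
    """Run one check of a hypothetical replication watchdog.

    fetch_status returns SHOW SLAVE STATUS as a field -> value dict,
    run_sql executes one statement and notify delivers a message (for
    example by mail). Returns True if the slave threads were restarted.
    """
    status = fetch_status()
    stuck = (
        status.get("Slave_IO_Running") == "No"
        and status.get("Slave_SQL_Running") == "Yes"
    )
    if not stuck:
        # Healthy, or a different failure (e.g. an SQL thread error)
        # that a person should look at rather than restart blindly.
        return False
    # The manual workaround from this thread, applied automatically.
    run_sql("STOP SLAVE")
    run_sql("START SLAVE")
    notify(datetime.now().strftime("%a %b %d %H:%M:%S %Y") + ": restarted slave")
    return True
```

Restricting the restart to the "I/O thread down, SQL thread up" state keeps the watchdog from hiding unrelated replication errors.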
[6 Jan 2008 23:05] Mark Callaghan
I think this is related: http://bugs.mysql.com/bug.php?id=21132

And I think bug #30814 (http://bugs.mysql.com/?id=30814) is a duplicate of #19175.
[7 Jan 2008 5:24] Rayed Alrashed
I am having the same problem: a bad connection caused the slave thread to stop, and it needs a manual STOP SLAVE/START SLAVE.
Both master and slave run MySQL 5.0.45 on FreeBSD 6.x.
[13 Jan 2008 13:24] Valeriy Kravchuk
Bug #30814 was marked as a duplicate of this one.

As the related bug #21132 is already fixed in 5.0.54, please try to repeat with 5.0.54 or a later version and let us know the results.
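To check whether a given server is at 5.0.54 or later, its version string (for example the result of SELECT VERSION()) can be compared numerically rather than as text. A minimal Python sketch, assuming the string contains a plain X.Y.Z number like the versions quoted in this report; version_at_least is a hypothetical helper name:

```python
import re
from typing import Tuple


def version_at_least(version: str, minimum: Tuple[int, int, int] = (5, 0, 54)) -> bool:
    """Hypothetical helper: compare the first 'X.Y.Z' in a version string.

    Prefixes and suffixes such as 'mysql-pro-', '-log' or
    '_Debian-10sarge1' are ignored.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {version!r}")
    # Tuple comparison orders (5, 0, 45) before (5, 0, 54), unlike string comparison.
    return tuple(int(part) for part in match.groups()) >= minimum
```

With these rules, "5.0.45" and "mysql-pro-5.0.20-linux-i686-glibc23" fall below the threshold, while "5.0.54" meets it.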
[17 Jan 2008 7:37] William Shubert
Will try with 5.0.54 as soon as it is released - the newest I see on download pages is still 5.0.45, which is what I am running, and where I see this bug.
[14 Feb 2008 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".