MySQL Bugs: #19175: Slave can't reconnect to master after network issues.

Bug #19175	Slave can't reconnect to master after network issues.
Submitted:	18 Apr 2006 19:06	Modified:	13 Feb 2008 13:24
Reporter:	Gilberto Müller
Status:	No Feedback
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	mysql-pro-5.0.20-linux-i686-glibc23	OS:	Linux (linux debian)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
After upgrading from mysql-pro-4.1.13-pc-linux-gnu-i686 to this current release on my MASTER SERVER, the slave server can't reconnect to the master after network issues. With 4.1.13 it worked fine, but with this new version it doesn't.

The SHOW SLAVE STATUS command shows:
Slave_IO_State: Reconnecting after a failed master event read
Slave_IO_Running: No
Slave_SQL_Running: Yes

I don't know why this happens, but a simple workaround helped me:
SLAVE STOP;
SLAVE START;

:)

Please let me know if more information is needed.

Best regards,

Gilberto Müller

How to repeat:
Interrupt the connection to the MASTER SERVER and then reconnect it.

Specified server: replication category

Thank you for the problem report. What version is running on your slaves? 5.0.20? Please also send the relevant part of the slave's error log.

Thank you for responding so fast!
There are many clients running as slaves, but I can tell you for sure about versions 4.0.24 and 4.0.25.
I'll check whether it also happens with other slave versions and let you know.

#####################################################################################################################################
4.0.24_Debian-10sarge1

060419  2:24:02 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  2:27:06 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  2:27:07 [Note] Slave I/O thread exiting, read up to log 'logreplication.351', position 11045634
060419  2:27:07 [Note] Error reading relay log event: slave SQL thread was killed
060419  2:27:07 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logreplication.351' at position 11045634
060419  2:27:07 [Note] Slave SQL thread initialized, starting replication in log 'logreplication.351' at position 11045634, relay log '/PATH/TO/RELAYLOG/relaylog.000236' position: 143
060419  3:27:10 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  3:30:21 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  3:30:21 [Note] Slave I/O thread exiting, read up to log 'logreplication.351', position 11045634
060419  3:30:21 [Note] Error reading relay log event: slave SQL thread was killed
060419  3:30:21 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logreplication.351' at position 11045634
060419  3:30:21 [Note] Slave SQL thread initialized, starting replication in log 'logreplication.351' at position 11045634, relay log '/PATH/TO/RELAYLOG/relaylog.000237' position: 143
060419  4:30:24 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  4:33:25 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  4:33:25 [Note] Slave I/O thread exiting, read up to log 'logreplication.351', position 11045634
060419  4:33:25 [Note] Error reading relay log event: slave SQL thread was killed
060419  4:33:25 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logreplication.351' at position 11045634
060419  4:33:25 [Note] Slave SQL thread initialized, starting replication in log 'logreplication.351' at position 11045634, relay log '/PATH/TO/RELAYLOG/relaylog.000238' position: 143
060419  5:33:25 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  5:35:48 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  5:35:48 [Note] Slave I/O thread exiting, read up to log 'logreplication.351', position 11045634
060419  5:35:48 [Note] Error reading relay log event: slave SQL thread was killed
060419  5:35:48 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logreplication.351' at position 11045634
060419  5:35:48 [Note] Slave SQL thread initialized, starting replication in log 'logreplication.351' at position 11045634, relay log '/PATH/TO/RELAYLOG/relaylog.000239' position: 143

#####################################################################################################################################
4.0.25-pro

060419  6:09:07 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  6:11:32 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  6:11:33 [Note] Slave I/O thread exiting, read up to log 'logbin.013', position 551132915
060419  6:11:33 [Note] Error reading relay log event: slave SQL thread was killed
060419  6:11:33 [Note] Slave SQL thread initialized, starting replication in log 'logbin.013' at position 551132915, relay log '/e-backup/customers/mcarthur/database/logs/relaylog.000304' position: 135
060419  6:11:33 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logbin.013' at position 551132915
060419  7:11:35 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  7:14:23 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  7:14:23 [Note] Slave I/O thread exiting, read up to log 'logbin.013', position 551132915
060419  7:14:23 [Note] Error reading relay log event: slave SQL thread was killed
060419  7:14:23 [Note] Slave SQL thread initialized, starting replication in log 'logbin.013' at position 551132915, relay log '/e-backup/customers/mcarthur/database/logs/relaylog.000305' position: 135
060419  7:14:23 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logbin.013' at position 551132915
060419  8:14:26 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  8:16:53 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  8:16:54 [Note] Slave I/O thread exiting, read up to log 'logbin.013', position 551132915
060419  8:16:54 [Note] Error reading relay log event: slave SQL thread was killed
060419  8:16:54 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logbin.013' at position 551132915
060419  8:16:54 [Note] Slave SQL thread initialized, starting replication in log 'logbin.013' at position 551132915, relay log '/e-backup/customers/mcarthur/database/logs/relaylog.000306' position: 135
060419  9:16:57 [ERROR] Slave I/O thread: error reconnecting to master 'USER@HOST:PORT': Error: ''  errno: 0  retry-time: 10  retries: 86400
060419  9:21:01 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
060419  9:21:01 [Note] Slave I/O thread exiting, read up to log 'logbin.013', position 551132915
060419  9:21:01 [Note] Error reading relay log event: slave SQL thread was killed
060419  9:21:01 [Note] Slave SQL thread initialized, starting replication in log 'logbin.013' at position 551132915, relay log '/e-backup/customers/mcarthur/database/logs/relaylog.000307' position: 135
060419  9:21:02 [Note] Slave I/O thread: connected to master 'USER@HOST:PORT',  replication started in log 'logbin.013' at position 551132915
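Both excerpts follow the same rhythm: each "error reconnecting to master" line is logged about one hour after the previous successful "connected to master" line. A minimal Python sketch that measures this gap is shown below; the helper names `parse_line` and `connect_to_failure_gaps` are hypothetical, and the line format (`YYMMDD H:MM:SS [Level] message`) is inferred from the excerpts above. One hypothesis worth checking, not confirmed anywhere in this report, is that the hour corresponds to the `slave_net_timeout` server variable, whose default in MySQL 5.0 is 3600 seconds.

```python
import re
from datetime import datetime

# Assumed error-log line format, inferred from the excerpts above:
# "YYMMDD H:MM:SS [Level] message", with the hour possibly space-padded.
LOG_LINE = re.compile(r"^(\d{6})\s+(\d{1,2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")


def parse_line(line):
    """Return (timestamp, level, message), or None for non-log lines."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(f"{m.group(1)} {m.group(2)}", "%y%m%d %H:%M:%S")
    return ts, m.group(3), m.group(4)


def connect_to_failure_gaps(lines):
    """Time from each 'connected to master' note to the next reconnect error."""
    gaps = []
    connected_at = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # separators, version labels, blank lines
        ts, level, message = parsed
        if level == "Note" and "connected to master" in message:
            connected_at = ts
        elif level == "ERROR" and "error reconnecting to master" in message:
            # An error with no preceding connect (start of excerpt) is skipped.
            if connected_at is not None:
                gaps.append(ts - connected_at)
            connected_at = None
    return gaps
```

Fed the 4.0.24 excerpt (for example via `with open(path) as f: connect_to_failure_gaps(f)`), it reports gaps of 3603, 3603 and 3600 seconds; the 4.0.25 excerpt gives 3602, 3603 and 3603 seconds.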

Just correcting the information above:
the 5.0.20 server is the SLAVE, and the 4.0.24 and 4.0.25 servers are the MASTERS.
The log excerpts above are from the SLAVE.

Please send the my.cnf contents from your "normal" 4.0.x slaves and from the 5.0.20 slave.

Can you check the master binlogs for the SQL statements that were executed at the time of the reconnection errors?

Hi there, sorry for my late reply.
Do you want me to post the files here or send them to you some other way?

You can upload your my.cnf files here, marked as private, if you want. If you want to send binlogs and they are larger than 200K, you can upload them to ftp://ftp.mysql.com/pub/mysql/upload (with the bug number in the file name). Please add a comment here about your uploads, if any.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Hi
I'm having a very similar problem, using a 3.23 master and a 5.0.18 slave.
After SLAVE START, the slave runs OK for 60 minutes. Then it crashes and the log shows:
======
070318 22:40:01 [ERROR] Slave I/O thread: error reconnecting to master 'replicator@arrakis.ideeel.nl:3306': Error: ''  errno: 0  retry-time: 5  retries: 86400
070318 22:40:01 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
070318 22:40:01 [Note] Slave I/O thread exiting, read up to log 'arrakis-bin.381', position 203
070318 22:40:01 [Note] Error reading relay log event: slave SQL thread was killed
====
Restarting the slave solves the problem. This is not related to a cron job. It does not happen exactly every 60 minutes, but it mostly does: I ran a script that checks for this crash, restarts the slave and logs the time:
======
Sun Mar 18 22:40:01 CET 2007
Sun Mar 18 23:40:01 CET 2007
Mon Mar 19 00:45:01 CET 2007
Mon Mar 19 01:50:01 CET 2007
Mon Mar 19 02:55:01 CET 2007
Mon Mar 19 03:55:01 CET 2007
Mon Mar 19 04:55:01 CET 2007
Mon Mar 19 06:00:01 CET 2007
Mon Mar 19 07:00:02 CET 2007
Mon Mar 19 08:05:01 CET 2007
======
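The script itself is not included in the report. A minimal sketch of such a watchdog, assuming Python and any DB-API 2.0 MySQL driver, could poll SHOW SLAVE STATUS and apply the STOP SLAVE / START SLAVE workaround. The function names, the 60-second poll interval and the threshold of three consecutive polls are hypothetical choices, not MySQL settings. The threshold matters because the I/O thread also reports "Reconnecting after a failed master event read" briefly during a normal, successful reconnect.

```python
import time
from datetime import datetime

# Slave_IO_State value reported in this bug while the I/O thread hangs.
STUCK_STATE = "Reconnecting after a failed master event read"


def slave_status(cursor):
    """Run SHOW SLAVE STATUS on a DB-API 2.0 cursor; return a dict, or None if not a slave."""
    cursor.execute("SHOW SLAVE STATUS")
    row = cursor.fetchone()
    if row is None:
        return None
    columns = [col[0] for col in cursor.description]
    return dict(zip(columns, row))


def io_thread_stuck(status):
    """True if the I/O thread reports the reconnect state seen in this bug."""
    return status is not None and status.get("Slave_IO_State") == STUCK_STATE


def decide(status, stuck_polls, threshold=3):
    """Return (new consecutive-stuck count, whether to restart now).

    Only act after the state has been seen on `threshold` consecutive polls,
    so a normal, short-lived reconnect is left alone.
    """
    if not io_thread_stuck(status):
        return 0, False
    stuck_polls += 1
    if stuck_polls >= threshold:
        return 0, True
    return stuck_polls, False


def watch(cursor, poll_seconds=60, threshold=3):
    """Poll forever, applying the STOP SLAVE / START SLAVE workaround when stuck."""
    stuck_polls = 0
    while True:
        stuck_polls, restart = decide(slave_status(cursor), stuck_polls, threshold)
        if restart:
            cursor.execute("STOP SLAVE")
            cursor.execute("START SLAVE")
            # Record the restart time, similar to the timestamp list above.
            print(f"{datetime.now():%a %b %d %H:%M:%S %Y} restarted slave")
        time.sleep(poll_seconds)
```

It would be started as `watch(conn.cursor())`, where `conn` is an open connection to the slave from the driver of your choice. This only automates the workaround; the slave still fails to reconnect on its own.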

Does anyone have an idea? Continuously restarting the slave is a bit clunky.

thanks
Joris

All reporters:

Please try to repeat this with a newer version of MySQL server, 5.0.36/5.0.37, at least on the slave, and report the results.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

I am having the same problem. After a network interruption, the slave never reconnects; it keeps reporting "Reconnecting after a failed master event read" until I manually run "stop slave;" and then "start slave;", after which everything runs just fine.

I'm on Ubuntu (Debian)/ia64 for the slave and RHEL/ia32 for the master, both running 5.0.45 (the RHEL build is downloaded from MySQL; the Debian one is from the Ubuntu apt repositories).

Please tell me exactly which files I can upload to help figure this out. I have a cron job running that sends me mail whenever the replication stops, but it's a pain in the neck to keep doing this.

I think this is related -- http://bugs.mysql.com/bug.php?id=21132

And I think http://bugs.mysql.com/?id=30814 is a duplicate of this bug (19175).

I am having the same problem: a bad connection caused the slave thread to
stop, and it needs a manual STOP SLAVE / START SLAVE.
Both master and slave run MySQL 5.0.45 on FreeBSD 6.x.

Bug #30814 was marked as a duplicate of this one.

As the related bug #21132 is already fixed in 5.0.54, please try to repeat this with 5.0.54 or a later version and report the results.

Will try with 5.0.54 as soon as it is released - the newest I see on download pages is still 5.0.45, which is what I am running, and where I see this bug.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".