MySQL Bugs: #44152: At the end of a large replication MySQL logs server

Bug #44152	At the end of a large replication MySQL logs server_errorno=2013
Submitted:	8 Apr 2009 13:57	Modified:	20 Jul 2012 13:33
Reporter:	Gary Glickman
Status:	Duplicate
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	5.0.77	OS:	Windows (Win2003 SP2. 8GB RAM. 60GB HD (30 GB free space))
Assigned to:		CPU Architecture:	Any
Tags:	Lost connection, replication, Server Erorr 2013

Description:
We have a master with 2 slaves.  There are some tables on the master that are synchronized from an external datasource via a Java program.  Nightly, a few of these tables are completely emptied via DELETE FROM xyz, and then the program reloads the master tables with individual inserts.  The largest table loaded is approximately 35k records.  The next largest is 8k records.

Every night since we set up replication, when the replication completes and the reloaded table counts match the master, the MySQL server logs the following error sequence:

090408  1:11:32 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
090408  1:11:32 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000002' position 590808177
090408  1:11:32 [Note] Slave: connected to master 'repl@entpa51.prod.fedex.com:3306',replication resumed in log 'mysql-bin.000002' at position 590808177

The replication is successful and the table counts all match, but this error is logged to the .err file and to the Windows Application Event Log.

This is problematic because we have 24/7 monitoring of all our production servers, and errors that appear in the Windows Application Log are sent to the 24/7 monitoring team for attention (they page the support team, who then has to work the problem, etc.).

The other slave (we have two connected to the master) does not have this problem.

However, the slave with the problem is approximately 1,000 miles from the master, with replication occurring across the WAN.  The other slave that has no problem is on the same IP network as the master and sits directly next to the master.

We have a development test environment and cannot reproduce the problem there.  The dev/test environment consists of two servers that are right next to each other.

For the problem slave we have done the following with no success:

-Upgraded from 5.0.67 to 5.0.77
-Uninstalled/Reinstalled
-Increased the timeout variables:
| net_read_timeout           | 120   |
| net_write_timeout          | 60    |
| slave_net_timeout          | 3600  |
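
For reference, a minimal sketch of how the three timeout values listed above could be written out as SET GLOBAL statements for the slave. The helper name timeout_statements is hypothetical, and it is an assumption that the values were applied as global settings rather than through my.ini; the report does not say which.

```python
# Timeout values quoted in the report, in seconds.
TIMEOUTS = {
    "net_read_timeout": 120,
    "net_write_timeout": 60,
    "slave_net_timeout": 3600,
}


def timeout_statements(timeouts):
    """Return one SET GLOBAL statement per variable (hypothetical helper)."""
    return [f"SET GLOBAL {name} = {int(value)};" for name, value in timeouts.items()]


if __name__ == "__main__":
    for statement in timeout_statements(TIMEOUTS):
        print(statement)
```

The statements only change the configured limits; they do not show whether any of these timeouts is actually what drops the connection.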

Replication on this server otherwise is fine.  Small single insert/updates work without issue.  It is only these large table loads that present a problem.

Below is an excerpt from the .err file for the past several days.

Thank you
-----------------------------------------------------------------
090404  0:59:59 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
090404  0:59:59 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000002' position 261475532
090404  0:59:59 [Note] Slave: connected to master 'repl@entpa51.prod.fedex.com:3306',replication resumed in log 'mysql-bin.000002' at position 261475532
090404  1:04:41 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
090404  1:04:41 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000002' position 267548150
090404  1:04:41 [Note] Slave: connected to master 'repl@entpa51.prod.fedex.com:3306',replication resumed in log 'mysql-bin.000002' at position 267548150
090405  1:23:49 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
090405  1:23:49 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000002' position 368569834
090405  1:23:49 [Note] Slave: connected to master 'repl@entpa51.prod.fedex.com:3306',replication resumed in log 'mysql-bin.000002' at position 368569834
090405  3:59:55 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
090405  3:59:55 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000002' position 382532134
090405  3:59:55 [Note] Slave: connected to master 'repl@entpa51.prod.fedex.com:3306',replication resumed in log 'mysql-bin.000002' at position 382532134
090406  0:58:33 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
090406  0:58:33 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000002' position 403280241
090406  0:58:34 [Note] Slave: connected to master 'repl@entpa51.prod.fedex.com:3306',replication resumed in log 'mysql-bin.000002' at position 403280241
090406  1:25:38 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
090406  1:25:38 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000002' position 445038711
090406  1:25:38 [Note] Slave: connected to master 'repl@entpa51.prod.fedex.com:3306',replication resumed in log 'mysql-bin.000002' at position 445038711
090407  1:01:14 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
090407  1:01:14 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000002' position 490299861
090407  1:01:15 [Note] Slave: connected to master 'repl@entpa51.prod.fedex.com:3306',replication resumed in log 'mysql-bin.000002' at position 490299861
090408  1:11:32 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
090408  1:11:32 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000002' position 590808177
090408  1:11:32 [Note] Slave: connected to master 'repl@entpa51.prod.fedex.com:3306',replication resumed in log 'mysql-bin.000002' at position 590808177
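
The excerpt above can be summarized mechanically. The following is a rough sketch that pairs each "Lost connection" error with the next "replication resumed" note and reports how long the slave was disconnected. The function names are hypothetical, the line layout is inferred only from the lines in this report, and the two-digit year is assumed to mean 20xx.

```python
import re
from datetime import datetime

# Layout inferred from the excerpt: "YYMMDD  H:MM:SS [Level] message".
LINE_RE = re.compile(
    r"^(\d{2})(\d{2})(\d{2})\s+(\d{1,2}):(\d{2}):(\d{2}) \[(\w+)\] (.*)$"
)


def parse_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    yy, mo, dd, hh, mi, ss = (int(g) for g in m.groups()[:6])
    # Assumption: two-digit years are 20xx.
    return datetime(2000 + yy, mo, dd, hh, mi, ss), m.group(7), m.group(8)


def reconnect_gaps(lines):
    """Pair each errno 2013 error with the next reconnect; return (time, seconds)."""
    gaps = []
    lost_at = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, level, msg = parsed
        if level == "ERROR" and "server_errno=2013" in msg:
            lost_at = ts
        elif level == "Note" and "replication resumed" in msg and lost_at is not None:
            gaps.append((lost_at, (ts - lost_at).total_seconds()))
            lost_at = None
    return gaps


# Six lines copied from the excerpt above.
SAMPLE = [
    "090404  0:59:59 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)",
    "090404  0:59:59 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000002' position 261475532",
    "090404  0:59:59 [Note] Slave: connected to master 'repl@entpa51.prod.fedex.com:3306',replication resumed in log 'mysql-bin.000002' at position 261475532",
    "090406  0:58:33 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)",
    "090406  0:58:33 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000002' position 403280241",
    "090406  0:58:34 [Note] Slave: connected to master 'repl@entpa51.prod.fedex.com:3306',replication resumed in log 'mysql-bin.000002' at position 403280241",
]

if __name__ == "__main__":
    for lost_at, seconds in reconnect_gaps(SAMPLE):
        print(lost_at.isoformat(), "reconnected after", seconds, "s")
```

In the excerpt every reconnect lands within a second of the error, which is the timing a reader would want to compare against any master restart.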

How to repeat:
1) Create a slave that replicates across a WAN with 6+ hops
2) Create a table on the master and load it with 30k+ records
3) Empty the master table via delete from <table>
4) Reload the master table via individual inserts so that all 30k+ records are reloaded in one session.
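
As an illustration of steps 3 and 4, here is a small sketch that generates the statement stream for such a reload: one DELETE followed by one INSERT per record. The table name repro_table, the columns id and payload, and the generated row contents are all hypothetical; the report does not describe the real schema.

```python
def sample_rows(count):
    """Generate placeholder rows (hypothetical data, not from the report)."""
    return [(i, f"row-{i}") for i in range(1, count + 1)]


def reload_statements(table, rows):
    """Yield the reload as SQL text: one DELETE, then one INSERT per row."""
    yield f"DELETE FROM {table};"
    for row_id, payload in rows:
        # Quote doubling is enough for these generated values only;
        # real data should go through a driver with bound parameters.
        escaped = payload.replace("'", "''")
        yield f"INSERT INTO {table} (id, payload) VALUES ({int(row_id)}, '{escaped}');"


if __name__ == "__main__":
    statements = list(reload_statements("repro_table", sample_rows(30000)))
    print(len(statements), "statements")
    print(statements[0])
    print(statements[1])
```

The output can be piped into a single client session on the master, so that every statement is written to the binary log individually, as in the nightly job.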

Thank you for the problem report. Please send the my.ini files from the master and the problematic slave. Also, send the results of the ping command from the slave to the master during the period when this problem happens.

Thank you for the report.

Please also provide master error log file.

Your master has lots of shutdowns: it was started and shut down twice within 2 minutes, and so on. Is it possible that you got the error messages on the slave because somebody shut down the master?

Really? After over a month of silence that is the response?  Did you look at the logs?  So you suggest that we were able to stop and restart the server all within 1 second?  Do you even compare the times of the shutdown against the error?

Sorry, I know this is free support, but still.

Anyway, if you are interested in making your product better, I am posting the .err files from the master and slave again.  There are a couple of months of data in there now.  You will see that over a month elapsed between restarts of the master and during that time the slave threw the error regularly.

slave.err - 2+ months of data

Attachment: slave.err (text/plain), 111.69 KiB.

Do your 2 slaves have the same server-id?

From http://www.mysqlperformanceblog.com/2008/06/04/confusing-mysql-replication-error-message/

"
After setting up new slave Server I'm getting error log file flooded with messages like this and there is no hint in the message what would explain what is wrong.

In fact the issue in this case is (because of configuration error) two slave servers got the same server-id.

Seriously in this case Master clearly sees the problem in this case as there are 2 servers with same server-id connected and replicating so it should report it to the slave instead of sending end packet.

At very least it would be nice to include possible reason for this error message which MySQL already does in many other cases.

I've now filed it as a bug. http://bugs.mysql.com/bug.php?id=37211

"

This may be the same as bug #44430.
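
The server-id question above can be checked from the my.ini files themselves. This is a minimal sketch, assuming each file has a [mysqld] section with a server-id (or server_id) entry; the host labels and the sample file contents are hypothetical and are not taken from the attached configs.

```python
import configparser


def read_server_id(ini_text):
    """Return the [mysqld] server id from my.ini text, or None if it is absent."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.ini may contain bare flags such as skip-name-resolve
        strict=False,         # tolerate repeated options
        interpolation=None,   # treat '%' literally
    )
    parser.read_string(ini_text)
    if not parser.has_section("mysqld"):
        return None
    for key in ("server-id", "server_id"):
        value = parser.get("mysqld", key, fallback=None)
        if value is not None:
            return int(value)
    return None


def duplicated_ids(configs):
    """Map each server id used by more than one host to the hosts that share it."""
    seen = {}
    for host, text in configs.items():
        seen.setdefault(read_server_id(text), []).append(host)
    return {sid: hosts for sid, hosts in seen.items()
            if sid is not None and len(hosts) > 1}


# Hypothetical file contents for two slaves.
SLAVE_A_INI = """[mysqld]
server-id = 2
skip-name-resolve
"""

SLAVE_B_INI = """[mysqld]
server-id = 3
"""

if __name__ == "__main__":
    print(duplicated_ids({"slave_a": SLAVE_A_INI, "slave_b": SLAVE_B_INI}))
```

An empty result means no two hosts share an id; any entry in the result names the hosts that would collide on the master.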

No. They do not have the same name.

As the original post states, we replicate 45+k records successfully.  But during the replication with this one host only, the error appears once or twice.  The other slave never throws an error.  The difference is that the error-free slave is next to the master and the slave with errors is 1,000 miles away from the master across the WAN through several hops.

I have already posted the slave config settings.  So please review those before suggesting timeout settings.

Thanks in advance.

Thank you for the feedback.

> The difference
> is that the error-free slave is next to the master and the slave with
> errors is 1,000 miles away from the master across the WAN through
> several hops.

Please check your network in this case. Try increasing the timeouts. This is most likely not a MySQL bug.

> I have already posted the slave config settings.  So please review
> those before suggesting timeout settings.

Yes, we reviewed them. There is no magic timeout value, so I'd recommend verifying how your network behaves.

We are not able to repeat it.

Also it really looks like a network problem and not a MySQL problem.

Duplicate of BUG#53955.