MySQL Bugs: #10157: Errno 2013 kills slave I/O thread and slave SQL thread

Bug #10157	Errno 2013 kills slave I/O thread and slave SQL thread
Submitted:	25 Apr 2005 20:12	Modified:	25 Jun 2005 16:30
Reporter:	Jeremy Jepsen
Status:	No Feedback
Category:	MySQL Server: Replication	Severity:	S3 (Non-critical)
Version:	Master: 4.0.18; Client: 4.0.24	OS:	Windows (Windows)
Assigned to:		CPU Architecture:	Any

Description:
I checked some of the other bugs; what is different about my error is that the slave SQL thread is killed as well.  I run continuous replication.  The error that is logged is below.

050422 11:10:36 Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
050422 11:10:36 Slave I/O thread killed while reading event
050422 11:10:36 Slave I/O thread exiting, read up to log 'ILCAC76MCYPHER-bin.003', position 732455519
050422 11:10:36 Error reading relay log event: slave SQL thread was killed
050422 11:10:36 Slave SQL thread initialized, starting replication in log 'ILCAC76MCYPHER-bin.003' at position 742303990, relay log '.\ILCACH13S15BC-relay-bin.001' position: 4
050422 11:10:36 Slave I/O thread: connected to master 'CACHRepDB@10.66.225.100:3306',  replication started in log 'ILCAC76MCYPHER-bin.003' at position 742303990

Any ideas?  Do I need to change my wait timeout vars?  Will flushing the master.info and relay_log.info files at the end of replication, before it starts over again, help?

How to repeat:
Cannot supply data at this time.  Let me know if this is essential as I will have to filter out sensitive data.

Hello!
Weird: the I/O thread stops at 732455519 and restarts at 742303990, which is roughly 10 MB further on. If this were an automatic restart, the two positions should be equal. What happened?
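
For reference, the size of that gap can be checked directly from the two log lines in the description. This is only a rough sketch in Python: the regular expression and the helper name binlog_positions are made up for illustration and are not part of any MySQL tool; the log text is copied from the report.

```python
import re

# Two log lines copied verbatim from the report's error log excerpt.
LOG = """\
050422 11:10:36 Slave I/O thread exiting, read up to log 'ILCAC76MCYPHER-bin.003', position 732455519
050422 11:10:36 Slave I/O thread: connected to master 'CACHRepDB@10.66.225.100:3306',  replication started in log 'ILCAC76MCYPHER-bin.003' at position 742303990
"""

# Hypothetical pattern: "log '<name>'" followed by "position <n>",
# with an optional comma and an optional "at " in between.
POSITION_RE = re.compile(r"log '([^']+)',? (?:at )?position (\d+)")


def binlog_positions(text):
    """Return (log_name, position) pairs in the order they appear."""
    return [(name, int(pos)) for name, pos in POSITION_RE.findall(text)]


# First match is where the I/O thread stopped, second is where it restarted.
(stop_log, stop_pos), (start_log, start_pos) = binlog_positions(LOG)
gap = start_pos - stop_pos

print(f"{stop_log}: stopped at {stop_pos}, restarted at {start_pos}")
print(f"gap: {gap} bytes (~{gap / 1e6:.1f} MB)")
```

Both positions refer to the same binary log file, so the difference is a plain byte offset within that file.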

We have recently had some network connectivity issues with this site.  I have a feeling that the network is the culprit.  Any ideas about how to recover the lost replication, other than fixing the network, as that may take a while?

[User's answer about why the two positions differ]
"Because we replicate the data to several boxes, if the connection fails on a client box, the replication will continue on the master so that the other clients can continue to replicate.  That explains the gap, and that is our real problem: not the fact that the connection is lost, but that the gap leads to lost data on the clients."

I was unable to find any problems running both master and slave with 4.0.24
on Windows. Of course, since I don't have enough info from you (db schema,
queries, etc.), my test may not be valid for your case. Please feel free to
re-open this issue or comment if you can provide me with a repeatable test case.

Thanks in advance.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".