Bug #86056 MySqlException (0x80004005): Reading from the stream has failed.
Submitted: 24 Apr 2017 14:32  Modified: 21 Jul 2017 11:24
Reporter: Natan Vivo
Status: Closed  Impact on me: None
Category: Connector / NET  Severity: S1 (Critical)
Version: 6.9.9  OS: Microsoft Windows
Assigned to:  CPU Architecture: Any
Tags: bug, exception, stream

[24 Apr 2017 14:32] Natan Vivo
Description:
When the server is under high load, it will start to throw EndOfStreamExceptions when trying to open connections. This was reported three years ago as bug https://bugs.mysql.com/bug.php?id=76597, which was closed as "Can't repeat", probably because it cannot be reproduced under normal conditions.

The exception thrown is:

MySql.Data.MySqlClient.MySqlException (0x80004005): Reading from the stream has failed. ---> System.IO.EndOfStreamException: Attempted to read past the end of the stream.
   at MySql.Data.MySqlClient.MySqlStream.ReadFully(Stream stream, Byte[] buffer, Int32 offset, Int32 count)
   at MySql.Data.MySqlClient.MySqlStream.LoadPacket()
   at MySql.Data.MySqlClient.MySqlStream.LoadPacket()
   at MySql.Data.MySqlClient.MySqlStream.ReadPacket()
   at MySql.Data.MySqlClient.Authentication.MySqlAuthenticationPlugin.ReadPacket()
   at MySql.Data.MySqlClient.Authentication.MySqlAuthenticationPlugin.AuthenticationFailed(Exception ex)
   at MySql.Data.MySqlClient.Authentication.MySqlAuthenticationPlugin.ReadPacket()
   at MySql.Data.MySqlClient.Authentication.MySqlAuthenticationPlugin.Authenticate(Boolean reset)
   at MySql.Data.MySqlClient.NativeDriver.Open()
   at MySql.Data.MySqlClient.Driver.Open()
   at MySql.Data.MySqlClient.Driver.Create(MySqlConnectionStringBuilder settings)
   at MySql.Data.MySqlClient.MySqlPool.CreateNewPooledConnection()
   at MySql.Data.MySqlClient.MySqlPool.GetPooledConnection()
   at MySql.Data.MySqlClient.MySqlPool.TryToGetDriver()
   at MySql.Data.MySqlClient.MySqlPool.GetConnection()
   at MySql.Data.MySqlClient.MySqlConnection.Open()

I'm seeing tens of thousands of these errors per hour on AWS at certain points of the day, mostly on weekends. After some investigation, I believe this bug is timing related: when the server is under heavy load, or anything else causes the process to run slower than normal (cloud environments are very susceptible to this due to multi-tenancy), this error pops up very frequently.

It seems to me the issue is in MySqlStream (https://github.com/mysql/mysql-connector-net/blob/6.9/Source/MySql.Data/MySqlStream.cs#L17...):

    internal static void ReadFully(Stream stream, byte[] buffer, int offset, int count)
    {
      int numRead = 0;
      int numToRead = count;
      while (numToRead > 0)
      {
        int read = stream.Read(buffer, offset + numRead, numToRead);
        if (read == 0)
        {
          throw new EndOfStreamException();
        }
        numRead += read;
        numToRead -= read;
      }
    }

ReadFully takes a naive approach to what it means to reach the end of the stream. If the buffer is not filled when the read happens, it assumes it has reached the end of the stream, when in fact the process is simply slow and the stream has not received data yet.

How to repeat:
In order to reproduce this, you need to introduce latency into the running process itself, not into the network. Opening many connections on a small (1 GB / 1 CPU) AWS VM over a long period (multiple days) will probably show this behavior.

It seems to me the issue is caused by the network buffers not yet being filled when the driver tries to read them.

Suggested fix:
It would be more correct to read until the protocol has all the data to build an entire command or a timeout has been reached.
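
One way to sketch this suggestion is to bound each read with the stream's ReadTimeout, so a stalled connection fails with a distinct IOException instead of looking like end-of-stream. This is an illustrative, hypothetical helper (ReadFullyWithTimeout is not Connector/NET code):

```csharp
using System;
using System.IO;

static class PacketReader
{
    // Hypothetical helper: same loop shape as ReadFully, but bounded by a
    // per-read timeout so a stalled connection surfaces as an IOException
    // rather than being mistaken for end-of-stream.
    public static void ReadFullyWithTimeout(Stream stream, byte[] buffer,
                                            int offset, int count, int timeoutMs)
    {
        if (stream.CanTimeout)
            stream.ReadTimeout = timeoutMs; // on NetworkStream, a Read that blocks past this throws IOException

        int numRead = 0;
        while (numRead < count)
        {
            int read = stream.Read(buffer, offset + numRead, count - numRead);
            if (read == 0)
                throw new EndOfStreamException();
            numRead += read;
        }
    }

    static void Main()
    {
        // MemoryStream does not support timeouts (CanTimeout == false),
        // so this just exercises the loop itself.
        var buffer = new byte[3];
        ReadFullyWithTimeout(new MemoryStream(new byte[] { 7, 8, 9 }), buffer, 0, 3, 1000);
        Console.WriteLine(string.Join(",", buffer));
    }
}
```

On a NetworkStream, ReadTimeout causes a blocking Read that exceeds the deadline to throw an IOException; whether that alone explains the original errors is a separate question.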
[28 Apr 2017 0:41] Bradley Grainger
This code doesn't look incorrect to me (on its face).

From the Stream.Read docs (https://msdn.microsoft.com/en-us/library/system.io.stream.read(v=vs.110).aspx):

> Return Value
> ... zero (0) if the end of the stream has been reached.
>
> The implementation will block until at least one byte of data can be read, in the event that no data is available. Read returns 0 only when there is no more data in the stream and no more is expected (such as a closed socket or end of file). An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached.
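
That contract can be demonstrated with a small self-contained sketch (TrickleStream is a hypothetical stream, not Connector/NET code) that delivers at most one byte per Read call: the ReadFully-style loop still fills the buffer, and EndOfStreamException fires only once the stream is genuinely exhausted:

```csharp
using System;
using System.IO;

// Hypothetical stream that returns at most one byte per Read call,
// simulating a slow socket delivering data in small fragments.
class TrickleStream : MemoryStream
{
    public TrickleStream(byte[] data) : base(data) { }
    public override int Read(byte[] buffer, int offset, int count)
        => base.Read(buffer, offset, Math.Min(count, 1));
}

class Demo
{
    // Same loop shape as Connector/NET's ReadFully.
    static void ReadFully(Stream stream, byte[] buffer, int offset, int count)
    {
        int numRead = 0;
        int numToRead = count;
        while (numToRead > 0)
        {
            int read = stream.Read(buffer, offset + numRead, numToRead);
            if (read == 0)
                throw new EndOfStreamException();
            numRead += read;
            numToRead -= read;
        }
    }

    static void Main()
    {
        // Partial reads are fine: four one-byte reads fill the buffer.
        var data = new byte[] { 1, 2, 3, 4 };
        var buffer = new byte[4];
        ReadFully(new TrickleStream(data), buffer, 0, 4);
        Console.WriteLine(string.Join(",", buffer));

        // Only reading past the true end of the stream raises the exception.
        try
        {
            ReadFully(new TrickleStream(data), new byte[5], 0, 5);
        }
        catch (EndOfStreamException)
        {
            Console.WriteLine("EndOfStreamException");
        }
    }
}
```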
[28 Apr 2017 19:26] Natan Vivo
Good call Bradley Grainger.

Still, the problem happens specifically when the server is slower than normal without any changes to the code. Since I use the same image for multiple servers, I can see the problem appear and disappear over time without any change to the server.

I'm not sure if your driver will show the same problem, but I intend to find out as soon as we get rid of EF.
[4 May 2017 22:24] Bradley Grainger
Natan Vivo,

We were infrequently (but consistently) getting this exception in our app pools when using Connector/NET. Our projects that have switched to https://github.com/mysql-net/MySqlConnector are no longer seeing the issue. I'm keen to hear your experience of switching.
[30 May 2017 13:13] Chiranjeevi Battula
Hello Natan Vivo,

Thank you for the bug report.
Could you please provide a repeatable test case (exact steps, sample project, my.cnf/ini, CREATE TABLE statements/database, etc. - please mark it as private if you prefer) to confirm this issue at our end?

Thanks,
Chiranjeevi.
[1 Jul 2017 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[21 Jul 2017 9:37] Justin Farrugia
We have been able to replicate this error by blocking the eMule P2P program via application control on the SonicWall firewall. Any help regarding this?
[21 Jul 2017 11:24] Natan Vivo
Hi, sorry for the lack of feedback.

After a long time, I finally found out this was caused by AWS throttling disk I/O on RDS. Once the IOPS limit was reached, disk latency increased and processes started to accumulate on the server, causing MySQL to stop responding and these weird errors to appear.

This is very specific to how AWS EBS works. The solution was to increase the disk size in order to raise the IOPS limit on EBS; after that, these problems disappeared.