Bug #49566 Connector hangs when server stops responding.
Submitted: 9 Dec 2009 15:04 Modified: 18 Feb 2010 11:09
Reporter: Robert Hood
Status: No Feedback
Category: Connector / NET Severity: S3 (Non-critical)
Version: 6.1.3 OS: Windows (XP)
Assigned to: Assigned Account CPU Architecture: Any

[9 Dec 2009 15:04] Robert Hood
Description:
I believe I am dealing with two bugs, one on the server side and one in the connector.  For whatever reason, the server sometimes stops responding partway through returning a result set that contains multiple MB worth of data.  When this happens the connector locks up the entire application on the thread on which it is running.  The call never returns.  I ran the connector from source code, and it hangs while trying to read from the network.  In MySqlStream.cs, line 177, the call is:

int b1 = inStream.ReadByte();

How to repeat:
This is a tough one to replicate.  Only 1 of our servers experiences this issue at a time (we have almost 100 servers with similar data on each one), and the issue appears to come and go.  I would suggest building your own custom server emulator and having it stop sending data in the middle of a request.

Suggested fix:
Set a read timeout on the stream and, if a read exceeds it, throw an exception.
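
The emulator idea and the suggested fix can be tried together in a small standalone program. This is only a sketch under stated assumptions, not the connector's actual code: `StalledReadDemo`, `ReadWithTimeout`, the 3-byte payload and the 500 ms timeout are all made up for illustration, and a plain `TcpListener` on the loopback interface stands in for the stalled server. It relies on `NetworkStream.ReadTimeout`, which defaults to infinite and raises an `IOException` when it expires.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

public static class StalledReadDemo
{
    // Connects to a local fake "server" that sends a few bytes and then goes
    // silent without closing the socket. Returns what the client observed.
    public static string ReadWithTimeout(int timeoutMs)
    {
        // Hypothetical stand-in for the stalled server: any free loopback port.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (var client = new TcpClient())
            {
                client.Connect(IPAddress.Loopback, port);
                using (TcpClient serverSide = listener.AcceptTcpClient())
                {
                    // Send a partial "result set", then stop sending.
                    serverSide.GetStream().Write(new byte[] { 1, 2, 3 }, 0, 3);

                    NetworkStream inStream = client.GetStream();
                    inStream.ReadTimeout = timeoutMs; // default is Timeout.Infinite

                    int received = 0;
                    try
                    {
                        while (true)
                        {
                            // Blocks once the 3 bytes have been consumed.
                            int b1 = inStream.ReadByte();
                            if (b1 == -1)
                                return "closed after " + received + " bytes";
                            received++;
                        }
                    }
                    catch (IOException)
                    {
                        // An expired ReadTimeout surfaces as an IOException.
                        return "timed out after " + received + " bytes";
                    }
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadWithTimeout(500));
    }
}
```

With the `ReadTimeout` line removed, the same loop would block indefinitely on the fourth `ReadByte()`, which matches the hang described above. After a timed-out read it is probably safest to treat the connection as unusable and close it rather than retry on the same stream.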
[10 Dec 2009 7:11] Tonci Grgin
Hi Robert and thanks for your report.

Yes, I have seen such reports before, and mostly the cause was bad network HW. Isn't it suspicious that only 1 of your servers fails? Can you please check or replace that server's network card, cables and such, and see if it still fails?

In the meantime I'll try repeating the problem (although I still have no idea how).
[10 Dec 2009 12:52] Robert Hood
Since I submitted the client-side bug, we've learned a little more about the server-side issue.  I was saving this for the server bug that we plan to submit, but I'll share what we've learned.

Our query is essentially "select * from table where field = condition;".  Using WireShark, we see that the server always stops responding in exactly the same place.  We also do not see the end of file packet, which is what I assume the client is waiting for.

What is odder is that if we modify the query in any way, the server finishes sending all the data.  For example, if we list all the fields instead of requesting *, the query works.  If we do something as simple as adding an extra space between "select" and "*", the query works.

In our connection string, we specify "character set=utf8".  When we remove this, the query works.  We can even add a copy of the table under a different name, just one letter off the original, copy all the data into that table, and the original query (modified to query the new table) works.

Also, when I said only 1 server has this issue, I should add that we've seen it on other servers before.  It just so happened to be only 1 at a time, and the issue seems to magically go away after 6-10 hours.  Our current situation is one where the issue did not go away after 6-10 hours.  I'm beginning to wonder if the issue is data-related, and it goes away after someone modifies some piece of data.
[11 Dec 2009 8:19] Tonci Grgin
Robert, this is probably not related to the connector at all, and thus this report makes no sense. All I can think of to help you is to set a command timeout, thus preventing an endless wait.

If you think there is a server issue here, please open a new report in the proper category.

Can I close the report now?
[11 Dec 2009 14:00] Robert Hood
I do have a command timeout set, but it is ignored.  While I agree that the main problem is on the server, you can run "show full processlist" and the query is not listed.  Clearly, the server has broken the connection to the client, but the client is still stuck waiting.  I can also tell you that the DevArt dotConnect client does not suffer from this.  It realizes the server has broken the connection and throws an exception alerting the app as such.
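
For comparison, a rough sketch of what a client sees when the other side closes the connection in an orderly way: `ReadByte()` returns -1 instead of blocking, so the caller can turn that into an exception. The names `ClosedPeerDemo` and `ReadAfterPeerClose` are hypothetical, the loopback listener is only a stand-in for the server, and nothing here shows how the connection was actually dropped in this report. If the close never reaches the client, only a read timeout would end the wait.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class ClosedPeerDemo
{
    // Returns the value ReadByte() yields after the peer has closed its side.
    public static int ReadAfterPeerClose()
    {
        // Hypothetical stand-in for the server: any free loopback port.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (var client = new TcpClient())
            {
                client.Connect(IPAddress.Loopback, port);

                // Accept, then immediately close the server side of the connection.
                listener.AcceptTcpClient().Close();

                NetworkStream inStream = client.GetStream();
                inStream.ReadTimeout = 5000; // safety net so the demo cannot hang
                return inStream.ReadByte();  // -1 signals end of stream
            }
        }
        finally
        {
            listener.Stop();
        }
    }

    public static void Main()
    {
        int b1 = ReadAfterPeerClose();
        Console.WriteLine(b1 == -1 ? "peer closed the connection" : "got byte " + b1);
    }
}
```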
[13 Dec 2009 20:40] Vladislav Vaintroub
Network timeouts are handled correctly starting with 6.2; earlier versions are not as good at it.
[18 Jan 2010 11:09] Tonci Grgin
Robert, have you tried newer c/NET versions as Wlad suggested?
[19 Feb 2010 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".