Bug #36194 Eliminate pinging with connection pooling to get massive performance improvement
Submitted: 18 Apr 2008 1:15 Modified: 21 Jan 2014 23:08
Reporter: Maxim Mass Email Updates:
Status: Closed Impact on me:
None 
Category:Connector / NET Severity:S5 (Performance)
Version: OS:Any
Assigned to: CPU Architecture:Any

[18 Apr 2008 1:15] Maxim Mass
Description:
When connection pooling is enabled, the connector performs a full round-trip ping to the server for every request. This is done to ensure that the server is still running and that the checked out driver is still able to communicate with the server. Unfortunately this ping often times doubles the total query execution time  for simple or query-cached queries.

How to repeat:
A simple app that hits a database in a loop with connection pooling enabled.
Ensure that mysql.data.dll is compiled with symbols and attach a profiler. You'll see that pinging takes significant % of the query time.

Suggested fix:
Remove pinging and instead handle fatal socket exceptions in pooled connections by removing the connection from the pool and retrying on a new connection. If the socket exception still happens then bubble it out to the client.

I will attach a patch that we've been using successfully with MindTouch Deki Wiki for the past 2 months shortly.
[18 Apr 2008 18:24] Maxim Mass
Patch for 5.1.4 to safely remove pinging

Attachment: mysql.data.dll 5.14 no pinging with connection pool.txt (text/plain), 6.11 KiB.

[18 Apr 2008 18:30] Maxim Mass
Please note that the attached patch may depend on bug #34001. There may be a cleaner way to accomplish this but this approach has worked for us and our users without issue. You can grab our connector with this and other patches from https://svn.mindtouch.com/source/public/dekiwiki/trunk/src/redist/MySql.Data.dll to compare performance and stability.

Thank you,
Max
[21 Apr 2008 8:27] Tonci Grgin
Hi Maxim and thanks for (both) report(s).

I can verify them with no problem but:
> Remove pinging and instead handle fatal socket exceptions in pooled connections by removing the connection from the pool and retrying on a new connection. If the socket exception still happens then bubble it out to the client.

AFAIK, socket exception will take some time to raise and we might hit some other problems doing things this way. What if load-balancing comes in picture? What if you have mess in host table and so on... This is just from the top of my head...

Let's see what Reggie has to say about this problem.
[21 Apr 2008 19:29] Maxim Mass
Tonci, thanks for the response and I do look forward to hearing Reggie's thoughts on this as well. I do understand your concerns as this is a relatively fundamental change but I do believe the performance gains warrant seeing this through to a working solution. 

I am fairly confident that removing this ping and replacing it with better error handling will not impact how people use connection pooling as the behavior and intent is still the same. Although I haven't tried the scenario, I don't believe that setups with a load balancer between the connector and the server would be affected. The connector is behaving the same way except that instead of trying its pinging to see if a connection from the pool is valid, it assumes it's valid. AFAIK, the ping process has the same potential long timeout and other misc socket issues as instead just trying to send the query in the first place.

Microsoft's SQL server connector seems to behave this way as well but they took the additional step of doing background health checking of their pool.

From: http://msdn2.microsoft.com/en-us/library/8xx3tyca.aspx
"If a connection exists to a server that has disappeared, this connection can be drawn from the pool even if the connection pooler has not detected the severed connection and marked it as invalid. This is the case because the overhead of checking that the connection is still valid would eliminate the benefits of having a pooler by causing another round trip to the server to occur. When this occurs, the first attempt to use the connection will detect that the connection has been severed, and an exception is thrown."

Again, looking forward to hearing your thoughts on this.
[21 Apr 2008 20:09] Reggie Burnett
Maxim

Thanks for the patch!  I had already planned to do something very similar for the 5.2.2 beta.  Having your patch just makes it easier!
[21 Apr 2008 20:59] Reggie Burnett
It's not quite as simple as this patch.  There are some scenarios that must be considered such as the following:

SetDatabase does not use the ExecuteXXX methods and is the first query called on a newly returned pooled connection so if the pooler returns a busted connection this will bomb.

Second, it's important that a connection that is inside a current transaction not silently continue with a re-established connection because the user will think that nothing happened when in fact their transaction is now gone.

The better solution is to identify the few points inside NativeDriver.cs where a broken connection can be safely re-established and do that.  THis will work for pooled and non-pooled connections (which is a plus).  This change is going into 5.2.2 beta
[22 Apr 2008 8:10] Tonci Grgin
Reggie, "To be fixed later" is ok?
[11 Jun 2008 2:33] Poul Bak
Most public avaiable webhosts has limited or even disabled 'Ping from Wan'. This includes MySql servers.
Mine has *SIGH*. Disabling pooling helped, but an alternative to ping would be better.
[17 Jun 2008 21:03] Poul Bak
Ok, I have solved my problems, they were actually not related to ping!
[13 Nov 2008 1:00] d di
Comments from a Connector/NET user:
 - option to allow disable of ping() on pool fetch: sounds good (we do our own ping anyway)
 - crashing later like Microsoft does: sounds good (we need to handle crashes anyway)
 - automatically issuing queries on another connection if the current one fails: GOD NO! PLEASE DON'T BREAK STUFF LIKE THAT.

I'm not going to go into all the issues with that last point.  I'll just mention that enterprise applications that are dependent on a specific response time (including connection and command timeouts) are going to break horribly.

Okay, just one more, anything that depends on temporary tables, user variables, or whatever state will break too.

Anyway, it's something you can do in your own connector class that inherits from the main Connector/NET, for your own purposes.  You can publish your wrapper class too if you'd like.  But it's very much a feature you do not want in a stable consumer product.

.2€
[15 Nov 2008 21:02] d di
More issues with automatic re-issue on different connection:
 - transactions break when part of a transaction is re-executed on a new, auto-commit connection.
 - timeouts are currently implemented client side, meaning UPDATEs, INSERTs etc would often be executed twice or more.

Anyway, enough about that.  The problems mentioned by Maxim are real enough.  Time to be constructive for a brief moment :-).

I've tried to incorporate solutions for all of these problems in a new piece of connection pooling logic.  Shameless plug: can be found attached to issue #40684.
[17 Nov 2008 23:29] Maxim Mass
Hey David, I'm glad there's renewed interest in this issue. I haven't put any time or thought into it since filing this bug 6 months ago. We've been running and distributing a custom build of the connector with our software (open source MindTouch Deki) with this and other patches and haven't had any problems reported with this approach of removing pinging and handling errors by retrying on a new connection. I can see how this may not be a full solution to the problem but it addressed our needs. We don't use transactions or assume that multiple queries must be executed using the same connection driver. 
This patch likely didn't address all the issues relating to retrying a query on a new connection but it did further abstract the client code from the connection pooling implementation. It looks like you did a great job of capturing all the connection pooling issues in bug #40684 and I'd be very happy to see it integrated in. Once I get some time I'll take a closer look at it and provide some feedback.
[3 Apr 2013 8:48] Josep Planells
Any progress with this? It would be, as the title states, a "massive performance improvement", at least in our use case.