MySQL Bugs: #36194: Eliminate pinging with connection pooling to get massive performance improvement

Bug #36194	Eliminate pinging with connection pooling to get massive performance improvement
Submitted:	18 Apr 2008 1:15	Modified:	21 Jan 2014 23:08
Reporter:	Maxim Mass	Email Updates:
Status:	Closed	Impact on me:	None
Category:	Connector / NET	Severity:	S5 (Performance)
Version:		OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
When connection pooling is enabled, the connector performs a full round-trip ping to the server for every request. This is done to ensure that the server is still running and that the checked out driver is still able to communicate with the server. Unfortunately this ping often times doubles the total query execution time  for simple or query-cached queries.

How to repeat:
A simple app that hits a database in a loop with connection pooling enabled.
Ensure that mysql.data.dll is compiled with symbols and attach a profiler. You'll see that pinging takes significant % of the query time.

Suggested fix:
Remove pinging and instead handle fatal socket exceptions in pooled connections by removing the connection from the pool and retrying on a new connection. If the socket exception still happens then bubble it out to the client.

I will attach a patch that we've been using successfully with MindTouch Deki Wiki for the past 2 months shortly.

Patch for 5.1.4 to safely remove pinging

Attachment: mysql.data.dll 5.14 no pinging with connection pool.txt (text/plain), 6.11 KiB.

Please note that the attached patch may depend on bug #34001. There may be a cleaner way to accomplish this but this approach has worked for us and our users without issue. You can grab our connector with this and other patches from https://svn.mindtouch.com/source/public/dekiwiki/trunk/src/redist/MySql.Data.dll to compare performance and stability.

Thank you,
Max

Hi Maxim and thanks for (both) report(s).

I can verify them with no problem but:
> Remove pinging and instead handle fatal socket exceptions in pooled connections by removing the connection from the pool and retrying on a new connection. If the socket exception still happens then bubble it out to the client.

AFAIK, socket exception will take some time to raise and we might hit some other problems doing things this way. What if load-balancing comes in picture? What if you have mess in host table and so on... This is just from the top of my head...

Let's see what Reggie has to say about this problem.

Tonci, thanks for the response and I do look forward to hearing Reggie's thoughts on this as well. I do understand your concerns as this is a relatively fundamental change but I do believe the performance gains warrant seeing this through to a working solution. 

I am fairly confident that removing this ping and replacing it with better error handling will not impact how people use connection pooling as the behavior and intent is still the same. Although I haven't tried the scenario, I don't believe that setups with a load balancer between the connector and the server would be affected. The connector is behaving the same way except that instead of trying its pinging to see if a connection from the pool is valid, it assumes it's valid. AFAIK, the ping process has the same potential long timeout and other misc socket issues as instead just trying to send the query in the first place.

Microsoft's SQL server connector seems to behave this way as well but they took the additional step of doing background health checking of their pool.

From: http://msdn2.microsoft.com/en-us/library/8xx3tyca.aspx
"If a connection exists to a server that has disappeared, this connection can be drawn from the pool even if the connection pooler has not detected the severed connection and marked it as invalid. This is the case because the overhead of checking that the connection is still valid would eliminate the benefits of having a pooler by causing another round trip to the server to occur. When this occurs, the first attempt to use the connection will detect that the connection has been severed, and an exception is thrown."

Again, looking forward to hearing your thoughts on this.

Maxim

Thanks for the patch!  I had already planned to do something very similar for the 5.2.2 beta.  Having your patch just makes it easier!

It's not quite as simple as this patch.  There are some scenarios that must be considered such as the following:

SetDatabase does not use the ExecuteXXX methods and is the first query called on a newly returned pooled connection so if the pooler returns a busted connection this will bomb.

Second, it's important that a connection that is inside a current transaction not silently continue with a re-established connection because the user will think that nothing happened when in fact their transaction is now gone.

The better solution is to identify the few points inside NativeDriver.cs where a broken connection can be safely re-established and do that.  THis will work for pooled and non-pooled connections (which is a plus).  This change is going into 5.2.2 beta

Reggie, "To be fixed later" is ok?

Most public avaiable webhosts has limited or even disabled 'Ping from Wan'. This includes MySql servers.
Mine has *SIGH*. Disabling pooling helped, but an alternative to ping would be better.

Ok, I have solved my problems, they were actually not related to ping!

Comments from a Connector/NET user:
 - option to allow disable of ping() on pool fetch: sounds good (we do our own ping anyway)
 - crashing later like Microsoft does: sounds good (we need to handle crashes anyway)
 - automatically issuing queries on another connection if the current one fails: GOD NO! PLEASE DON'T BREAK STUFF LIKE THAT.

I'm not going to go into all the issues with that last point.  I'll just mention that enterprise applications that are dependent on a specific response time (including connection and command timeouts) are going to break horribly.

Okay, just one more, anything that depends on temporary tables, user variables, or whatever state will break too.

Anyway, it's something you can do in your own connector class that inherits from the main Connector/NET, for your own purposes.  You can publish your wrapper class too if you'd like.  But it's very much a feature you do not want in a stable consumer product.

.2€

More issues with automatic re-issue on different connection:
 - transactions break when part of a transaction is re-executed on a new, auto-commit connection.
 - timeouts are currently implemented client side, meaning UPDATEs, INSERTs etc would often be executed twice or more.

Anyway, enough about that.  The problems mentioned by Maxim are real enough.  Time to be constructive for a brief moment :-).

I've tried to incorporate solutions for all of these problems in a new piece of connection pooling logic.  Shameless plug: can be found attached to issue #40684.

Hey David, I'm glad there's renewed interest in this issue. I haven't put any time or thought into it since filing this bug 6 months ago. We've been running and distributing a custom build of the connector with our software (open source MindTouch Deki) with this and other patches and haven't had any problems reported with this approach of removing pinging and handling errors by retrying on a new connection. I can see how this may not be a full solution to the problem but it addressed our needs. We don't use transactions or assume that multiple queries must be executed using the same connection driver. 
This patch likely didn't address all the issues relating to retrying a query on a new connection but it did further abstract the client code from the connection pooling implementation. It looks like you did a great job of capturing all the connection pooling issues in bug #40684 and I'd be very happy to see it integrated in. Once I get some time I'll take a closer look at it and provide some feedback.

Any progress with this? It would be, as the title states, a "massive performance improvement", at least in our use case.