| Bug #106368 | Connection pool cleanup is effectively disabled, but so easy to fix TG! | | |
|---|---|---|---|
| Submitted: | 3 Feb 2022 12:37 | Modified: | 9 Jun 2022 21:24 |
| Reporter: | Karl Alexander | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | Connector / NET | Severity: | S3 (Non-critical) |
| Version: | Latest | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
[3 Feb 2022 12:37]
Karl Alexander
[3 Feb 2022 12:46]
Karl Alexander
I have attached 4 images that show the diff; hope we can get this fix in. Would be great.
[3 Feb 2022 13:47]
Bradley Grainger
See #66476 for a different version of this suggestion that proposed using Stack<T>.
[3 Feb 2022 17:15]
Karl Alexander
Hi Bradley, thanks for your comment. If you look closely, a stack is not a good choice for the data structure; the stack *behavior* is what you want. If you use a stack, your idle-connection cleanup routine will need to allocate more memory in order to reverse it effectively for aging out. It's better to keep memory allocations to a minimum for this routine, especially since it holds locks. LinkedList<T> is the best choice, as it allows performant operations at both the head and the tail. Do you think anyone will fix it? I've attached 4 images showing the exact diffs, and I've updated the pool class and attached it as a private remark.
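To make the head/tail point concrete, here is a minimal sketch of the shape I mean (a simplified illustration with hypothetical names, not the attached MySqlPool.cs): check-in and LIFO check-out are O(1) at the tail, while aging out walks from the head, where the longest-idle entries always sit, with no copying or reversing.

using System;
using System.Collections.Generic;

// Simplified sketch, not the connector's actual pool class.
class IdlePoolSketch<T>
{
    private readonly LinkedList<(T Item, DateTime IdleSince)> _idle = new();

    // Check-in: the newest connection goes to the tail.
    public void CheckIn(T item) => _idle.AddLast((item, DateTime.Now));

    // LIFO check-out: reuse the most recently returned connection.
    public bool TryCheckOut(out T item)
    {
        if (_idle.Last is { } node)
        {
            item = node.Value.Item;
            _idle.RemoveLast();
            return true;
        }
        item = default!;
        return false;
    }

    // Aging out: the oldest entries sit at the head, so stop at the
    // first one that is still fresh. No reversal, no extra copies.
    public List<T> RemoveExpired(TimeSpan maxIdle, int minSize)
    {
        var expired = new List<T>();
        while (_idle.Count > minSize &&
               DateTime.Now - _idle.First!.Value.IdleSince > maxIdle)
        {
            expired.Add(_idle.First.Value.Item);
            _idle.RemoveFirst();
        }
        return expired;
    }
}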
[3 Feb 2022 17:19]
Karl Alexander
If you look at the cleanup routine in #66476, it clears down the entire stack and then populates an entire list of old connections, only to reverse that in a loop. That's far too many unnecessary memory allocations, and it's coded that way to compensate for the fact that the data structure (a stack) isn't a good fit for LIFO plus aging out. I definitely would not recommend that as the solution. Doubly so when you consider that when the cleanup routine starts, nine times out of ten it isn't aging out any connections, meaning the stack.Clear() + List.AddRange they do in #66476 is not a good move at all. My linked-list structure makes substantially fewer memory allocations, which is important for code like this. If you have time, please check carefully the allocations in the Stack<T> version versus the LinkedList<T> version I've attached, and let me know your thoughts. Thank you!
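If you want a rough way to see the difference yourself, here is a quick sketch (made-up pool size, not a proper benchmark; GC.GetAllocatedBytesForCurrentThread requires .NET Core 3.0 or later). In the common no-expiry case the linked-list walk allocates only the empty result list, while the stack approach copies the whole pool into an array and a list and then pushes everything back.

using System;
using System.Collections.Generic;

class AllocationSketch
{
    static void Main()
    {
        var linked = new LinkedList<int>();
        var stack = new Stack<int>();
        for (int n = 0; n < 10_000; n++) { linked.AddLast(n); stack.Push(n); }

        // LinkedList path, nothing expired: peek at the head and stop.
        long before = GC.GetAllocatedBytesForCurrentThread();
        var expired = new List<int>();          // the only allocation on this path
        bool headChecked = linked.First != null; // head is fresh, so the walk ends here
        Console.WriteLine($"LinkedList path: {GC.GetAllocatedBytesForCurrentThread() - before} bytes (expired={expired.Count}, headChecked={headChecked})");

        // Stack path as in #66476: copy everything out, clear, push it all back.
        before = GC.GetAllocatedBytesForCurrentThread();
        var oldDrivers = new List<int>(stack.ToArray());
        stack.Clear();
        for (int n = oldDrivers.Count - 1; n >= 0; n--) stack.Push(oldDrivers[n]);
        Console.WriteLine($"Stack path: {GC.GetAllocatedBytesForCurrentThread() - before} bytes");
    }
}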
[3 Feb 2022 17:27]
Karl Alexander
Just for quick reference, here is where LinkedList<T> matters versus Stack<T> in the cleanup routine.

// Good: minimal memory allocations, and coded for the main flow (no aging out).
internal List<Driver> RemoveOldIdleConnections()
{
    var connectionsToClose = new List<Driver>();
    var now = DateTime.Now;

    lock ((_idlePool as ICollection).SyncRoot)
    {
        while (_idlePool.Count > _minSize)
        {
            var idleConnection = _idlePool.First.Value;
            var expirationTime = idleConnection.IdleSince.Add(
                new TimeSpan(0, 0, MySqlPoolManager.maxConnectionIdleTime));

            if (expirationTime.CompareTo(now) < 0)
            {
                connectionsToClose.Add(idleConnection);
                _idlePool.RemoveFirst();
            }
            else
            {
                break;
            }
        }
    }

    return connectionsToClose;
}

// Stack version from #66476: a good effort, but far too many memory allocations.
internal List<Driver> RemoveOldIdleConnections()
{
    DateTime now = DateTime.Now;

    lock ((idlePool as ICollection).SyncRoot)
    {
        // Rebuild the Stack with only newly used drivers,
        // and return old idle drivers.
        List<Driver> oldDrivers = new List<Driver>(idlePool.ToArray());
        idlePool.Clear();

        // Newest driver at index 0, iterate in reverse order.
        for (int i = oldDrivers.Count - 1; i >= 0; i--)
        {
            Driver d = oldDrivers[i];
            DateTime expirationTime = d.IdleSince.Add(
                TimeSpan.FromSeconds(MySqlPoolManager.maxConnectionIdleTime));

            if ((i < minSize) || (expirationTime > now))
            {
                idlePool.Push(d);
                oldDrivers.Remove(d);
            }
        }

        return oldDrivers;
    }
}

I hope someone can put this LinkedList fix into the main code; it's quick and clean, and then we finally have real connection cleanup happening!
[3 Feb 2022 17:31]
Karl Alexander
The fixed MySqlPool using LinkedList<T>, based on the original code from version 8.0.18!
Attachment: MySqlPool.cs (text/plain), 11.23 KiB.
[3 Feb 2022 17:36]
Karl Alexander
But I must admit, thanks to Poul Bak (#66476) for making me realize I didn't need to put up with the bug, and for inspiring me to just go clone and fix the connector code. It's so nice to see our connections finally clearing up promptly after the typical surges. Poul, if you're there, check out the linked-list version of the cleanup routine; it's a big win.
[4 Feb 2022 5:33]
Karl Alexander
The alternative library MySqlConnector has major caveats around synchronous consumption, which is what our existing services use. And sure enough, it locked up our load tests when we tried migrating to it. The following extract from their site explains it, but the workaround is not acceptable, so for this platform we need to stick with MySql.Data.

https://mysqlconnector.net/tutorials/best-practices/

"Using Synchronous Methods can have adverse effects on the managed thread pool and cause slowdowns or lock-ups if not properly tuned. The recommended approach is to use all Asynchronous Methods."

"If you must use synchronous methods, ensure that your thread pool is at least the size of the number of concurrent connections you plan to support. For example, if you are creating a web server using synchronous methods that needs to support serving 500 Requests Per Second, set the minimum thread pool size to 500."

I hope someone can make the quick change to MySql.Data so we can continue the upgrade path without needing to fork and fix. Thank you.
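For reference, the tuning that page describes boils down to raising the managed thread pool minimums at startup, along these lines (a sketch; 500 is their example figure, not our configuration, and pinning that many threads is exactly the workaround we find unacceptable):

using System;
using System.Threading;

class ThreadPoolTuning
{
    static void Main()
    {
        ThreadPool.GetMinThreads(out int worker, out int completionPort);
        Console.WriteLine($"Before: worker={worker}, completionPort={completionPort}");

        // Raise the minimum worker threads so blocking (synchronous) MySQL
        // calls don't starve the pool; 500 matches the quoted example.
        if (!ThreadPool.SetMinThreads(500, completionPort))
            Console.WriteLine("SetMinThreads rejected the requested values.");

        ThreadPool.GetMinThreads(out worker, out completionPort);
        Console.WriteLine($"After: worker={worker}, completionPort={completionPort}");
    }
}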
[9 Feb 2022 7:24]
Karl Alexander
Production DB, connections open before and after fix
Attachment: ConnectionsProduction.PNG (image/png, text), 34.88 KiB.
[4 Mar 2022 14:40]
Mattias Jansson
Just wanted to chime in and say I too would like to see this resolved! Great work to both Poul Bak (#66476) and Karl Alexander for reporting this; it's a shame it hasn't already been resolved. I "found", fixed and tested this bug myself, and was just about to report it when I realized I should look through existing open bugs, and found two :-P

I agree a LinkedList<T> is better than Stack<T>, but either solution would be "better" than the current Queue in my opinion, as they both actually clean idle connections. The current cleanup routine _may_ do some cleanup, but I agree it's mostly useless.

I came to the following simple conclusion regarding connection "retention" in "idle" mode with the current, not-so-good Queue implementation. After a burst/spike/load, when returning to "idle", the number of retained connections will be _at most_:

    i * (180 / r)

where
    i = number of connections used per "request" or "cycle"
    r = interval between "requests" in seconds (or loop iteration cycle time, etc.)
    180 = the idle timeout in seconds (maxConnectionIdleTime)

Example: if a request uses 2 connections and a request is received every 10 s, the number of retained connections will be at most 2 * (180 / 10) = 36. If a burst resulted in more than 36 connections, the pool would be cleaned down to 36 when idling, but if there are fewer in the pool they will all be retained. And in this example, 2 connections would be sufficient when idling; or, if the 2 connections are used sequentially, they could be served by one single underlying connection in the pool. An additional 34-35 connections could be cleared...

To build upon Karl's example, the 1-second loop with 1 connection would retain at most 1 * (180 / 1) = 180. If the loop generating "burst" connections created more than 180 (instead of 20), they would eventually be cleaned down to 180.

Conclusion: it may do _some_ cleanup, but it's definitely mostly useless... Please fix it!
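To sanity-check the formula, here is a toy simulation (a sketch with made-up numbers, not the connector's actual code; it assumes the cleanup sweep runs frequently and models each "request" as dequeuing from the head and returning to the tail, which is exactly the Queue rotation that keeps refreshing IdleSince):

using System;
using System.Collections.Generic;

class QueueRetentionDemo
{
    const double MaxIdleSeconds = 180;   // the maxConnectionIdleTime threshold

    static void Main()
    {
        // A burst left 300 idle connections, all returned at t = 0.
        var pool = new Queue<double>();  // value = time the connection was last returned
        for (int c = 0; c < 300; c++) pool.Enqueue(0.0);

        double r = 10.0;                 // one request every 10 s
        int i = 2;                       // connections used per request

        for (double t = 0; t < 3600; t += r)
        {
            // Cleanup: a queue only exposes its head, which is also the
            // next connection to be handed out.
            while (pool.Count > 0 && t - pool.Peek() > MaxIdleSeconds)
                pool.Dequeue();          // aged out

            // Each request takes from the head and returns to the tail,
            // refreshing IdleSince and keeping the rotation alive.
            for (int k = 0; k < i && pool.Count > 0; k++)
            {
                pool.Dequeue();
                pool.Enqueue(t);
            }
        }

        // Prints 36, matching i * (180 / r) = 2 * 18.
        Console.WriteLine($"Retained connections after an hour: {pool.Count}");
    }
}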
[8 Mar 2022 14:31]
MySQL Verification Team
Hello,

Thank you for the bug report and feedback. We discussed this issue internally with the Connector/NET developer and confirmed that there is a plan to fix it in the next release. Thank you!

Regards,
Ashwini Patil
[9 Jun 2022 21:24]
Christine Cole
Posted by developer:

Fixed as of the upcoming MySQL Connector/NET 8.0.30 release, and here's the proposed changelog entry from the documentation team:

The mechanism used to clean up idle connections in the connection pool performed poorly. This fix changes the idle list from type Queue<T> to type LinkedList<T> to reduce the overall number of idle connections remaining in the pool after a connection surge is over and the load requirement is reduced.

Thank you for the bug report.