| Bug #107507 | client.getSession() hangs forever, pool crash when max con reached, bad perf | | |
|---|---|---|---|
| Submitted: | 8 Jun 2022 5:24 | Modified: | 4 Aug 2022 14:48 |
| Reporter: | Jordan BRETON | Email Updates: | |
| Status: | No Feedback | Impact on me: | |
| Category: | Connector for Node.js | Severity: | S1 (Critical) |
| Version: | 8.0.29 | OS: | Ubuntu (20.04.3) |
| Assigned to: | | CPU Architecture: | Any (8 cores) |
| Tags: | Crash (non fatal), hanging, max connection, performance issue | | |
[8 Jun 2022 5:24]
Jordan BRETON
[8 Jun 2022 7:01]
MySQL Verification Team
Hello Jordan BRETON,

Thank you for the report and feedback.

Regards,
Umesh
[8 Jun 2022 13:25]
Rui Quelhas
Posted by developer:

Two different things are going on here: performance, and how to handle the situation where the maximum number of connections in the server is exceeded.

Let's start with the performance concerns. First of all, the X DevAPI connector uses TLS by default, whereas mysql2 does not. If you disable TLS on the X DevAPI connector (or enable it on mysql2), you should see that the X DevAPI connector actually performs better in the sequential case.

With regards to the parallel case, the linchpin is the fact that you are constantly hitting the pool size limit, which causes new client connection requests to be queued. In the X DevAPI connector, we decided this warrants a rate limit, because you probably want to increase the pool size instead of having a massive queue of requests to serve (and, consequently, to avoid an unintended denial of service). In this case, once there is space in the pool, the queued connection requests are only retried after a given time as determined by a sliding window. One can argue about whether the sliding window is too big, but this is by design. Again, you are probably better served by increasing the pool size. In mysql2 there is no such thing, from what I can tell.

Other interesting things are happening. The connection pool in the X DevAPI connector manages not only the client-side connections but also the server-side connections, re-using existing ones by resetting only the underlying session (keeping users authenticated). On the other hand, mysql2 only manages the pool from the client standpoint: until you exceed the pool size limit, it will always create new connections. This is a lot worse in terms of resource management and, of course, means that with mysql2 you will hit the server-side connection limit a lot faster. If you check the connection ids on the server, you will notice that mysql2 creates a lot more server-side connections than the X DevAPI connector: with your default test runs, more than 55 with mysql2 and fewer than 20 with the X DevAPI connector.

If you consider this, and discard the rate limit for queued connections, you actually start to see a different picture. In your tests, you will also get much better results for the X DevAPI connector. Still not as good as mysql2, but that's because we need to keep re-balancing the pool in order to do this kind of management, not simply look at the overall pool size. In any case, these are not bugs or issues, but simply outcomes of particular design decisions. I'm not really sure what can be done here. Maybe we can expose the rate limit as a pool configuration option? But that's more like an enhancement request.

Moving on to the maximum number of server connections. The problem here, from what I can tell, occurs when the value of the "maxSize" option in the pool is higher than the one defined for the "mysqlx_max_connections" server variable. In this case, the client is expected to return an error. This seems to happen, as you mention, but I think there is still something fishy going on: the client cannot recover from the fatal error sent by the server (which is also expected), but it seems to leave some unattended resources (sockets). So, I think we can consider this a bug, and I can address it in the scope of this report. However, I don't think it is a Sev2. :)

Thank you for your detailed bug report.
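To make the scenario under discussion concrete, here is a minimal sketch of a pool whose "maxSize" may exceed the server-side slots remaining under "mysqlx_max_connections". The host, credentials, and sizes are placeholder assumptions, not values taken from the report:

```js
const mysqlx = require('@mysql/xdevapi');

// Pool whose maxSize may exceed the slots left on the server;
// connection details below are placeholders.
const client = mysqlx.getClient(
  { host: 'localhost', port: 33060, user: 'root', password: '' },
  { pooling: { enabled: true, maxSize: 25 } }
);

// Fire more concurrent session requests than the server may be able to accept.
const attempts = Array.from({ length: 25 }, () => client.getSession());

Promise.allSettled(attempts)
  .then((results) => {
    const fulfilled = results.filter((r) => r.status === 'fulfilled');
    console.log(`${fulfilled.length} acquired, ${results.length - fulfilled.length} rejected`);
    // Closing a pooled session releases it back to the pool.
    return Promise.all(fulfilled.map((r) => r.value.close()));
  })
  .then(() => client.close()); // destroys the pool and its connections
```

Per the discussion above, the surplus requests are expected to reject; the behavior under investigation is whether the client recovers cleanly from the server's fatal error afterwards.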
[8 Jun 2022 14:04]
Rui Quelhas
Posted by developer:

In my last comment I claimed the following: "First of all, the X DevAPI connector uses TLS by default, whereas mysql2 does not. If you disable TLS on the X DevAPI connector (or vice-versa), you should see that, actually, the X DevAPI connector performs better in the sequential case."

I believe I was doing something wrong on my end. This does not seem to be the case: disabling TLS makes it a lot better, but it still does not perform better than mysql2.
[8 Jun 2022 15:47]
Jordan BRETON
Thanks for your quick and detailed answer. It definitely makes sense; I now have a better overall picture of what happens. Resource management was what made me opt for the X DevAPI connector a while ago. That's why I redesigned my app so I can easily switch back from mysql2 as soon as this bug is resolved, especially for a TLS scenario. I really love the design and features of this connector.

"Maybe we can expose the rate limit as a pool configuration option? But that's more like an enhancement request." => I think this is a good solution to help users fine-tune their configuration. Depending on the application being built, it seems mandatory to me to be able to support higher load peaks, especially if the MySQL server is powerful enough to handle them.

"The problem here, from what I can tell, occurs when the value of the "maxSize" option in the pool is higher than the one defined for the "mysqlx_max_connections" server variable." => Not exactly. You can crash a pool with a maxSize of 20 connections if you run enough instances of the process. As soon as one of these processes tries to open a new connection on the MySQL server, and that server runs out of available connections, the pool of that process will crash. The thing is that if you distribute the load in a round-robin fashion across your processes, you can get to a point where the pools crash in all processes, especially if the high load peak lasts a few minutes, since all the connections a crashed pool can no longer handle will have to be handled by a shrinking subset of the running processes. So even increasing the maxSize of each pool can't solve the problem for now.

"However, I don't think it is a Sev2. :)" => Feel free to downgrade this ticket's severity to the level you think best suits the bug. I chose this one after reading all the definitions. At the time, "Represents a complete loss of service, a significant functionality is missing, a system that hangs indefinitely; and there is no available workaround." seemed to fit well, especially the "system that hangs indefinitely" part, but reading it a second time it seems to refer to an OS-level hang... And since I had to change the driver for my use case, I thought it was consistent with the "no available workaround" part ^^

Again, thanks for your time, attention and answers.
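As a back-of-the-envelope illustration of the multi-process point above, a few modest per-process pools can exceed the server-wide limit in aggregate. The numbers are illustrative assumptions; "mysqlx_max_connections" defaults to 100:

```js
// Illustrative numbers only: several modest per-process pools can still
// exceed the server-wide connection limit in aggregate.
const processes = 8;      // app instances behind a round-robin balancer
const poolMaxSize = 20;   // "maxSize" of each process-local pool
const serverLimit = 100;  // mysqlx_max_connections (default is 100)

const aggregate = processes * poolMaxSize; // 160 potential connections
console.log(`aggregate demand: ${aggregate}, server limit: ${serverLimit}`);
// Once aggregate demand exceeds the limit, some processes' connection
// attempts are refused no matter how each individual pool is sized.
```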
[28 Jun 2022 17:22]
Rui Quelhas
Posted by developer:

Just got a chance to get back to this.

"Not exactly. You can crash a pool with a maxSize of 20 connections if you run enough instances of the process." => My statement was not very well formulated. If you run enough instances of the process, you eventually exceed the maximum number of connections allowed by the server. So, in practice, the "maxSize" value defined for the pool of one of those processes will be higher than the remaining slots allowed by the server. In this case, the server will disconnect any new connection. Which brings us to the next point.

"As soon as one of these processes tries to open a new connection on the MySQL server, and that server runs out of available connections, the pool of that process will crash." => The pool does not crash; the attempt to create a connection fails, but the pool is still working fine. If an active connection is released, you can create connections again. In your example, the corresponding promises executed by "Promise.allSettled()" will be rejected.

"The thing is that if you distribute the load in a round-robin fashion across your processes, you can get to a point where the pools crash in all processes, especially if the high load peak lasts a few minutes, since all the connections a crashed pool can no longer handle will have to be handled by a shrinking subset of the running processes. So even increasing the maxSize of each pool can't solve the problem for now." => Again, the pool does not crash. Increasing the "maxSize" in that case is pointless unless you also increase the value of "mysqlx_max_connections" accordingly.

In the end, apart from the error message, which does not provide much context (ECONNRESET, basically), I don't see exactly what can be improved here. It is mostly a problem of mismatched expectations. You were expecting the connection requests to somehow be queued in that scenario, at least from what I can tell. That is also not a good option because, in the end, you could end up retrying connections indefinitely, and all of those retries would always have to reach the server (not good). You are probably better off monitoring rejected connections and adjusting both the client and server configuration accordingly.

With regards to performance, comparing an X Protocol implementation like Connector/Node.js with a classic protocol implementation like node-mysql2 is comparing apples to oranges, particularly for this specific type of benchmark focused mostly on connection time. I mentioned TLS before, but there are other things, such as connection attributes and enhanced authentication negotiation, which contribute to worse performance results but provide other kinds of benefits.

Some kind of refresh rate for the connection pool, configurable by the application to determine the waiting time between retries of queued connection requests, is not a bad idea for an enhancement. I would suggest you open an enhancement request for that, as a first step in evaluating such a feature.

In the meantime, unless you have something else you wish to add to the discussion, I will close this specific report. Thank you very much for your feedback, and thank you for using MySQL.
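One way to act on the "monitor rejected connections" suggestion above is an application-level retry wrapper. "getSessionWithRetry" below is a hypothetical helper sketched for illustration, not part of the connector's API, and the connection details are placeholders:

```js
const mysqlx = require('@mysql/xdevapi');

// Hypothetical helper: retry a rejected getSession() a bounded number of
// times with exponential backoff, instead of expecting the pool to queue
// requests once the server is full. Not part of the connector's API.
async function getSessionWithRetry(client, retries = 3, delayMs = 250) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await client.getSession();
    } catch (err) {
      if (attempt >= retries) {
        throw err; // often little context beyond ECONNRESET, as noted above
      }
      // Backoff keeps the bounded retries from hammering a saturated server.
      await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** attempt));
    }
  }
}

// Usage sketch with placeholder connection details.
const client = mysqlx.getClient(
  { host: 'localhost', port: 33060, user: 'root', password: '' },
  { pooling: { enabled: true, maxSize: 20 } }
);

getSessionWithRetry(client)
  .then((session) => session.close())
  .catch((err) => console.error('could not acquire a session:', err.message))
  .then(() => client.close());
```

Bounding the retries is the point: it preserves backpressure toward the caller rather than reproducing the indefinite client-side queue argued against above.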
[5 Aug 2022 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".