Bug #107507 client.getSession() hangs forever, pool crash when max con reached, bad perf
Submitted: 8 Jun 2022 5:24    Modified: 4 Aug 2022 14:48
Reporter: Jordan BRETON    Email Updates:
Status: No Feedback    Impact on me: None
Category: Connector for Node.js    Severity: S1 (Critical)
Version: 8.0.29    OS: Ubuntu (20.04.3)
Assigned to:    CPU Architecture: Any (8 cores)
Tags: Crash (non fatal), hanging, max connection, performance issue

[8 Jun 2022 5:24] Jordan BRETON
Description:
Using the @mysql/xdevapi connector in Node.js caused my app to crash randomly in production.

Server architecture: 

- MySQL is version 8.0.29-0ubuntu0.20.04.3
- Node.js is 18.0.0
- MySQL X DevAPI Connector for Node.js is v8.0.29, installed with yarn
- Ubuntu 20.04.3 - 8 cores - 16 GB RAM

I found out that the connector performs badly compared to the mysql2 package (3 to 4 times slower for sequential requests, and 100 to 450 times slower for parallel requests). 

I also found out that if the connection pool reaches the MySQL server's maximum number of connections, the pool crashes with an unrelated error message, and all subsequent client.getSession() calls hang forever. 

This report was first posted on StackOverflow: 

https://stackoverflow.com/questions/72462519/mysql-xdev-api-node-js-connector-client-getse...

How to repeat:
I conducted four types of tests : 

 1. Raw Testing* the `@mysql/xdevapi` module
 2. Raw testing* the `mysql2` module
 3. Testing the `@mysql/xdevapi` module with my app architecture
 4. Testing the `mysql2` module with my app architecture

[*] By raw testing, I mean testing the package without any private code involved; only the module itself.

# Raw Testing

## Sequential

We try to execute 500 requests, one at a time, and see what happens:

### `mysql2`

```js 

const mysql = require('mysql2/promise');

const MAX_CON = Number.parseInt(process.argv[2]) || 120;
const NB_REQ = Number.parseInt(process.argv[3]) || 500;

(async() => {

	try{
		const client = mysql.createPool(
			{
				host : '127.0.0.1',
				port : 3306,
				user : 'test',
				password : 'test',
				database : 'test',
				connectionLimit : MAX_CON
			}
		);

		const before = Date.now();
		for(let i = 0; i < NB_REQ; i++){
			const con = await client.getConnection();
			await con.query('SELECT 1 from user');
			await con.release();
		}

		const time = (Date.now() - before) / 1000; //in seconds
		const opBySecond = NB_REQ / time;
		console.log('The end : ', { opBySecond, time, NB_REQ, MAX_CON });

		await client.end();

		process.exit(0);
	}catch(e){
		console.log(e);
	}

})();

```

### `@mysql/xdevapi`

```js

const mysql = require('@mysql/xdevapi');

const MAX_CON = Number.parseInt(process.argv[2]) || 120;
const NB_REQ = Number.parseInt(process.argv[3]) || 500;

(async() => {

	try{
		const client = await mysql.getClient(
			{
				host : '127.0.0.1',
				port : 33060,
				user : 'test',
				password : 'test',
				schema : 'test'
			},
			{
				pooling : {
					maxSize : MAX_CON
				}
			}
		);

		const before = Date.now();
		for(let i = 0; i < NB_REQ; i++){
			const con = await client.getSession();
			const prep = await con.sql('SELECT 1 from user');
			await prep.execute();
			await con.close();
		}

		const time = (Date.now() - before) / 1000;
		const opBySecond = NB_REQ / time;
		console.log('The end : ', { opBySecond, time, NB_REQ, MAX_CON });

		await client.close();

		process.exit(0);
	}catch(e){
		console.log(e);
	}

})();

```

Both scripts run `500` requests with a pool of `60` connections, using the following commands: 

- `node sequential-mysql2.js 60 500`
- `node sequential-xdevapi.js 60 500`

And the results : 

| Driver | Op / sec | time (s) |
|--------|----------|----------|
|mysql2  |11 111.11 |     0.045|
|xdevapi |3 703.70  |     0.135|

`@mysql/xdevapi` is 3 to 4 times slower than `mysql2` with sequential execution. 

## Parallel

### `mysql2`

```js

const mysql = require('mysql2/promise');

const MAX_CON = Number.parseInt(process.argv[2]) || 60;
const NB_REQ = Number.parseInt(process.argv[3]) || 500;

(async() => {

	try{
		const client = mysql.createPool(
			{
				host : '127.0.0.1',
				port : 3306,
				user : 'test',
				password : 'test',
				database : 'test',
				connectionLimit : MAX_CON
			}
		);

		const before = Date.now();
		const promises = [];
		for(let i = 0; i < NB_REQ; i++){
			promises.push(new Promise(async resolve => {
				const con = await client.getConnection();
				await con.query('SELECT 1 from user');
				await con.release();
				resolve();
			}))
		}
		await Promise.allSettled(promises);

		const time = (Date.now() - before) / 1000; //in seconds
		const opBySecond = NB_REQ / time;
		console.log('The end : ', { opBySecond, time, NB_REQ, MAX_CON });

		await client.end();

		process.exit(0);
	}catch(e){
		console.log(e);
	}

})();

```

### `@mysql/xdevapi`

```js

const mysql = require('@mysql/xdevapi');

const MAX_CON = Number.parseInt(process.argv[2]) || 60;
const NB_REQ = Number.parseInt(process.argv[3]) || 500;

(async() => {

	try{
		const client = await mysql.getClient(
			{
				host : '127.0.0.1',
				port : 33060,
				user : 'test',
				password : 'test',
				schema : 'test'
			},
			{
				pooling : {
					maxSize : MAX_CON
				}
			}
		);

		const before = Date.now();
		const promises = [];
		for(let i = 0; i < NB_REQ; i++){
			promises.push(new Promise(async resolve => {
				const con = await client.getSession();
				const prep = await con.sql('SELECT 1 from user');
				await prep.execute();
				await con.close();
				resolve()
			}));
		}

		await Promise.allSettled(promises);

		const time = (Date.now() - before) / 1000;
		const opBySecond = NB_REQ / time;
		console.log('The end : ', { opBySecond, time, NB_REQ, MAX_CON });

		await client.close();

		process.exit(0);
	}catch(e){
		console.log(e);
	}

})();

```

Here, things are worse.

Both scripts run `500` requests with a pool of `60` connections, using the following commands: 

- `node parallel-mysql2.js 60 500`
- `node parallel-xdevapi.js 60 500`

And the results : 

| Driver | Op / sec | time (s) |
|--------|----------|----------|
|mysql2  |8 771.92  |     0.057|
|xdevapi |     76.66|     6.522|

`@mysql/xdevapi` is 110 times slower than `mysql2` with parallel execution. But it doesn't stop there.

If you try `node parallel-xdevapi.js 120 500` you may get an error : 

```
/test/node/node_modules/@mysql/xdevapi/lib/DevAPI/Connection.js:920
            return state.error || new Error(errors.MESSAGES.ER_DEVAPI_CONNECTION_CLOSED);
                                  ^

Error: This session was closed. Use "mysqlx.getSession()" or "mysqlx.getClient()" to create a new one.
    at Object.getError (/test/node/node_modules/@mysql/xdevapi/lib/DevAPI/Connection.js:920:35)
    at Socket.<anonymous> (/test/node/node_modules/@mysql/xdevapi/lib/DevAPI/Connection.js:677:40)
    at Object.onceWrapper (node:events:642:26)
    at Socket.emit (node:events:527:28)
    at TCP.<anonymous> (node:net:715:12)

Node.js v18.0.0

```

While `node parallel-mysql2.js 120 500` outputs: 

| Driver | Op / sec | time (s) |
|--------|----------|----------|
|mysql2  |  7 142.86|      0.07|

**For the error thrown, my guess is that `@mysql/xdevapi` doesn't handle the MySQL server's maximum number of clients connected at a time. On my machine it is the default value of 128; since I have several processes using the DB, running the script with a pool of `maxSize=120` crashes the process.**

That's why I didn't get any errors in dev, but random crashes in prod: with two pools of `maxSize=25` per process and 4 Node.js processes launched in parallel, it could reach a maximum of 200 MySQL client connections. Any surge of connections crashed one or several pools, and the process could not recover from it. All subsequent calls to `client.getSession()` would hang indefinitely from that point.
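
To check this against the server configuration, here is a minimal sketch of how to read the server-side X Protocol connection limit before sizing a pool (it assumes the `mysqlx_max_connections` variable and the `Mysqlx_sessions` status variable are available on this server version):

```js
// Sketch: read the X Protocol connection limit and the current session count.
// The variable/status names are assumptions for this server version.
const mysqlx = require('@mysql/xdevapi');

(async () => {
	const session = await mysqlx.getSession({
		host : '127.0.0.1',
		port : 33060,
		user : 'test',
		password : 'test'
	});

	const limit = await session.sql("SHOW VARIABLES LIKE 'mysqlx_max_connections'").execute();
	const inUse = await session.sql("SHOW STATUS LIKE 'Mysqlx_sessions'").execute();

	console.log('limit :', limit.fetchOne());  // e.g. [ 'mysqlx_max_connections', '100' ]
	console.log('in use:', inUse.fetchOne());  // e.g. [ 'Mysqlx_sessions', '12' ]

	await session.close();
})();
```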

# App Testing

I refactored the whole app's DB connection handling to be sure it is consistent, then I created a DB component that supports pluggable connectors. 

I created two plugins: one using the `@mysql/xdevapi` driver, and another one using `mysql2`. Both showed results similar to those above when I ran some benchmarks with `autocannon`. 

I also created a connection wrapper that helps me know if any connection remains unclosed. No connections were left open anywhere; they were always returned to the pool after each query.
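
A minimal sketch of what such a wrapper can look like (the code below is illustrative, not the actual application code):

```js
// Sketch: wrap the pool so every session handed out is tracked until it is closed.
const mysqlx = require('@mysql/xdevapi');

function trackSessions(client){
	const open = new Set();
	let nextId = 0;

	return {
		async getSession(){
			const session = await client.getSession();
			const id = nextId++;
			open.add(id);

			const close = session.close.bind(session);
			session.close = async () => {
				open.delete(id); // mark the session as returned to the pool
				return close();
			};

			return session;
		},
		report(){
			console.log(`sessions still open : ${open.size}`);
		}
	};
}
```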

# Conclusion

In conclusion, it seems that something is wrong with the `@mysql/xdevapi` package at this time, in version `8.0.29` with MySQL `8.0.29`, on Node.js `v18.0.0`, Ubuntu `20.04`.

Either because it does not handle the edge case where the MySQL server cannot accept more connections, or because of its slow performance, especially when queries are not sequential (which is the majority of use cases in a web-server-like Node.js application).

Suggested fix:

I solved my problem by switching from `@mysql/xdevapi` to `mysql2` for now.
[8 Jun 2022 7:01] MySQL Verification Team
Hello Jordan BRETON,

Thank you for the report and feedback.

regards,
Umesh
[8 Jun 2022 13:25] Rui Quelhas
Posted by developer:
 
There are two different things here: performance, and how to handle the situation where the maximum number of connections on the server is exceeded.

Let's start with the performance concerns.

First of all, the X DevAPI connector uses TLS by default, whereas mysql2 does not. If you disable TLS on the X DevAPI connector (or vice-versa), you should see that, actually, the X DevAPI connector performs better in the sequential case.
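
For instance, in the scripts above, TLS can be disabled through the connection configuration (a sketch, assuming the server still accepts non-TLS X Protocol clients):

```js
// Sketch only: the X DevAPI connector enables TLS by default; this turns it off
// so both drivers negotiate plain TCP connections for the comparison.
const mysql = require('@mysql/xdevapi');

const client = mysql.getClient(
	{
		host : '127.0.0.1',
		port : 33060,
		user : 'test',
		password : 'test',
		schema : 'test',
		tls : { enabled : false }
	},
	{
		pooling : {
			maxSize : 60
		}
	}
);
```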

With regards to the parallel case, the linchpin is the fact that you are constantly hitting the pool size limit, which causes new client connection requests to be queued. In the X DevAPI connector, we decided this warrants a rate limit, because you probably want to increase the pool size instead of having a massive queue of requests to serve (and, consequently, to avoid unintended denial of service). In this case, what happens is that, once there is space in the pool, the queued connection requests are only retried after a given time as determined by a sliding window. We can argue about whether the sliding window is too big, but this is by design. Again, you are probably better served by increasing the pool size.
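
For reference, a sketch of the pooling options involved (the values are illustrative, not recommendations; a larger `maxSize` reduces how often requests end up queued, and `queueTimeout` bounds how long a queued `client.getSession()` call may wait):

```js
// Sketch: pooling options of the X DevAPI connector (values are illustrative).
const mysql = require('@mysql/xdevapi');

const client = mysql.getClient(
	{ host : '127.0.0.1', port : 33060, user : 'test', password : 'test', schema : 'test' },
	{
		pooling : {
			enabled : true,
			maxSize : 120,        // upper bound on pooled connections
			maxIdleTime : 30000,  // ms an idle connection may stay in the pool (0 = forever)
			queueTimeout : 10000  // ms a queued getSession() waits before erroring (0 = forever)
		}
	}
);
```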

In mysql2, there is no such thing, from what I can tell.

Other interesting things are happening. The connection pool in the X DevAPI connector not only manages the client side connections, but also the server side connections, by re-using the existing ones and only re-setting the underlying session (keeping users authenticated). On the other hand, mysql2 only manages the pool from the client standpoint; until you exceed the pool size limit, it will always create new connections. This is a lot worse in terms of resource management and, of course, means that with mysql2 you will hit the server side connection limit a lot faster.

If you check the connection ids in the server, you will notice that mysql2 creates a lot more server side connections than the X DevAPI connector. With your default test runs, more than 55 with mysql2, and fewer than 20 with the X DevAPI connector. If you consider this, and discard the rate limit for queued connections, you actually start to see a different picture. In your tests, you will also get much better results for the X DevAPI connector, still not as good as mysql2, but that's because we need to keep re-balancing the pool in order to do this kind of management and not simply look at the overall pool size.
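
One way to observe this is to log the server-side connection id for each pooled session; with the X DevAPI pool the same ids tend to repeat, while mysql2 keeps producing new ones until its pool is full (a sketch, not part of the original benchmarks):

```js
// Sketch: print the server-side connection id used by each pooled session.
const mysqlx = require('@mysql/xdevapi');

(async () => {
	const client = mysqlx.getClient(
		{ host : '127.0.0.1', port : 33060, user : 'test', password : 'test', schema : 'test' },
		{ pooling : { maxSize : 10 } }
	);

	for(let i = 0; i < 50; i++){
		const session = await client.getSession();
		const result = await session.sql('SELECT CONNECTION_ID()').execute();
		console.log(result.fetchOne()[0]); // repeated ids mean the server connection was re-used
		await session.close();
	}

	await client.close();
})();
```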

In any case, these are not bugs or issues, but simply outcomes of particular design decisions. I'm not really sure what can be done here. Maybe we can expose the rate limit as a pool configuration option? But that's more like an enhancement request.

Moving on to the maximum number of server connections. The problem here, from what I can tell, is when the value of the "maxSize" option in the pool is higher than the one defined for the "mysqlx_max_connections" server variable. In this case, the client is expected to return an error. This seems to happen, as you mention, but I think there is still something fishy going on: the client cannot recover from the fatal error sent by the server (which is also expected), but it seems to leave some unattended resources (sockets). So, I think we can consider this a bug, and I can address it in the scope of this report. However, I don't think it is a Sev2. :)

Thank you for your detailed bug report.
[8 Jun 2022 14:04] Rui Quelhas
Posted by developer:
 
In my last comment I've claimed the following:

"First of all, the X DevAPI connector uses TLS by default, whereas mysql2 does not. If you disable TLS on the X DevAPI connector (or vice-versa), you should see that, actually, the X DevAPI connector performs better in the sequential case."

I believe I was doing something wrong on my end. This does not seem to be the case. It is a lot better, but it still does not perform better than mysql2.
[8 Jun 2022 15:47] Jordan BRETON
Thanks for your quick and detailed answer.

It definitely makes sense; I now have a better overall picture of what happens. Resource management was what made me opt for the X DevAPI connector a while ago. That's why I redesigned my app to be able to easily switch back from mysql2 as soon as this bug is resolved, especially for a TLS scenario. I really love the design and features of this connector.

"Maybe we can expose the rate limit as a pool configuration option? But that's more like an enhancement request." => I think this is a good solution to help users fine-tune their configuration. Depending on the application we built, it seems mandatory to me to be able to support heigher load peaks, especially if the MySQL server is powerfull enough to handle it. 

"The problem here, from what I can tell is when the value of the "maxSize" option in the pool is higher than the one defined for the "mysqlx_max_connections" server variable.". Not exactly. You can crash a pool with maxSize of 20 connections, if you run enough instance of the process. 

As soon as one of these processes tries to open a new connection on the MySQL server while the server has run out of available connections, the pool of that process will crash. The thing is that if you distribute the load in a round-robin fashion across your processes, you can get to a point where all pools crash in all processes, especially if the high load peak lasts for a few minutes, since all the connections the crashed pools can no longer handle will have to be handled by a smaller subset of the running processes. So even increasing the maxSize of each pool can't solve the problem for now.

"However, I don't think it is a Sev2. :)" Feels free downgrade this ticket severiy to the level you think suits better to the bug. 

I chose this one after reading all the definitions. At the time, I found that "Represents a complete loss of service, a significant functionality is missing, a system that hangs indefinitely; and there is no available workaround." suited well, especially the "system that hangs indefinitely" part, but reading it a second time it seems to refer to OS-level hanging... And since I had to change drivers for my use case, I thought it was consistent with the "no available workaround" part ^^

Again, thanks for your time, attention and answers.
[28 Jun 2022 17:22] Rui Quelhas
Posted by developer:
 
Just got a chance to get back at this.

"Not exactly. You can crash a pool with maxSize of 20 connections, if you run enough instance of the process."

My statement was not very well formulated. If you run enough instances of the process, you eventually exceed the maximum number of connections allowed by the server. So, in practice, the "maxSize" value defined for the pool of one of those processes will be higher than the number of remaining slots allowed by the server.

In this case, the server will disconnect any new connection. Which brings us to the next point.
 
"As soon as one of these process try to open a new connection on the MySQL server, and that that server runs out of available connections, the pool of that process will crash."

The pool does not crash; the attempt to create a connection fails, but the pool is still working fine. If an active connection is released, you can create connections again. In your example, the corresponding promises executed by "Promise.allSettled()" will be rejected.
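
A sketch of what that looks like in practice (the pool size below deliberately exceeds the remaining server slots; the rejected attempts show up in the `Promise.allSettled()` results while the pool itself stays usable):

```js
// Sketch: rejected connection attempts do not break the pool.
const mysqlx = require('@mysql/xdevapi');

(async () => {
	const client = mysqlx.getClient(
		{ host : '127.0.0.1', port : 33060, user : 'test', password : 'test', schema : 'test' },
		{ pooling : { maxSize : 200 } } // deliberately larger than what the server allows
	);

	const results = await Promise.allSettled(
		Array.from({ length : 200 }, async () => {
			const session = await client.getSession();
			await session.sql('SELECT 1').execute();
			await session.close();
		})
	);

	const rejected = results.filter(r => r.status === 'rejected');
	console.log(`${rejected.length} connection attempts were rejected`);

	// The pool still works: once sessions have been released, new requests are served again.
	const session = await client.getSession();
	await session.sql('SELECT 1').execute();
	await session.close();

	await client.close();
})();
```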

"The thing is that if you distribute the load in a round robin fashion accross your processes, you can get to a point where all pools crashes in all processes, especially if the high load peak last for few
minutes, since all the connections the crashed pool can't handle anymore will have to be handled by a few subset of the running processes. So even increasing the maxSize of each pool can't solve the problem for now."

Again, the pool does not crash. Increasing the "maxSize" in that case is pointless unless you also increase the value of "mysqlx_max_connections" accordingly.

In the end, apart from the error message, which does not provide too much context (basically ECONNRESET), I don't see exactly what can be improved here.

It is mostly a problem of mismatched expectations. You were expecting the connection requests to somehow be queued in that scenario, at least from what I can tell. That is also not a good option because, in the end, you could end up retrying connections indefinitely, and all of those retries would always have to reach the server (not good). You are probably better off monitoring rejected connections and adjusting both the client and server configuration accordingly.
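
A minimal sketch of the kind of monitoring meant here (the helper name is illustrative):

```js
// Sketch: count rejected connection attempts so the pool size and
// mysqlx_max_connections can be tuned from actual numbers.
let rejectedConnections = 0;

async function getSessionWithMetrics(client){
	try{
		return await client.getSession();
	}catch(e){
		rejectedConnections++; // export this counter to whatever metrics system is in use
		throw e;
	}
}

setInterval(() => console.log({ rejectedConnections }), 60000);
```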

With regards to performance, comparing an X Protocol implementation like Connector/Node.js with a classic protocol implementation like node-mysql2 is comparing apples to oranges, particularly for this specific type of benchmark, which focuses mostly on connection time. I mentioned TLS before, but there are other things, such as connection attributes and enhanced authentication negotiation, which contribute to worse performance results but provide other kinds of benefits.

Some kind of refresh rate for the connection pool, which could be configured by the application to determine the waiting time between retries of queued connection requests, is not a bad idea for an enhancement. I would suggest you open an enhancement request for that, as a first step in evaluating such a feature.
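
As a sketch of the idea only (the option name below is purely hypothetical and does not exist in the connector today):

```js
// Hypothetical: a configurable retry interval for queued connection requests.
const mysql = require('@mysql/xdevapi');

const client = mysql.getClient(
	{ host : '127.0.0.1', port : 33060, user : 'test', password : 'test', schema : 'test' },
	{
		pooling : {
			maxSize : 120,
			queueRetryInterval : 100 // ms between retries of queued requests (does not exist yet)
		}
	}
);
```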

In the meantime, unless you have something else you wish to add to the discussion, I will close this specific report.

Thank you very much for your feedback and thank you for using MySQL.
[5 Aug 2022 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".