Bug #114453 MySQL Connector/J never participates in the TCP connection terminations
Submitted: 22 Mar 13:53 Modified: 8 May 1:38
Reporter: Jean-christophe Manciot Email Updates:
Status: Need Feedback Impact on me:
Category:Connector / J Severity:S2 (Serious)
Version:8.3.0 OS:Ubuntu (23.04 lunar)
Assigned to: MySQL Verification Team CPU Architecture:x86
Tags: MYSQl 8.0.35

[22 Mar 13:53] Jean-christophe Manciot
  - Keycloak server 24.0.1 using MySQL Connector/J 8.3.0 (Revision: 805f872a57875f311cb82487efcfb070411a3fa0)
  - MySQL server 8.0.35-0ubuntu0.23.04.1
  - linux 6.3.0-7-generic on both hosts
  - All connections with MySQL server are encrypted with TLS 1.3:
  - Both wait_timeout and mysqlx_wait_timeout are set to 8 hours on the MySQL server

With a standalone keycloak server with MySQL Connector/J using a MySQL server located on the same local Linux bridge without any network traffic interference, no TCP connection termination initiated by MySQL server is terminated by MySQL Connector/J when the latter has no more data to send despite acknowledging the TCP 'FIN'.

After a period of inactivity, when MySQL Connector/J needs to send new data, it does so without opening a new TCP connection. This triggers TCP resets from MySQL server and failed Keycloak GUI requests with the error message 'Network response was not OK'.

How to repeat:
After purging keycloak server and the keycloak database from the MySQL server:

1. Creating keycloak database and user on the external mysql server
2. Downloading keycloak server 24.0.1
3. Setting up:
- [conf/keycloak.conf](https://github.com/keycloak/keycloak/files/14668930/keycloak.conf.txt)
- [conf/quarkus.properties](https://github.com/keycloak/keycloak/files/14668943/quarkus.properties.txt)
4. Building with:
bin/kc.sh build
5. Starting with:

bin/kc.sh start --proxy-headers=xforwarded &
6. Log into a kerycloak server (as admin)
7. Click on 'Realm Roles' for instance on the left
8. Wait for 5 seconds
4. Click on 'Clients' for instance on the left
5. Notice the following errors:
[keycloak PeriodicRecovery consequences in GUI 1](https://github.com/keycloak/keycloak/assets/13176858/84a0daaf-462d-4907-aac5-4133c2bea03b)

In the MySQL server log, the keycloak error matches this error:
{ "prio" : 3, "err_code" : 10914, "subsystem" : "Server", "source_file" : "sql_connect.cc", "function" : "end_connection", "msg" : "Aborted connection 161 to db: 'keycloak' user: 'keycloak' host: 'keycloak.example.com' (Got an error reading communication packets).", "time" : "2024-03-20T14:20:41.848940Z", "ts" : 1710943624589, "thread" : 161, "err_symbol" : "ER_ABORTING_USER_CONNECTION", "SQL_state" : "HY000", "label" : "Note" }

In the wireshark trace, the typical sequence is shown below where:
- is the MySQL server
- is the keycloak server
[Wireshark typical trace](https://github.com/keycloak/keycloak/assets/13176858/77fb83eb-cb65-40e8-84af-2a1510881741)

Suggested fix:
Make MySQL Connector/J participate in the TCP connection termination process initiated by the MySQL server as soon as the former has no more data to send.
Then MySQL Connector/J should try to open a new TCP connection afterwards, avoiding all the RESETs sent by the MySQL server and improving significantly the performance of the data exchanges.
[22 Mar 14:00] Jean-christophe Manciot
The last steps '4. Click on...' and '5. Notice the...' must be replaced by '9. Click on...' and '10. Notice the...' respectively.
[22 Mar 14:02] MySQL Verification Team
Hi Mr. Manciot,

Thank you for your bug report.

Please, check all your other system variables, like :


We are waiting on your feedback.
[28 Mar 14:56] Jean-christophe Manciot
Sorry for the delay, but despite being subscribed to email updates, I received none after your comment.

Anyway, here is the answer:
| Variable_name                     | Value    |
| connect_timeout                   | 600      |
| interactive_timeout               | 28800    |
| net_read_timeout                  | 600      |
| net_write_timeout                 | 600      |
| ssl_session_cache_timeout         | 300      |

And there is no such thing defined as a system variable named 'port_open_timeout'. 'port-open-timeout' has not been used either to start the server AFAIK.
[28 Mar 16:10] MySQL Verification Team
Hi Jean Christophe,

Have you tried to disable automatic reconnection ?????
[29 Mar 11:54] Jean-christophe Manciot
I'm not sure what you mean by "disabling automatic reconnection".

From 3.6.8 Automatic Reconnection Control of MySQL 8.0 C API Developer Guide, Auto-reconnect is disabled by default and I did not actively tried to enable it. Anyway, I don't know how to make sure it is actually disabled.

I tried another lead taken from 'Troubleshooting Connector/J Applications' (https://dev.mysql.com/doc/connector-j/en/connector-j-usagenotes-troubleshooting.html#qanda...):
"Ensure connections are valid when used from the connection pool. Use a query that starts with /* ping */ to execute a lightweight ping instead of full query. Note, the syntax of the ping needs to be exactly as specified here."
So I set quarkus.datasource.jdbc.validation-query-sql with /* ping */ instead of SELECT 1
  - all 'com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure' are still there
  - all MySQL server FIN/ACK and RESETs also
[3 Apr 8:55] Jean-christophe Manciot
Hi MySQL Verification Team,

Have you followed the steps outlined in the first post to reproduce this issue?
[8 May 1:38] Filipe Silva
Hi Jean-christophe,

Do you have samples of the MySQL error log and Java stack traces for the occurrences?

If I understand correctly your suggested fix, this is not possible in MySQL Connector/J. The driver does not reconnect automatically broken connections, not even when setting `autoReconnect=true`, which always returns an Exception between re-connects. This is so by design so that application know that something wrong is happening. And mind that `autoReconnect=true` is not recommended.

It is the responsibility of connection pool managers to keep connections alive and re-create new ones when needed. Typically a connection handed over by a connection pool manager is verified to work fine so if something happens afterwards it must be because of network issues or because a server crash or connection shutdown due to some special condition. Applications are required to return connections to the pool as soon as possible to avoid other issues.

From your description I can't understand if this error occurs all the time or a few times after having the system operating normally. It seems to me that you have other connectivity or setup issues and there's nothing we can do in the server or connector to avoid them.

Since you are already checking network traffic, I'd suggest disabling connection encryption and see what kind of packets are being exchanged here. It could help identifying the exact point the the data flow the issues are arising.

Please check your configurations carefully and provide any additional information that could help us help you.