Bug #92562 Statement cancellation timer can cause memory leak in web applications
Submitted: 25 Sep 2018 13:30 Modified: 30 Jul 2020 19:13
Reporter: Cedric Counotte Email Updates:
Status: Not a Bug Impact on me:
None 
Category:Connector / J Severity:S1 (Critical)
Version:5.1.47 OS:Any
Assigned to: Filipe Silva CPU Architecture:Any
Tags: memory leak setQueryTimeOut

[25 Sep 2018 13:30] Cedric Counotte
Description:
Just wanted to confirm this bug (#36565) is still present in latest version of the connector, that is 5.1.47.

A timer thread is started and never terminated when running queries with setQueryTimeOut().

That's a real problem when used in a service running 24x7, and will end-up consuming all available threads and ressources.

How to repeat:
It's very easy to reproduce:

Build a web app that only does that (during init):
- Open a connection
- Prepare a statement
- setQueryTimeout(30)
- Execute query
- Close the connection.

Then load the web app into Tomcat8 and stop it.

Tomcat8 logs will show this:

org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [xxxx] appears to have started a thread named [MySQL Statement Cancellation Timer] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 java.lang.Object.wait(Native Method)
 java.util.TimerThread.mainLoop(Timer.java:552)
 java.util.TimerThread.run(Timer.java:505)

Then the following page of tomcat8 shows there's a leak for that webapp:
http://localhost/manager/text/findleaks?statusLine=true

OK - Found potential memory leaks in the following applications:
/webapp

This leak will remain permanently.

Suggested fix:
Terminate the timer when close() is called or the query is completed.
[25 Sep 2018 13:48] Cedric Counotte
By the way, if you make this web app, loop into opening/querying/closing, you'll eventually see more than just one timer being left behind.

So it's not just when a web app is stopped or restarted, it's leaking new timer constantly!

In production I can see dozens of timer left behind when I update a web app.

We currently have 250 remote systems running Mysql+Tomcat8 and the same issues shows everywhere, eventually requiring the tomcat8 service to be restarted.

Monitoring its memory consumption does not help as it shows < 30% consumed (of 512MB) while logs shows something like OOM Error : 'couldn't create new thread'.

Currently testing on a few machine without setQueryTimeout(), tomcat8 reports no leak anymore! But I'm not confident that queries my get stuck and producing the same results at the end, or worse actually.
[4 Oct 2018 22:42] Filipe Silva
Thank you for this report Cedric.

I need some more details in order to be able to help you:

1. In the test case you described, is the query still running when you stop the web app? If so, you get a lot more errors and stack traces than just the one you mentioned, right?

2. Where do you placed the connector-java-5.1.47.jar? Was it inside the WEB-INF/lib or in a global libraries directory, say, $CATALINA_HOME/lib?

3. Are you setting, or tried setting the option clearReferencesStopThreads="true" in the context configuration in context.xml?

4. In your followup you said that just by looping opening/querying/closing you end up with multiple alive time threads. Is that so? Are those threads really alive even without stopping the web app?

5. I'm afraid I can't reproduce the exact behavior you are describing. I see that when I stop the web app while the query is running, there remains in fact a thread running, but once I restart the app, the thread ends eventually. So, can you please post some test cases and describe your setup the best you can regarding tomcat configurations, jar placement, connection properties used, etc, so that I can reproduce it exactly as you described?

Looking forward for your reply. Thanks,
[5 Oct 2018 5:35] Cedric Counotte
Sample project to reproduce issue

Attachment: test.zip (application/x-zip-compressed, text), 4.56 KiB.

[5 Oct 2018 5:51] Cedric Counotte
1. No, the query is terminated long ago. See attached sample project.

2. In $CATALINA_HOME/lib.

3. Option clearReferencesStopThreads="true" doesn't appear to help.

4. I have a dozen web apps using the same query mechanism and they all eventually leaks many timer threads.

5. See attachement. 

I use tomcat 8.0.41 if it matters, installed in a folder under D:\ drive with jdk/jre folders next to it (installed and referenced manually).
[5 Dec 2018 0:57] Filipe Silva
Hi Cedric,

Sorry for the late reply.

I'm still unable to reproduce this. I used your test case with a few adjustments in Tomcat 9 (I don't think the version matters) and still I don't see any statement cancellation threads being left active. Actually, in order to see some remnants of these threads in VisualVM I had to add a SLEEP function in the query so that it runs for enough time before destroying the threads. So, all of them show up as terminated.

Previously you said that looping into opening/querying/closing would cause even more active threads being left behind. I also tried this and still I don't see anything wrong.

As I said before, the closest behavior I can observe is when I stop the web application while the query is running. But still, the thread left behind terminates when the app is restarted.

I really need more details to try to reproduce this.

BTW, I wonder if you are able to observer the same behavior without using Tomcat.

Thanks,
[5 Jan 2019 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[5 Jan 2019 16:49] Cedric Counotte
So I suppose this bugs occurs only on Tomcat8, likely 8.0.x only.

Please try to reproduce this bug under the same circumstances, it's as easy as publish the test case .war, and unpublish it.

No need to interrupt any queries, it just leaks threads (at least) on every unpublish.
[22 Jan 2019 16:44] Cedric Counotte
I've just upgraded one of my system to Tomcat 9.0.

The good news is the error log is no longer shown upon un-publishing the test war.

The bad news is that tomcat is still reporting the war as having possibly leaked resources (when un-published) and the leak is still reported after a day. Each publish/un-publish of the test war creates a new line in the leak report.
[18 Jun 2020 7:59] Jayrome Vergara
We are still having this issue from time to time. We are already on the latest 5.1.48 connector. I exported the heapdump few hours after tomcat is reporting the leak and I can confirm that it is the same cancelTimer of com.mysql.jdbc.JDBC4Connection that is causing the memory leak.
[30 Jul 2020 17:44] Filipe Silva
Revisiting this once more I came to the conclusion that there must be some misconception regarding how Connector/J implements query timeouts and the statement cancellation timer.

I haven't been able to fully reproduce this and the closest I got was when I ran the given test case using the data source resource as originally suggested. In my previous tests I had always created connections directly via DriverManager inside the test servlet, which made a difference.

By using a Data Source resource, by default, Tomcat provides connections through a connection pool, making them long-lived connections and, thus, a  java.util.Timer instance is kept alive throughout the entire lifetime of the connection. As explained in the description of the connection property `enableQueryTimeouts` (https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-configuration-propertie...).

Tomcat and other application server have evolved a lot with regard to handling very efficiently possible leaks caused by stopped applications and in most cases the most you get now is a warning in the error log.

I hope this situation has improved and clarified by now.

By the way, I recommend upgrading to latest Connector/J 8.0 if you haven't done so yet.

I apologize for taking so long to bring some closure to this bug report. If you think the issue is still there, please re-open this bug report or file a new one and provide the best test case and instructions to replicate it so that we can help you.

Thank you.
[30 Jul 2020 19:13] Cedric Counotte
This bug is persistent and easy to reproduce. The connection is gone, but the timer is permanently leaked. Loading/unloading the war creates a leak and it remains there permanently after days of running server, even though all connections are properly closed. Until server is restarted.

No misconception, thank you for your assistance.
[30 Jul 2020 21:15] Filipe Silva
Live threads

Attachment: Screenshot from 2020-07-30 22-13-23.png (image/png, text), 207.22 KiB.

[30 Jul 2020 21:22] Filipe Silva
Hi Cedric,

I understand your concern but if I can't reproduce it I can't do anything about it.

Check the screenshot I just sent. This is from the following sequence of events:

1. Tomcat starts
   - 1st MySQL Statement Cancellation Timer is running.
2. Right before 22:13:00 I reloaded the web app:
   - 1st MySQL Statement Cancellation Timer stops.
   - 2nd MySQL Statement Cancellation Timer spawns when the webapp starts again.
3. 7 Secs later I stopped the web app:
   - 2nd MySQL Statement Cancellation Timer stops.

This by running a piece of code identical to yours.

BTW, I used Tomcat 8.5.57. Please don't ask me to test Tomcat 8.0.x. It makes no sense to try to "fix" something for that version.