MySQL Bugs: #62849: Memory leak using SELECT statements in a multi-threaded application.

Bug #62849	Memory leak using SELECT statements in a multi-threaded application.
Submitted:	20 Oct 2011 16:59	Modified:	24 Jan 2013 18:20
Reporter:	Edwin Vandamme	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server: C API (client library)	Severity:	S2 (Serious)
Version:	5.0.45, 5.0.77, 5.5.16	OS:	Linux
Assigned to:		CPU Architecture:	Any
Tags:	memory leak, multi threaded

Description:
Memory leak using SELECT statements in a multi-threaded application.
This might be related to bug id 41810.

Running a multi-threaded program (5 threads for testing) a memory leak occurs, during a SELECT query, after a few minutes up to several days.
Testing on CentOS 5.3 with MySQL 5.0.77 and 5.5.16 C-API installed and connecting to a remote MySQL server (server not available to me)
The bug seems to be affected by the processor speed as on a slower system it seems to occur less often.
The bug seemed to occur more often on MySQL 5.0.77 C-API.
As far as I can see, I followed all guidelines regarding multi-threaded applications.
The code was derived from a production code (hence some elabored code to stick to the original as close as possible) which seems to produce the problem more often.

I could not use valgrind to create a report as this slows everything down to such an extend that the leak did not occur after more then 4 days.

MySQL 5.5.16 installation procedure after removing old version with yum :
   rpm -ivh MySQL-devel-5.5.16-1.rhel5.i386.rpm
   rpm -ivh MySQL-shared-5.5.16-1.rhel5.i386.rpm
   rpm -ivh MySQL-client-5.5.16-1.rhel5.i386.rpm

Limitations on the supplied code :
- Everything is hard coded for the folder /Programs/functions/test.mysql/
- Be aware that the log files grow pretty fast, so enough disks space should be available.

Included following files (mysql.error.report.tar.gz) :
Makefile.proc_log1     Makefile to compile proc_log1.cc
Makefile.proc_log2     Makefile to compile proc_log2.cc
Makefile.rund          Makefile to compile rund.cc

_memory1               Bash script, using proc_log1 to find memory errors in the log files created by the bash file rund_mem that is run as a cronjob.
_memory2               Bash script, using proc_log2 to find memory errors in the log files created by the multi-threaded test program rund.
proc_log1.cc           Program to process a log file created by the bash file rund_mem.
proc_log2.cc           Program to process a log file created by the multi-threaded test program rund.
rund.cc                Multi-threaded test program.
                          Variables can be set in lines 133-148
                          Please set :
                             g.mysql_h  MySQL Host IP address.
                             g.mysql_u  MySQL user name.
                             g.mysql_pw MySQL password.
rund.cronjobs          File to add rund_mem to the cronjobs.
rund_mem               Bash script that should be run as a cronjob in order to monitor memory usage outside of rund.
rund_start             Bash file to quickly start rund together with the full path name (to make sure rund_mem works correctly).

Table used :
DROP TABLE IF EXISTS `callsystem`.`scripts`;
CREATE TABLE  `callsystem`.`scripts` (
  `IServer` int(11) NOT NULL default '0',
  `Unit` int(11) NOT NULL default '0',
  `Script` varchar(10) NOT NULL default '',
  `Node` varchar(4) NOT NULL default '',
  `Type` char(1) NOT NULL default '',
  `OnKey1` varchar(4) NOT NULL default '',
  `OnKey2` varchar(4) NOT NULL default '',
  `OnKey3` varchar(4) NOT NULL default '',
  `OnKey4` varchar(4) NOT NULL default '',
  `OnKey5` varchar(4) NOT NULL default '',
  `OnKey6` varchar(4) NOT NULL default '',
  `OnKey7` varchar(4) NOT NULL default '',
  `OnKey8` varchar(4) NOT NULL default '',
  `OnKey9` varchar(4) NOT NULL default '',
  `OnKey0` varchar(4) NOT NULL default '',
  `OnKeyP` varchar(4) NOT NULL default '',
  `OnKeyA` varchar(4) NOT NULL default '',
  `ClearKeys` int(1) NOT NULL default '0',
  `WaitMess` int(1) NOT NULL default '0',
  `Digits` int(2) NOT NULL default '0',
  `TimeOut` int(3) NOT NULL default '0',
  `OnTimeOut` varchar(4) NOT NULL default '',
  `Tel` varchar(20) NOT NULL default '',
  `Message` varchar(40) NOT NULL default '',
  `Function` varchar(50) NOT NULL default '',
  PRIMARY KEY  (`IServer`,`Unit`,`Script`,`Node`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1;

How to repeat:
Copy the supplied code to /Programs/functions/test.mysql/ (or modify the hard code elements with your installation folder)
Compile using the suplied Makefiles.
Start using the bash script rund_start.

Code and tools to reproduce the problem

Attachment: mysql.error.report.tar.gz (application/x-gzip, text), 10.43 KiB.

Some logs from a recent test

Attachment: mysql.error.report.logs.tar.gz (application/x-gzip, text), 3.84 KiB.

From further testing with additional logging on the line prior to mysql_query and the line after mysql_query.
The problem seems to be located in mysql_query.

Hi Edwin,

I'm starting to examine this bug.  First thing I noticed is in your Makefile:

/usr/bin/mysql_config --include --libs

Please, use the multithreaded client libs!

/usr/bin/mysql_config --include --libs_r

After minor tweaks, I built the example code, but running it doesn't seem to do much. Appears to be dependent on your environment, paths, database schema etc.

If memory leaks after recompiling with threadsafe client libs, I suggest you try 
using google heap profiler to find out where the allocations are.

Something like this:

HEAP_PROFILE_ALLOCATION_INTERVAL=100000 HEAP_PROFILE_INUSE_INTERVAL=100000 
HEAPPROFILE=mybin.hprof LD_PRELOAD=/usr/local/lib/libtcmalloc.so ./application 

pprof --pdf ./application ./mybin.hprof

Full info here:
http://google-perftools.googlecode.com/svn/trunk/doc/heapprofile.html

Dear Shane,

Thank you for having a look at my problem.
1) I was already aware of having to use --libs_r, so I have run both (I seem to have send the incorrect makefile).
To make 100% sure I will rerun the code with --libs_r
My production code uses --libs_r for sure and is showing the same problem.
2) Yes, as stated in the original report, everything should be in /Programs/functions/test.mysql/ unless you search/replace that string in all code.
3) In the mean time I have modified writeLog to use localtime_r instead of localtime, althoug as this is in a mutex it should make no difference.
I have also checked all code for other non thread-safe commands (as far as I know of) and I am using no others.
4)I will have a look at the profiler tools and keep you informed.

After installing rpm -ivh google-perftools-1.8-1.i386.rpm
I get the following error when launching the rund test program "ERROR: ld.so: object '/usr/lib/libtcmalloc.so' from LD_PRELOAD cannot be preloaded: ignored."

I had to use /usr/lib/libtcmalloc.so.0 instead of /usr/lib/libtcmalloc.so
Profiling is running at the moment, awaiting results.

Using the profiler everything runs very slow, simular to running valgrind.
As a result the problem only occured once during a 3 day run.

[2011/11/01 13:07:14] rund :        22765568 : ==>   : 002 : doQ ()
[2011/11/01 13:07:14] rund :        22765568 : ===   : 002 : doQ () - 001
[2011/11/01 13:07:14] rund :        22831104 : ===   : 002 : doQ () - 004
[2011/11/01 13:07:14] rund :        22831104 : <==   : 002 : doQ ()

I modified the functions doQ and writeLog by moving the pthread_mutex_lock to the first line of the function so that all logs are nicely grouped per thread.
The test ran using --libs_r

There was only 1 heap file created upon deamonizing by the profiler.

I did another run without the profiler using --libs_r and a modified rund.cc and the problem still occurred.

Included following files (mysql.error.report.002.tar.gz) :
Makefile.proc_log1     Makefile to compile proc_log1.cc
_memory1.log           Processed rund.mem.log log.
_memory2.log           Processed rund.log.* logs.
rund.log.001           rund log file.
rund.log.002           rund log file.
rund.mem.log           rund cronjob log file.
rund.log.x.001         Error situation copied from the log files for eas of usage.

Makefile.rund          Makefile to compile rund.cc using --libs_r.
rund                   Executable rund from compiling rund.cc.
rund.cc                Modified rund.cc.
rund_prof              Bash file to start rund with Google profiler.

Logs and code as mentioned above (minus log files as they are to big)

Attachment: mysql.error.report.002.tar.gz (application/x-gzip, text), 27.63 KiB.

I have found the other heap files (0001-0266).
Although I only got 1 memory leak at the start of the test (in my own logging during the test), I find the following rather strange :
I am not familiar with the Google profiler so I am not sure on how to interprete the results, but a negative value stikes me as odd.

pprof --text --base=rund_.0001.heap rund rund_.0265.heap
Using local file rund.
Using local file rund_.0265.heap.
Total: 0.0 MB
     0.0 100.0% 100.0%      0.0 100.0% my_malloc
     0.0   0.0% 100.0%      0.0 100.0% __clone
     0.0   0.0% 100.0%     -0.0 -4372.3% cli_read_query_result
     0.0   0.0% 100.0%      0.0 100.0% doQ
     0.0   0.0% 100.0%      0.0 100.0% handleISDN
     0.0   0.0% 100.0%     -0.0 -4372.3% mysql_query
     0.0   0.0% 100.0%     -0.0 -4372.3% mysql_real_query
     0.0   0.0% 100.0%      0.0 4472.3% mysql_store_result
     0.0   0.0% 100.0%      0.0 100.0% start_thread

I now did 2 runs with valgrind in order to try and prove my point.
The first as follows (mysql.error.report.003.tar.gz) :
   valgrind --leak-check=full --trace-children=yes --log-file=vg_rund.txt /Programs/functions/test.mysql/rund
The second as follows (mysql.error.report.004.tar.gz) :
   valgrind --leak-check=full --show-reachable=yes --trace-children=yes --log-file=vg_rund.txt /Programs/functions/test.mysql/rund

There are instances where memory is lost after days of running without anny problems, then suddenly 31 leaks within a timeframe of an hour. (2th run after 7 days)
As the program is supposed to run 24/7 you can imagine that this causes problems in the long run.

The weird situation is that when using valgrind the memory leaks are always 16384 bytes, while without valgrind the values are much higher and not constant.
Unfortunately, valgrind does not seem to report where the memory is eventually leaked.

1st run with valgrind as mentioned above.

Attachment: mysql.error.report.003.tar.gz (application/x-gzip, text), 194.31 KiB.

2th run with valgrind as mentioned above.

Attachment: mysql.error.report.004.tar.gz (application/x-gzip, text), 290.09 KiB.

Ran rund for a while with helgrind without anny problems.
Possibly due to even slower running then with valgrind, the memory issue did not occur.

Anny updates ?

I have been studied a bit your comments and it seems that phenomena that you observe depend a lot on the speed. Hence, most of the tools available will not help, as they will slow down things a lot.

I would, therefore, suggest that you use SAFEMALLOC, that is optionally linked into  C API (and also in the server). But, SAFEMALLOC does not exist any more in version 5.5. Hence, you should use 5.0 or 5.1.

In order to use it, you should first configure entire source tree with --with-debug-full configure option. You should then link your application with new libmysqlclient library, preferably a static one.

That should be enough. You can run your test program then and it should not be slower then 15 to 20 % then the original. There should not be any trace file (watch /tmp/ subdir for any), but if there is a trace file, we can  then disallow it with -debug or -# option.

If there are any error messages, they will show up in stderr, so redirect the output from stderr to some file.

Let us know if this worked for you and what are the results.

Thanks in advance.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

In the hope that someone will take this seriously I would like to report some results after a couple of months testing, trying to come up with some tangible proof (a valgrind report vg_rund.txt) as to the memory leak that prompted the initial bug report :

1) Running the program with mysql_query commented did not show any memory leaks, even after 5+ days running.
2) Running the program with mysql_query active results in a memory leak within 48 hours.
3) Running the program with mysql_real_query instead of mysql_query results in a memory leak after 5 days or more.

From this I would conclude that :
a) There is a memory leak in either 1 of the following mysql commands : mysql_query, mysql_store_result, mysql_fetch_row.
b) Most likely the problem is related to mysql_query as the leak does not occur as fast with mysql_real_query.

Final results where obtained on mysql 5.5.27 on a CentOS 5.7 minimal install with no network connection on a somewhat faster computer as originally used and with only 3 threads active instead of 5.

The following files are included in mysql.error.report.005.tar.gz
/test.mysql/mysql.install/mysql.5.5.27.install Installation procedure used to install mysql.
/test.mysql/mysql.install/my.conf mysql configuration file used.
/test.mysql/mysql.install/_structure.sql File to initialise mysql data.
/test.mysql/mysql.install/mysql.ini.sql File to initialise mysql data.
/test.mysql/mysql.install/ndb_check.sql File to initialise mysql data.
/test.mysql/_memory2.log Extracted memory changes from the log files.
/test.mysql/rund.cc The source code of the program used to test (makefile as previously posted).
/test.mysql/vg_rund Bash file to start the program with valgrind
/test.mysql/vg_rund.txt The results of the last run. I am no valgrind expert so I am not 100% sure if the results are meaningful, at least I see mysql_query show up in the log quite a number of times.

For some reason I cannot upload the file, but I will send it using the ftp option.

Last test results with valgrind errors reported.

Attachment: mysql.error.report.005.tar.gz (application/x-gzip, text), 26.62 KiB.

Hi Edwin,

So I'm running the rund program now under gperftools and there is no leak (running at 500 queries per second). 

I checked the database, and only a single connection is running queries.
It seems all the program does is select from the table `callsystem`.`scripts`?
Given that, I inserted 1 row into the table so the query has a resultset.

"INSERT INTO `callsystem`.`scripts` set IServer=0,Unit=0,Script='HIJGEN',node='2002';"

If all the program does is a simple select, I'd suggest just trying mysqlslap and see if it leaks instead.  

Next, the valgrind output proves only one thing, you have a bug in your shutdown handler code where it reads and writes to freed memory.  No line numbers shown, since you probably didn't compile with -g.

whoops, does this program leak memory ?

Attachment: bug62849.c (text/plain), 5.85 KiB.

Thanks for running the program, but to see any result you should run it somewhere between 24 and 48 hours.
Obviously this is a cut down version from the production program to illustrate the problem (I actually kept cutting it down until I got to this part).
When you look at the log file it creates it logs the amount of memory used as well, which is my base to detect memory loss without using valgrind as valgrind really slows down everything and it also serialises the threads.

All threads use the same connection which is protected by a mutex to avoid simultaneous usage, this to avoid tons of connections from the production application (which runs minimal 120 threads). Typically in production every thread would only perform a few actions every minute so a single mysql connection does not slow down the application.

Now from 1 of my runs I included _memory2.log which is an extract from all log files where memory is being lost during a run (Otherwise you get tons of useless data which is hard to dig trough, but you can always look at the start of a log file to check if memory usage has changed between log files). You will notice that after about 7 hours there is a sudden increase of lost memory, which shouldn’t happen as the program does the same thing over and over again.
As I stated before, it might take up to 48 hours before the problem arises and when I remove mysql_query from the program there is no memory loss.
The program proc_log2.cc to process the logs in search of memory issues was included in an earlier post.

Now in the production application this results in a program crash about once a month.
When I also run the test application the system crashes within 24 hours.

I have been looking into this problem for the past 2 years, before I arrived at a mysql problem, so I am :
a) either really stupid (a possibility).
b) keep looking over something obvious (you probably know how that goes, i will bang my head against the wall if you point me to the problem)
c) absolutely right.

Now I normally don’t report problems, this is actually the first time in 25 years, but I am at a complete loss and the fact that this is the only problem I have never been able to fix bothers the hell out of me.

I will have a look at the bug62849.c tomorrow morning (it’s 22:00 here a.t.m.)

I started bug62849.c and it has been running for 2 hours now with stable memory usage as reported by it's /proc/PID/stat file.
I will keep it going for at least 48 hours and see if I have memory problems.
In the mean time I will look at your code and see if there is something I missed in my own code.

To avoid temptation to modify bug62849.c, I did not look at the code right away.
As it stopped suddenly, I noticed you had a timer set to just under 17 hours.
I have modified this to 72 hours and restarted the test.

Shane,

After running your test program for 72 hours (369.332.476 queries), some memory leaks where apparent from the /proc/PID/stat file.
The values that changed in the stat file are :
      unsigned long     utime;         // user-mode CPU time accumulated by process
      unsigned long     stime;         // kernel-mode CPU time accumulated by process
      unsigned long     vsize;         // number of pages of virtual memory ...
      long              rss;           // resident set size from /proc/#/stat
Included you will find stat.tot which has 3 lines each from a different time during the program run.
I mistakenly looked at the wrong column during the test so I only saw the problem at the very end.
Line 1 is just after the start of the program 20120926 11:55
Line 2 is about a day later 20120927 08:24  memory is same as line 1
Line 3 is 1 minute before the end of the test 20120929 11:55 memory is increased by 1048576.

So can you explain to me why there is no increase between Line 1 and 2 but an increase between Line 2 and 3 ?

Anyhow, I assume you have been running my test program for 72 hours and have analysed the log files with proc_log2, so I am curious as to your results.

bug62849 results after 72 hours

Attachment: stat.tot (application/octet-stream, text), 624 bytes.

I wouldn't call it a leak yet.  But I also am not familiar enough with Linux memory management to explain the changes fully:

number of minor page faults                          467,473,483
Size of the process' virtual memory space (in bytes) 58261504,58261504,59310080
Size of the process' resident set (in bytes)         394,400,410

It could just be net_buffer_length expanding to accommodate for a larger result.

You said:  "72 hours (369.332.476 queries)"

Since I'm doing 2000 queries per second with a static 1 row dataset, I must assume you have larger data.  Does the data change?  Does the resultset change?  

I got a feel you are running libmysqld embedded server.  Is that what you are using in your app, or is it always libmysql client?

I will upload a google heap profiler output after my app hits 400 million queries, so we can inspect what exactly allocates memory.

Shane,

At 2000 queries/sec you should get about 518.400.000 queries in 72 hours.
My test hardware is a 5 year old desktop that I have been using for this test, so that might be the reason I get fewer queries.
I use the data set as generated by my installation procedure. (Just to make sure we use the same data)
I modified your program to extract the memory usage from the stat file, that is how programs like “top” do it. It’s attached as bug62849_1.c, it is a little crude but you might want to adjust as to your needs.

Remember that when I did the tests with my test program without mysql_query I did not see any memory changes over a 10 day period.

“It could just be net_buffer_length expanding to accommodate for a larger result” No as it is always the same query with the same result.

I always use libmysql client as per my test program.

Modified bug62849.c to read memory usage from stat file.

Attachment: bug62849_1.c (application/octet-stream, text), 10.85 KiB.

5.5.27 client libs, 480 million queries, no leaking seen.

Attachment: no_leaks.txt (text/plain), 9.16 KiB.

Shane,

Thank you for running the test on your system.

I suppose we may conclude the problem is speed related.

Personally I still think there is a problem, as per my own experience.
Do I assume correctly that you consider the problem solved?

In a couple of weeks I will be replacing a few of our servers, these are considerable faster and have multiple processors  then my current test systems, so I will run some tests on these as well.

Edwin,

thank you for the feedback.

Please update us with results of your tests. Please use current version 5.5.28 also

I will conduct further testing shortly after the holiday period.

Dear Sveta,

I could not help myself so I redid the tests as per the last test on 24 Sep 2012 with mysql 5.5.29 on a CentOS 5.7 minimal install with 3 threads active in rund.

Currently after 75:27:15 hours (111 queries per second 30.151.485) no memory leaks are apparent.
Additionally, in the first few seconds of running there appeared to be some memory leaks that I contributed to the threads initialising, but this looks to be more stable now (all threads started = no more memory assigned).
So from my point of view everything looks promising at the moment.

I will give it a few more days and then test the following depending on the results ::
a) Same test with no sleep between queries.
b) Same test with connection to the original remote server.
c) Run the program on the production server to see if the server crashes within 24 hours as this used to be the case when run together with the production software.

Thank you for your effort to follow up my problem.

I am curious as to what has changed in MySQL that might have fixed the problem.

Edwin,

thank you for test results.

I will set status of the report to "Need feedback" and will wait results of all your tests.

Regarding to what can fix the problem. Every new version of MySQL contains improvements in performance and stability including fixes for memory leak bugs. So it is hard to say what exactly fixed this problem, because we could not repeat it on our side and don't know exactly what leaked in your case.

Dear Sveta,

I had 3 tests running for a while :
1) Original test on the slow computer with 3 threads active in rund.
test.001 Ran for 7d22h46m34s without any problems.

2) On a fast system (Dell PowerEdge 1950, Xeon 5050 3Ghz, 2Gb 667 Mhz, HDD 36GB SAS 15000RPM) I ran 2 versions of rund where I modified the logging to limit disk space usage, disabled the lock file and removed the sleep after each query.
Both of these had some memory loss within the first hour, but ran properly afterwards.
test.002 Ran for 3d15h45m38s, 687.077.918 queries = 2.174 Q/s. CPU at 30%
test.003 Ran for 2d15h38m02s, 497.821.846 queries = 2.174 Q/s. CPU at 50%

New tests started :
3) On the slow system rund version from 2) with 5 threads.
test.004 Ran for 5d0h37m04s, 484.516.608 queries = 1.115 Q/s. CPU at 85%
Some memory loss after 19 seconds, further no issues.

4) On the fast system 5 threads connected to the production MySQL server.
test.005 Ran for 5d0h05m57s, 407.307.202 queries = 942 Q/s. CPU at 5%
2 times memory loss within the first hour, no problems later on.

From 2) 3) and 4) I would say there is still an issue, but it has been drastically improved so I have implement this on the production server and will see what happens.

Additionally I have the test started on the production server, before this crashed the system within 24 hours.
Now I had shared memory issues with my production software right after starting the test software on the production server. So after a few hours I stopped rund to avoid further problems with the production unit.
I checked the logs of the last 3 months and they never showed a shared memory problem.

It looks like the problem has been improved upon, however not altogether 100% fixed, the shared memory problems in the production software when starting the test software troubles me.
I will now await how everything progresses during the next month or two on the production unit and see if the failure rate improves or not.

We're sorry, but the bug system is not the appropriate forum for asking help on using MySQL products. Your problem is not clearly the result of a repeatable/verifiable bug.

I don't see a single piece of evidence that supports the claim of a leak. Perhaps I'm missing it? I see you claim "memory lost", but that is not evidence of an actual *leak*. Memory fragmentation seems the more likely cause.

If you can come up with a clear and repeatable test case that clearly demonstrates a leak, please let me know and I'll be happy to test it.

Support on using our products is available both free in our forums at http://forums.mysql.com/ and for a reasonable fee direct from our skilled support engineers at http://www.mysql.com/support/

Thank you for your interest in MySQL.