Bug #62849 | Memory leak using SELECT statements in a multi-threaded application. | ||
---|---|---|---|
Submitted: | 20 Oct 2011 16:59 | Modified: | 24 Jan 2013 18:20 |
Reporter: | Edwin Vandamme | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: C API (client library) | Severity: | S2 (Serious) |
Version: | 5.0.45, 5.0.77, 5.5.16 | OS: | Linux |
Assigned to: | CPU Architecture: | Any | |
Tags: | memory leak, multi threaded |
[20 Oct 2011 16:59]
Edwin Vandamme
[20 Oct 2011 17:02]
Edwin Vandamme
Code and tools to reproduce the problem
Attachment: mysql.error.report.tar.gz (application/x-gzip, text), 10.43 KiB.
[22 Oct 2011 17:58]
Edwin Vandamme
Some logs from a recent test
Attachment: mysql.error.report.logs.tar.gz (application/x-gzip, text), 3.84 KiB.
[30 Oct 2011 15:46]
Edwin Vandamme
From further testing with additional logging on the line prior to mysql_query and the line after mysql_query. The problem seems to be located in mysql_query.
[31 Oct 2011 6:13]
MySQL Verification Team
Hi Edwin, I'm starting to examine this bug. First thing I noticed is in your Makefile: /usr/bin/mysql_config --include --libs Please, use the multithreaded client libs! /usr/bin/mysql_config --include --libs_r
[31 Oct 2011 7:05]
MySQL Verification Team
After minor tweaks, I built the example code, but running it doesn't seem to do much. Appears to be dependent on your environment, paths, database schema etc. If memory leaks after recompiling with threadsafe client libs, I suggest you try using google heap profiler to find out where the allocations are. Something like this: HEAP_PROFILE_ALLOCATION_INTERVAL=100000 HEAP_PROFILE_INUSE_INTERVAL=100000 HEAPPROFILE=mybin.hprof LD_PRELOAD=/usr/local/lib/libtcmalloc.so ./application pprof --pdf ./application ./mybin.hprof Full info here: http://google-perftools.googlecode.com/svn/trunk/doc/heapprofile.html
[1 Nov 2011 10:46]
Edwin Vandamme
Dear Shane, Thank you for having a look at my problem. 1) I was already aware of having to use --libs_r, so I have run both (I seem to have send the incorrect makefile). To make 100% sure I will rerun the code with --libs_r My production code uses --libs_r for sure and is showing the same problem. 2) Yes, as stated in the original report, everything should be in /Programs/functions/test.mysql/ unless you search/replace that string in all code. 3) In the mean time I have modified writeLog to use localtime_r instead of localtime, althoug as this is in a mutex it should make no difference. I have also checked all code for other non thread-safe commands (as far as I know of) and I am using no others. 4)I will have a look at the profiler tools and keep you informed.
[1 Nov 2011 11:48]
Edwin Vandamme
After installing rpm -ivh google-perftools-1.8-1.i386.rpm I get the following error when launching the rund test program "ERROR: ld.so: object '/usr/lib/libtcmalloc.so' from LD_PRELOAD cannot be preloaded: ignored."
[1 Nov 2011 13:01]
Edwin Vandamme
I had to use /usr/lib/libtcmalloc.so.0 instead of /usr/lib/libtcmalloc.so Profiling is running at the moment, awaiting results.
[3 Nov 2011 21:06]
Edwin Vandamme
Using the profiler everything runs very slow, simular to running valgrind. As a result the problem only occured once during a 3 day run. [2011/11/01 13:07:14] rund : 22765568 : ==> : 002 : doQ () [2011/11/01 13:07:14] rund : 22765568 : === : 002 : doQ () - 001 [2011/11/01 13:07:14] rund : 22831104 : === : 002 : doQ () - 004 [2011/11/01 13:07:14] rund : 22831104 : <== : 002 : doQ () I modified the functions doQ and writeLog by moving the pthread_mutex_lock to the first line of the function so that all logs are nicely grouped per thread. The test ran using --libs_r There was only 1 heap file created upon deamonizing by the profiler.
[5 Nov 2011 9:45]
Edwin Vandamme
I did another run without the profiler using --libs_r and a modified rund.cc and the problem still occurred. Included following files (mysql.error.report.002.tar.gz) : Makefile.proc_log1 Makefile to compile proc_log1.cc _memory1.log Processed rund.mem.log log. _memory2.log Processed rund.log.* logs. rund.log.001 rund log file. rund.log.002 rund log file. rund.mem.log rund cronjob log file. rund.log.x.001 Error situation copied from the log files for eas of usage. Makefile.rund Makefile to compile rund.cc using --libs_r. rund Executable rund from compiling rund.cc. rund.cc Modified rund.cc. rund_prof Bash file to start rund with Google profiler.
[5 Nov 2011 9:47]
Edwin Vandamme
Logs and code as mentioned above (minus log files as they are to big)
Attachment: mysql.error.report.002.tar.gz (application/x-gzip, text), 27.63 KiB.
[6 Nov 2011 13:35]
Edwin Vandamme
I have found the other heap files (0001-0266). Although I only got 1 memory leak at the start of the test (in my own logging during the test), I find the following rather strange : I am not familiar with the Google profiler so I am not sure on how to interprete the results, but a negative value stikes me as odd. pprof --text --base=rund_.0001.heap rund rund_.0265.heap Using local file rund. Using local file rund_.0265.heap. Total: 0.0 MB 0.0 100.0% 100.0% 0.0 100.0% my_malloc 0.0 0.0% 100.0% 0.0 100.0% __clone 0.0 0.0% 100.0% -0.0 -4372.3% cli_read_query_result 0.0 0.0% 100.0% 0.0 100.0% doQ 0.0 0.0% 100.0% 0.0 100.0% handleISDN 0.0 0.0% 100.0% -0.0 -4372.3% mysql_query 0.0 0.0% 100.0% -0.0 -4372.3% mysql_real_query 0.0 0.0% 100.0% 0.0 4472.3% mysql_store_result 0.0 0.0% 100.0% 0.0 100.0% start_thread
[16 Nov 2011 7:01]
Edwin Vandamme
I now did 2 runs with valgrind in order to try and prove my point. The first as follows (mysql.error.report.003.tar.gz) : valgrind --leak-check=full --trace-children=yes --log-file=vg_rund.txt /Programs/functions/test.mysql/rund The second as follows (mysql.error.report.004.tar.gz) : valgrind --leak-check=full --show-reachable=yes --trace-children=yes --log-file=vg_rund.txt /Programs/functions/test.mysql/rund There are instances where memory is lost after days of running without anny problems, then suddenly 31 leaks within a timeframe of an hour. (2th run after 7 days) As the program is supposed to run 24/7 you can imagine that this causes problems in the long run. The weird situation is that when using valgrind the memory leaks are always 16384 bytes, while without valgrind the values are much higher and not constant. Unfortunately, valgrind does not seem to report where the memory is eventually leaked.
[16 Nov 2011 7:01]
Edwin Vandamme
1st run with valgrind as mentioned above.
Attachment: mysql.error.report.003.tar.gz (application/x-gzip, text), 194.31 KiB.
[16 Nov 2011 7:02]
Edwin Vandamme
2th run with valgrind as mentioned above.
Attachment: mysql.error.report.004.tar.gz (application/x-gzip, text), 290.09 KiB.
[29 Nov 2011 11:59]
Edwin Vandamme
Ran rund for a while with helgrind without anny problems. Possibly due to even slower running then with valgrind, the memory issue did not occur.
[15 Dec 2011 15:44]
Edwin Vandamme
Anny updates ?
[17 Feb 2012 16:58]
MySQL Verification Team
I have been studied a bit your comments and it seems that phenomena that you observe depend a lot on the speed. Hence, most of the tools available will not help, as they will slow down things a lot. I would, therefore, suggest that you use SAFEMALLOC, that is optionally linked into C API (and also in the server). But, SAFEMALLOC does not exist any more in version 5.5. Hence, you should use 5.0 or 5.1. In order to use it, you should first configure entire source tree with --with-debug-full configure option. You should then link your application with new libmysqlclient library, preferably a static one. That should be enough. You can run your test program then and it should not be slower then 15 to 20 % then the original. There should not be any trace file (watch /tmp/ subdir for any), but if there is a trace file, we can then disallow it with -debug or -# option. If there are any error messages, they will show up in stderr, so redirect the output from stderr to some file. Let us know if this worked for you and what are the results. Thanks in advance.
[18 Mar 2012 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[27 May 2012 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[24 Sep 2012 11:20]
Edwin Vandamme
In the hope that someone will take this seriously I would like to report some results after a couple of months testing, trying to come up with some tangible proof (a valgrind report vg_rund.txt) as to the memory leak that prompted the initial bug report : 1) Running the program with mysql_query commented did not show any memory leaks, even after 5+ days running. 2) Running the program with mysql_query active results in a memory leak within 48 hours. 3) Running the program with mysql_real_query instead of mysql_query results in a memory leak after 5 days or more. From this I would conclude that : a) There is a memory leak in either 1 of the following mysql commands : mysql_query, mysql_store_result, mysql_fetch_row. b) Most likely the problem is related to mysql_query as the leak does not occur as fast with mysql_real_query. Final results where obtained on mysql 5.5.27 on a CentOS 5.7 minimal install with no network connection on a somewhat faster computer as originally used and with only 3 threads active instead of 5. The following files are included in mysql.error.report.005.tar.gz /test.mysql/mysql.install/mysql.5.5.27.install Installation procedure used to install mysql. /test.mysql/mysql.install/my.conf mysql configuration file used. /test.mysql/mysql.install/_structure.sql File to initialise mysql data. /test.mysql/mysql.install/mysql.ini.sql File to initialise mysql data. /test.mysql/mysql.install/ndb_check.sql File to initialise mysql data. /test.mysql/_memory2.log Extracted memory changes from the log files. /test.mysql/rund.cc The source code of the program used to test (makefile as previously posted). /test.mysql/vg_rund Bash file to start the program with valgrind /test.mysql/vg_rund.txt The results of the last run. I am no valgrind expert so I am not 100% sure if the results are meaningful, at least I see mysql_query show up in the log quite a number of times. For some reason I cannot upload the file, but I will send it using the ftp option.
[24 Sep 2012 11:21]
Edwin Vandamme
Last test results with valgrind errors reported.
Attachment: mysql.error.report.005.tar.gz (application/x-gzip, text), 26.62 KiB.
[24 Sep 2012 12:35]
MySQL Verification Team
Hi Edwin, So I'm running the rund program now under gperftools and there is no leak (running at 500 queries per second). I checked the database, and only a single connection is running queries. It seems all the program does is select from the table `callsystem`.`scripts`? Given that, I inserted 1 row into the table so the query has a resultset. "INSERT INTO `callsystem`.`scripts` set IServer=0,Unit=0,Script='HIJGEN',node='2002';" If all the program does is a simple select, I'd suggest just trying mysqlslap and see if it leaks instead. Next, the valgrind output proves only one thing, you have a bug in your shutdown handler code where it reads and writes to freed memory. No line numbers shown, since you probably didn't compile with -g.
[24 Sep 2012 13:31]
MySQL Verification Team
whoops, does this program leak memory ?
Attachment: bug62849.c (text/plain), 5.85 KiB.
[24 Sep 2012 20:09]
Edwin Vandamme
Thanks for running the program, but to see any result you should run it somewhere between 24 and 48 hours. Obviously this is a cut down version from the production program to illustrate the problem (I actually kept cutting it down until I got to this part). When you look at the log file it creates it logs the amount of memory used as well, which is my base to detect memory loss without using valgrind as valgrind really slows down everything and it also serialises the threads. All threads use the same connection which is protected by a mutex to avoid simultaneous usage, this to avoid tons of connections from the production application (which runs minimal 120 threads). Typically in production every thread would only perform a few actions every minute so a single mysql connection does not slow down the application. Now from 1 of my runs I included _memory2.log which is an extract from all log files where memory is being lost during a run (Otherwise you get tons of useless data which is hard to dig trough, but you can always look at the start of a log file to check if memory usage has changed between log files). You will notice that after about 7 hours there is a sudden increase of lost memory, which shouldn’t happen as the program does the same thing over and over again. As I stated before, it might take up to 48 hours before the problem arises and when I remove mysql_query from the program there is no memory loss. The program proc_log2.cc to process the logs in search of memory issues was included in an earlier post. Now in the production application this results in a program crash about once a month. When I also run the test application the system crashes within 24 hours. I have been looking into this problem for the past 2 years, before I arrived at a mysql problem, so I am : a) either really stupid (a possibility). b) keep looking over something obvious (you probably know how that goes, i will bang my head against the wall if you point me to the problem) c) absolutely right. Now I normally don’t report problems, this is actually the first time in 25 years, but I am at a complete loss and the fact that this is the only problem I have never been able to fix bothers the hell out of me. I will have a look at the bug62849.c tomorrow morning (it’s 22:00 here a.t.m.)
[25 Sep 2012 8:08]
Edwin Vandamme
I started bug62849.c and it has been running for 2 hours now with stable memory usage as reported by it's /proc/PID/stat file. I will keep it going for at least 48 hours and see if I have memory problems. In the mean time I will look at your code and see if there is something I missed in my own code.
[26 Sep 2012 6:28]
Edwin Vandamme
To avoid temptation to modify bug62849.c, I did not look at the code right away. As it stopped suddenly, I noticed you had a timer set to just under 17 hours. I have modified this to 72 hours and restarted the test.
[29 Sep 2012 11:17]
Edwin Vandamme
Shane, After running your test program for 72 hours (369.332.476 queries), some memory leaks where apparent from the /proc/PID/stat file. The values that changed in the stat file are : unsigned long utime; // user-mode CPU time accumulated by process unsigned long stime; // kernel-mode CPU time accumulated by process unsigned long vsize; // number of pages of virtual memory ... long rss; // resident set size from /proc/#/stat Included you will find stat.tot which has 3 lines each from a different time during the program run. I mistakenly looked at the wrong column during the test so I only saw the problem at the very end. Line 1 is just after the start of the program 20120926 11:55 Line 2 is about a day later 20120927 08:24 memory is same as line 1 Line 3 is 1 minute before the end of the test 20120929 11:55 memory is increased by 1048576. So can you explain to me why there is no increase between Line 1 and 2 but an increase between Line 2 and 3 ? Anyhow, I assume you have been running my test program for 72 hours and have analysed the log files with proc_log2, so I am curious as to your results.
[29 Sep 2012 11:18]
Edwin Vandamme
bug62849 results after 72 hours
Attachment: stat.tot (application/octet-stream, text), 624 bytes.
[29 Sep 2012 12:00]
MySQL Verification Team
I wouldn't call it a leak yet. But I also am not familiar enough with Linux memory management to explain the changes fully: number of minor page faults 467,473,483 Size of the process' virtual memory space (in bytes) 58261504,58261504,59310080 Size of the process' resident set (in bytes) 394,400,410 It could just be net_buffer_length expanding to accommodate for a larger result. You said: "72 hours (369.332.476 queries)" Since I'm doing 2000 queries per second with a static 1 row dataset, I must assume you have larger data. Does the data change? Does the resultset change? I got a feel you are running libmysqld embedded server. Is that what you are using in your app, or is it always libmysql client? I will upload a google heap profiler output after my app hits 400 million queries, so we can inspect what exactly allocates memory.
[30 Sep 2012 16:34]
Edwin Vandamme
Shane, At 2000 queries/sec you should get about 518.400.000 queries in 72 hours. My test hardware is a 5 year old desktop that I have been using for this test, so that might be the reason I get fewer queries. I use the data set as generated by my installation procedure. (Just to make sure we use the same data) I modified your program to extract the memory usage from the stat file, that is how programs like “top” do it. It’s attached as bug62849_1.c, it is a little crude but you might want to adjust as to your needs. Remember that when I did the tests with my test program without mysql_query I did not see any memory changes over a 10 day period. “It could just be net_buffer_length expanding to accommodate for a larger result” No as it is always the same query with the same result. I always use libmysql client as per my test program.
[30 Sep 2012 16:35]
Edwin Vandamme
Modified bug62849.c to read memory usage from stat file.
Attachment: bug62849_1.c (application/octet-stream, text), 10.85 KiB.
[3 Oct 2012 5:24]
MySQL Verification Team
5.5.27 client libs, 480 million queries, no leaking seen.
Attachment: no_leaks.txt (text/plain), 9.16 KiB.
[4 Oct 2012 9:15]
Edwin Vandamme
Shane, Thank you for running the test on your system. I suppose we may conclude the problem is speed related. Personally I still think there is a problem, as per my own experience. Do I assume correctly that you consider the problem solved? In a couple of weeks I will be replacing a few of our servers, these are considerable faster and have multiple processors then my current test systems, so I will run some tests on these as well.
[18 Dec 2012 20:00]
Sveta Smirnova
Edwin, thank you for the feedback. Please update us with results of your tests. Please use current version 5.5.28 also
[24 Dec 2012 9:37]
Edwin Vandamme
I will conduct further testing shortly after the holiday period.
[27 Dec 2012 15:18]
Edwin Vandamme
Dear Sveta, I could not help myself so I redid the tests as per the last test on 24 Sep 2012 with mysql 5.5.29 on a CentOS 5.7 minimal install with 3 threads active in rund. Currently after 75:27:15 hours (111 queries per second 30.151.485) no memory leaks are apparent. Additionally, in the first few seconds of running there appeared to be some memory leaks that I contributed to the threads initialising, but this looks to be more stable now (all threads started = no more memory assigned). So from my point of view everything looks promising at the moment. I will give it a few more days and then test the following depending on the results :: a) Same test with no sleep between queries. b) Same test with connection to the original remote server. c) Run the program on the production server to see if the server crashes within 24 hours as this used to be the case when run together with the production software. Thank you for your effort to follow up my problem. I am curious as to what has changed in MySQL that might have fixed the problem.
[27 Dec 2012 17:27]
Sveta Smirnova
Edwin, thank you for test results. I will set status of the report to "Need feedback" and will wait results of all your tests. Regarding to what can fix the problem. Every new version of MySQL contains improvements in performance and stability including fixes for memory leak bugs. So it is hard to say what exactly fixed this problem, because we could not repeat it on our side and don't know exactly what leaked in your case.
[8 Jan 2013 12:34]
Edwin Vandamme
Dear Sveta, I had 3 tests running for a while : 1) Original test on the slow computer with 3 threads active in rund. test.001 Ran for 7d22h46m34s without any problems. 2) On a fast system (Dell PowerEdge 1950, Xeon 5050 3Ghz, 2Gb 667 Mhz, HDD 36GB SAS 15000RPM) I ran 2 versions of rund where I modified the logging to limit disk space usage, disabled the lock file and removed the sleep after each query. Both of these had some memory loss within the first hour, but ran properly afterwards. test.002 Ran for 3d15h45m38s, 687.077.918 queries = 2.174 Q/s. CPU at 30% test.003 Ran for 2d15h38m02s, 497.821.846 queries = 2.174 Q/s. CPU at 50% New tests started : 3) On the slow system rund version from 2) with 5 threads. test.004 Ran for 5d0h37m04s, 484.516.608 queries = 1.115 Q/s. CPU at 85% Some memory loss after 19 seconds, further no issues. 4) On the fast system 5 threads connected to the production MySQL server. test.005 Ran for 5d0h05m57s, 407.307.202 queries = 942 Q/s. CPU at 5% 2 times memory loss within the first hour, no problems later on. From 2) 3) and 4) I would say there is still an issue, but it has been drastically improved so I have implement this on the production server and will see what happens. Additionally I have the test started on the production server, before this crashed the system within 24 hours. Now I had shared memory issues with my production software right after starting the test software on the production server. So after a few hours I stopped rund to avoid further problems with the production unit. I checked the logs of the last 3 months and they never showed a shared memory problem. It looks like the problem has been improved upon, however not altogether 100% fixed, the shared memory problems in the production software when starting the test software troubles me. I will now await how everything progresses during the next month or two on the production unit and see if the failure rate improves or not.
[24 Jan 2013 18:20]
Matthew Lord
We're sorry, but the bug system is not the appropriate forum for asking help on using MySQL products. Your problem is not clearly the result of a repeatable/verifiable bug. I don't see a single piece of evidence that supports the claim of a leak. Perhaps I'm missing it? I see you claim "memory lost", but that is not evidence of an actual *leak*. Memory fragmentation seems the more likely cause. If you can come up with a clear and repeatable test case that clearly demonstrates a leak, please let me know and I'll be happy to test it. Support on using our products is available both free in our forums at http://forums.mysql.com/ and for a reasonable fee direct from our skilled support engineers at http://www.mysql.com/support/ Thank you for your interest in MySQL.