Bug #27732 Possible memory leak with index_merge
Submitted: 10 Apr 2007 13:18 Modified: 30 Mar 2008 8:34
Reporter: Brice Figureau Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S1 (Critical)
Version:5.0.56,5.1.22 OS:Any (windows, linux)
Assigned to: CPU Architecture:Any
Tags: index_merge, memory leak
Triage: D2 (Serious) / R2 (Low) / E4 (High)

[10 Apr 2007 13:18] Brice Figureau
Description:
I'm running a mysql _only_ server version 5.0.32 (package debian from an older debian sid am64) on a bi-Xeon 64bits with 4GB of physical RAM.

Since its start, the mysqld process VSZ has grown from about 3.4GB (after a few days) to now 4.8GB (more than one month after) and never shrinks. Right now the machine has swapped out portions of this process.

This mysql instance uses about 80% of innodb tables, remaining are MyISAM with a few tables as Archive.

As seen in the attached my.cnf, and per my calculation, this instance should consume about:

innodb_pool_buffer_size + innodb_additional_mem_pool_size 
+ innodb_log_buffer_size + key_buffer_size + query_cache_size 
+ max_connections * (sort_buffer_size+join_buffer_size+read_buffer_size+read_rnd_buffer_size)
= 2800+20+16+64+48+ 100 * (2+1+1+1) = 3448MB.

90% of Connections are short-lived (web based), and about 10 connections are coming from long-term java daemons (game servers).

Baseline usage is about 20 connections at the same time, and we sometimes hit the max_connections setting.

This is not a prepared statement issue as they are not used.
No procedure are used. A few triggers are in use, though.

Since there is no tool to see what happens "inside" mysql, I took a "snapshot" of current (now and one month ago) allocations with pmap. Those are also attached to the bug report.

What is interesting is running a diff between those pmap snapshot to see that nothing has been removed, only new allocations have been added.
Most of the buffers allocated are around 1MB or 2MB.
I don't pretend to know what the glibc is doing internally when mysql allocates something, but those sizes seems really like one of those per thread buffer.

I don't have query log enabled.

I will test latest sid version (5.0.38) ASAP.

How to repeat:
Hum, I suspect this is highly dependent on my environment.

Suggested fix:
It would be ├╝ber-great to have a mean to list the in-use allocated buffers along with their uses ala show mutex.
[10 Apr 2007 13:20] Brice Figureau
mysql my.cnf configuration file

Attachment: my.cnf (application/octet-stream, text), 3.20 KiB.

[10 Apr 2007 13:21] Brice Figureau
pmap snapshot taken a few hours after server start

Attachment: mysql_pmap_20070308.txt (text/plain), 30.45 KiB.

[10 Apr 2007 13:22] Brice Figureau
pmap snapshot taken a few days ago

Attachment: mysql_pmap_20070408.txt (text/plain), 82.39 KiB.

[12 Apr 2007 7:13] Valeriy Kravchuk
Thank you for a problem report. Please, try to repeat with a newer version, 5.0.37 (MySQL binaries, if possible)/5.0.38, and inform about the results. 

In case of the same problem, please, check if it will be possible for you to use TCMalloc library:

http://goog-perftools.sourceforge.net/doc/tcmalloc.html
[13 Apr 2007 22:56] Sergei Golubchik
Brice, if you have a debug build of the server, try "mysqladmin debug" - it'll show memory allocations
[14 Apr 2007 4:56] Shane Bester
Sergei, i see mallinfo() is limited to 4GB.  If so, some results may be obscured in the mysqladmin debug, at least the summary.
[17 Apr 2007 12:48] Brice Figureau
Hi,

I'm running debian sid version of mysql 5.0.38 since a few days.
For the moment, it grows slowly. 

Usually the problem is really noticed about 2 weeks after restart.

I'm preparing a debug build of this version, as I wasn't really successful with the TCMalloc heapchecker (it segfaults at starts).

I tried on a dev machine a debug build, but it doesn't really include lots of information about consumed memory (ie it doesn't seem to account memory used by Innodb ?).

I'll report my findings as soon as I have them.
Thanks for showing your interest to my problem,
[20 Apr 2007 10:01] Valeriy Kravchuk
We appreciate any comments about your findings.
[3 May 2007 9:04] Brice Figureau
Hi,

I'm now running a debug build of MySQL 5.0.38 (based on the debian package) since a few minutes. I'm also running it on top of TCMalloc's Heapcheckers.

The at start Memory info doesn't seem to handle 64Bits environment properly (the numbers seems to have wrapped the signed 32bits limit), but you might be able to understand it so here it is:
May  3 10:57:16 db1 mysqld[28477]: Memory status:
May  3 10:57:16 db1 mysqld[28477]: Non-mmapped space allocated from system: -481226752
May  3 10:57:16 db1 mysqld[28477]: Number of free chunks:^I^I^I 0
May  3 10:57:16 db1 mysqld[28477]: Number of fastbin blocks:^I^I 0
May  3 10:57:16 db1 mysqld[28477]: Number of mmapped regions:^I^I 0
May  3 10:57:16 db1 mysqld[28477]: Space in mmapped regions:^I^I 0
May  3 10:57:16 db1 mysqld[28477]: Maximum total allocated space:^I^I 0
May  3 10:57:16 db1 mysqld[28477]: Space available in freed fastbin blocks: 10502200
May  3 10:57:16 db1 mysqld[28477]: Total allocated space:^I^I^I -777625656
May  3 10:57:16 db1 mysqld[28477]: Total free space:^I^I^I 285896704
May  3 10:57:16 db1 mysqld[28477]: Top-most, releasable space:^I^I 0
May  3 10:57:16 db1 mysqld[28477]: Estimated memory (with thread stack):    -480178176

I also see a lot of "Lock:" in the debug output, if that matters.

I'll post the differences from the debug output, along with the heapchecker's results in a few days when the thing will have sufficiently grown.

If you need more information, let me know.

Thanks,
Brice
[6 Aug 2007 14:14] Brice Figureau
Hi,

I'm running debian's 5.0.45, but still the same issue.

Could it be related to this:
mysql> show mutex status;
+-------------+------+----------+
| File        | Line | OS_waits |
.... followed by tens of:
dict0dict.c     3706    0
dict0dict.c     3706    0
dict0mem.c      90      0
and then
| buf0buf.c   |  494 |        0 | 
| buf0buf.c   |  497 |        0 | 
...
... tons of buf0buf.c  494 0 or buf0buf.c  497 0 lines
| buf0buf.c   |  494 |        0 | 
| buf0buf.c   |  497 |        0 | 
| buf0buf.c   |  494 |        0 | 
| buf0buf.c   |  497 |        0 | 
| buf0buf.c   |  494 |        0 | 
| buf0buf.c   |  545 |   181083 | 
| fil0fil.c   | 1293 |     1564 | 
| srv0start.c | 1201 |        0 | 
| srv0start.c | 1194 |        0 | 
| srv0start.c | 1172 |        0 | 
| dict0mem.c  |   90 |        0 | 
| dict0mem.c  |   90 |        0 | 
| srv0srv.c   |  875 |      272 | 
| srv0srv.c   |  872 |   113867 | 
| thr0loc.c   |  229 |        1 | 
| mem0pool.c  |  205 |     2497 | 
| sync0sync.c | 1289 |        0 | 
+-------------+------+----------+
346809 rows in set (0.27 sec)

So that's more than 346809 locks, and it seems to always grow, never shrink even when the traffic lightens and permits it...

Among those 346809 locks, there are 172800 "buf0buf.c 494" and "127800 buf0buf.c 497". Among those, only a few (less than 80 depending on the line) were OS locks.

I can post the full content of show mutex status if needed.

Thanks,
[26 Dec 2007 12:35] Valeriy Kravchuk
Please, try to repeat with a newer version, 5.0.51, and inform about the results.
[11 Jan 2008 20:16] Ian Wilkes
I am seeing the same behavior with 5.0.45 - a heavily loaded instance is growing by 500MB/day.  I don't see anything in the 5.0.51 release notes which suggests a fix - is there a specific reason to believe that this version will work better?  Has anyone had success at isolating or eliminating this leak?
[12 Jan 2008 14:08] Brice Figureau
Hi Ian,

I have at least found another person hit by this issue on loaded servers. Strangely enough he is also running debian (etch for him). I already ruled out debian mysql version, since the issue appears also with the MySQL compiled version.
I never couldn't reproduce it in the lab, either because I can't replicate the real load my production server is viewing or because I have only 32 bits server in the lab.
I tried to run Google Perftools (the heapchecker), but since it consumes lots of memory I never could run it long enough to capture anything usable (except a memory leak in gethostbyaddr(3)/gethostbyname(3) which after careful analysis is not the problem I see).
I didn't try yet .51, but I don't see anything in the changelog that could let me think something has been fixed in this area.
Also, I find strange that the number of people affected is so low (3 up to now), so that might be because we either have unusual setup (debian, amd64, etc...), or unusual kind of workload. Comparing my own workload with the other person affected didn't reveal anything interesting.
What I'd like to do is to run Google Perftools' heapchecker a longer time and see what it gives. I suggest that if you have the possibility to restart your servers more often than myself you also try to get mysql run over tcmalloc heapchecker and see what comes out of that.
--
Brice
[19 Jan 2008 2:44] James Day
Two leaks to check and verify that they aren't the cause of this for you: bug #21347 a potentially large leak from DROP USER and bug #32041 a small leak when loading and unloading a UDF.
[29 Jan 2008 2:14] Ian Wilkes
A breakthrough - I was able to reproduce the leak using a single, simple query executed repeatedly.  And, I've been able to cause mysqld to shrink back to normal size by doing a FLUSH TABLE table_x, where table_x is the largest, busiest table in the system.  Flushing other tables has no effect on mysqld's size, so I think there is something special about table_x which causes this strange bloat.

I'll try to figure out the smallest set of data needed to reproduce the problem, and post it here.  In the meantime, if you're experiencing this problem, try a FLUSH TABLES and see if that works.

This doesn't appear to be either of the above-listed bugs.  We have shied away from testing 5.0.51 due to some bugs in that version, but will try 5.0.54.
[29 Jan 2008 3:23] Shane Bester
Ian, please upload the table structure and query. You can do it in a private note if needed.  Thanks,
[4 Feb 2008 16:36] Brice Figureau
Hi Ian,

I just did a flush table on every tables to see if that changes anything and frankly it didn't had any visible effects (but I restarted mysql about a week ago, so it doesn't take lots of memory yet, I'll redo the test in a week or two).

Can you disclose (only to me if you prefer) the query and the table format, so that I can compare with my tables and usage patterns?

Thanks,
Brice
[5 Feb 2008 2:45] Ian Wilkes
It's possible we're dealing with different bugs.  The query in question looks something like this:

SELECT *many columns*, UNIX_TIMESTAMP(u) AS timestamp
FROM table_x
WHERE a=*uuid* AND c=*other uuid*
ORDER BY timestamp DESC

The table is pretty much what you'd expect - the only interesting thing about it is that there are separate keys on columns a and c, so when you EXPLAIN the query you get "Using intersect(a,c)".  That's about all the detail I'd like to give at this level.
[5 Feb 2008 14:01] Shane Bester
this is the testcase I wrote.  leaks 20kb/s on my machine.

Attachment: bug27732_testcase.sql (application/unknown, text), 5.57 KiB.

[5 Feb 2008 18:03] Shane Bester
verified using my testcase attached as a stored procedure.  please run it until a few hundred thousand iterations, and watch memory increase gradually.

setting triage values:

D2: resources leaks (memory increases infinitely)
W3: partial workaround is to flush tables
I2: i think any query using index_merge is affected. key(a),key(b)..select .. where a=1 and b=5
[6 Feb 2008 23:10] Sergey Petrunya
Using Shane's last testcase, can repeat on debug build of 5.0-opt.

If I edit the opt_range.cc source to disable index_merge/intersection, memleak goes away.
[7 Feb 2008 2:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/41838

ChangeSet@1.2599, 2008-02-07 05:10:04+03:00, sergefp@mysql.com +1 -0
  BUG#27732 "Possible memory leak with index_merge"
  The bug was that handler::clone/handler::ha_open() call caused allocation of 
  cloned_copy->ref on the handler->table->mem_root. The allocated memory could not 
  be reclaimed until the table is flushed, so it was possible to exhaust memory by 
  repeatedly running index_merge queries without doing table flushes.  
  
  The fix:
  - make handler::clone() allocate new_handler->ref on the passed mem_root 
  - make handler::ha_open() not allocate this->ref if it has already been allocated
  There is no testcase as it is not possible to check small leaks from testsuite.
[13 Mar 2008 19:28] Bugs System
Pushed into 6.0.5-alpha
[13 Mar 2008 19:36] Bugs System
Pushed into 5.1.24-rc
[13 Mar 2008 19:43] Bugs System
Pushed into 5.0.60
[30 Mar 2008 8:34] Jon Stephens
Documented bugfix in the 5.0.60, 5.1.23-ndb-6.3.11, 5.1.24, and 6.0.5 changelogs as follows:

        It was possible to exhaust memory by repeatedly running index_merge
        queries and never performing any FLUSH TABLES.
[26 Feb 2009 15:57] Mark Callaghan
You did not add a regression test for this. This bug has caused me a lot of debugging effort. I am sure others have spent too much time on this as well. You owe us a test in mysql-test-run.

I will attach a file that provides a test. However, this assumes that SHOW STATUS exports the value Bytes_allocated which tracks #bytes currently allocated for all MEM_ROOT instances. It is up to you to implement that counter -- it isn't much work and can use thread_safe_increment.

And when you have Bytes_allocated in SHOW_STATUS, you can add checks to many other tests to find memory leaks.
[26 Feb 2009 15:58] Mark Callaghan
Regression test that assume SHOW STATUS exports Bytes_allocated (from MEM_ROOT)

Attachment: b.test (application/octet-stream, text), 8.11 KiB.

[14 Jan 2011 13:42] Mattias Jonsson
I will try to add the test in the scope of bug#59316, which affected partitioning...