MySQL Bugs: #57480: Memory leak when having 256+ tables

Bug #57480	Memory leak when having 256+ tables
Submitted:	15 Oct 2010 11:19	Modified:	5 Jan 2012 1:22
Reporter:	Alexander Rubin	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S2 (Serious)
Version:	5.1, 5.0	OS:	Any
Assigned to:	Vasil Dimov	CPU Architecture:	Any
Tags:	memory leak, partitioning

Description:
Memory leak when having 256+ partitions, partitioning by key.
Table:
CREATE TABLE `sbtest` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY KEY (id)

Table size: 10M rows
InnoDB_buffer_pool = 4G

test:
for i in {1..400}; do echo "select count(id) from sbtest;" | mysql -h 127.0.0.1 -uroot -A sbtest &  done

(this can be any SQL query that affects all partitions)

Watch top and see that RAM used increasing significantly.
Flush tables does NOT release memory.

Without partitioning RAM does not grow

How to repeat:
Create table with 256 partitions by key, 2G+ in size, run the test above and watch top (linux) or prstat (solaris). 

Suggested fix:
Fix the probable memory leak in partitioning.

Could it be related to bug#20877?

Test to create 256+ tables instead and access them and see if you get the same result.

Yes, it is growing with 256 tables, without partitioning.
Test case with 256 tables:

# create 256 tables like the original one and load 40K rows in each
for i in {1..256}; do echo "drop table if exists sbtest$i; create table sbtest$i like sbtest; insert into sbtest$i select * from sbtest limit 40000" | /data1/arubin/mysql51/bin/mysql -vvv -h 127.0.0.1 -uroot -P3309 -A sbtest; done
sql="select count(id) from sbtest1 as s$i"

# generate SQL with union
for i in {2..256}; do sql="$sql union select id from sbtest$i as s$i"; done
echo $sql

# run count(*) on top of union

for i in {1..256}; do echo "select count(*) from ($sql) as a"  | /data1/arubin/mysql51/bin/mysql -h 127.0.0.1 -uroot -P3309 -A sbtest &  done

Regarding InnoDB dict:

Total memory allocated 4391436288; in additional pool allocated 0
Dictionary memory allocated 1291612
Buffer pool size   262143
Free buffers       4
Database pages     262136
Old database pages 96745
Modified db pages  0
Pending reads 0

So, it is only allocated 1M

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

This happens with 256 tables, not related specifically to partitions.

Here are the top snapshots during the select count(*) execution.
This is 256 tables, not partitions

Before:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9232 arubin    16   0 4717m 423m 5720 S    0  5.4  27:39.78 mysqld

During execution:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9232 arubin    16   0 5162m 877m 5812 S  319 11.2  27:51.83 mysqld

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9232 arubin    16   0 6208m 1.8g 5812 S  397 23.5  34:16.32 mysqld

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9232 arubin    16   0 6258m 1.9g 5812 S  396 24.4  43:02.82 mysqld

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9232 arubin    16   0 6490m 2.2g 5824 S    0 29.4  95:32.10 mysqld

mysql> flush tables;
Query OK, 0 rows affected (0.01 sec)

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9232 arubin    16   0 6490m 2.2g 5836 S    0 29.4  95:32.20 mysqld

So, flush tables does not release memory

hpof with mysqldebug results during the script execution

pprof  /data1/arubin/mysql51/bin/mysqld /tmp/mysql.hprof_6659.0010.heap
Total: 5228.6 MB
  4723.1  90.3%  90.3%   4723.1  90.3% dict_create
   505.3   9.7% 100.0%    505.3   9.7% handle_options
     0.1   0.0% 100.0%      0.1   0.0% __GI__dl_allocate_tls

pprof  /data1/arubin/mysql51/bin/mysqld /tmp/mysql.hprof_6659.0019.heap

Total: 6121.8 MB
  4833.2  78.9%  78.9%   4833.2  78.9% dict_create
  1288.5  21.0% 100.0%   1288.5  21.0% handle_options
     0.1   0.0% 100.0%      0.1   0.0% __GI__dl_allocate_tls
     0.1   0.0% 100.0%      0.1   0.0% my_well_formed_len_8bit

MySQL 5.5:

Total: 1910.9 MB
  1362.8  71.3%  71.3%   1362.8  71.3% vtable for Item_sum_udf_str
   323.0  16.9%  88.2%    323.0  16.9% update_auto_increment_key
   216.9  11.4%  99.6%    216.9  11.4% row_ins
     4.1   0.2%  99.8%      4.1   0.2% vtable for Item_func_char_length

mysqladmin debug status:

Memory status:
Non-mmapped space allocated from system: -2097778688
Number of free chunks:                   0
Number of fastbin blocks:                0
Number of mmapped regions:               0
Space in mmapped regions:                0
Maximum total allocated space:           0
Space available in freed fastbin blocks: 8937488
Total allocated space:                   313540592
Total free space:                        1874710528
Top-most, releasable space:              0
Estimated memory (with thread stack):    -2097778688

But memory is still growing:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25844 arubin    15   0 6966m 4.5g 6304 S  334 59.2 141:26.56 mysqld-debug

MySQL 5.5 after 2 times script run:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25844 arubin    15   0 7033m 4.6g 6304 S    0 60.2 273:47.58 mysqld-debug

not obvious that this is InnoDB and not partitioning in general.
What would help, is pmap and pstack, or the equivalent, output for the mysqld process, every 2-5 seconds,  when you are see the memory leak.

I still fail to see any bug here at all.  The testcase is highly concurrent.
Can you determine if the problem is:

1) number of tables in the test
or
2) number of threads running concurrently
or
3) combination of the threads + tables.

I have a database with over 35,000 partitions across 120 databases, and I am seeing a memory leak since 5.5.8, and including 5.5.14. Please see issue #61829

Very extensive work has been done on this, not yet ready for release though. Among other things the work has included:

1. Moving many allocations to the heap instead of malloc.
2. Merging multiple allocations into one in several cases.
3. Better estimates of sizes needed for some allocations.
4. Defer allocation of upd_buff, of the size of the data row, that's only needed if a row is updated.

One correction to my last post:

1. Moving many allocations to the stack instead of malloc.

Just a suggestion: The Redis NoSQL in-memory database uses jemalloc on Linux (from their README file):

"Redis is compiled and linked against libc
malloc by default, with the exception of jemalloc being the default on Linux
systems. This default was picked because jemalloc has proven to have fewer
fragmentation problems than libc malloc."

If you don't want to recompile mysql with jemalloc, you can probably compile jemalloc as a shared library, and then use the LD_PRELOAD environment variable, so it gets loaded before the libc malloc implementation.

Adding to changelog:

Memory allocation for InnoDB tables was reorganized to reduce the
memory overhead for large numbers of tables or partitions, avoiding
situations where the "resident set size" could grow regardless of
FLUSH TABLES statements. The problem was most evident for tables with
large row size. Some of the memory that was formerly allocated for
every open table is now allocated only when the table is modified for
the first time.

There are two Transactions on InnoDB blog entries discussing the memory allocation improvements prompted by this report:

http://blogs.innodb.com/wp/2011/12/improving-innodb-memory-usage/
http://blogs.innodb.com/wp/2011/12/improving-innodb-memory-usage-continued/

While the big improvement in reduced fragmentation comes with the Linux memory allocator the general improvement in efficiency and deferring of some memory allocations until they are needed should help other operating systems.

The fix has been backported to 5.5. Will be in 5.5.21 release. Thanks!

The fix for this bug introduced bug #70641 "5.6 partitions use much more memory than 5.1".