Bug #84003 MySQL Consumes All Available System Memory || OS Swapping
Submitted: 29 Nov 2016 12:54  Modified: 26 Jan 2017 13:24
Reporter: N.S. Cutler
Status: Closed
Category: MySQL Server  Severity: S2 (Serious)
Version: 5.7.17  OS: CentOS
Assigned to: MySQL Verification Team  CPU Architecture: Any
Tags: load, Memory, swap

[29 Nov 2016 12:54] N.S. Cutler
Description:
On a stripped down VM dedicated to running MySQL, under minimal load mysqld gradually consumes all available system memory (2GB), forcing the OS to swap.

SYSTEM
cat /etc/redhat-release 
CentOS Linux release 7.2.1511 (Core)

uname -a
Linux MySQL-Prod1 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

ps aux | grep mysqld
mysql     7055  0.2 81.9 2856948 1543916 ?     Sl   Nov18  37:03 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid

CONFIG
sudo cat /etc/my.cnf
[mysqld]
user=mysql
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
bind-address=lan-ip
sql-mode="STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"

symbolic-links=0
skip_name_resolve=1
skip-show-database
skip-ssl
skip-character-set-client-handshake
character-set-server=utf8
collation-server=utf8_general_ci
local-infile=0
explicit_defaults_for_timestamp=1
ignore-db-dir=lost+found

innodb_buffer_pool_size=536870912
query_cache_size=0

log-error=/var/log/mysqld.log

How to repeat:
Start the server and observe virtual memory grow beyond the physical memory on the server; the OS starts swapping, with swap usage increasing at a rate of 10% per day. Within a week the OOM killer should "restart" mysqld.
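(For reference, a minimal shell sketch of the kind of monitoring used to observe this; the one-minute interval and log file path are arbitrary choices, not taken from the report:)

# sample mysqld memory and system swap usage once a minute
while true; do
    date
    ps -o pid,vsz,rss,cmd -C mysqld   # virtual and resident size of mysqld, in KB
    free -m | grep -i swap            # swap used/free, in MB
    sleep 60
done >> /tmp/mysqld-mem.log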

Suggested fix:
There should be a way to cap mysqld resource usage. 

Identical companion VMs running Apache and JVM processes stay within a reasonable upper bound with respect to memory usage (i.e. given the minimal load each VM has 300-500MB free memory).

I would expect mysqld to follow suit. Ideas appreciated for how to rein in MySQL memory usage.

Thanks.
[30 Nov 2016 11:30] MySQL Verification Team
Hi,

What type of traffic are you running on the mysql server?

The MySQL binary you are using -- did you get it from Oracle, or from somewhere else?

best regards
Bogdan Kecman
[30 Nov 2016 12:27] N.S. Cutler
> What type of traffic are you running on the mysql server?

Minimal. Since the server went online (Nov 24) the total number of queries is 1210597. Server load peaks at 0.4 and is generally <= 0.05.

> The mysql binary you are using - did you get it from Oracle or ?

Directly from the mysql.com yum repository (https://dev.mysql.com/downloads/repo/yum/) for RHEL 7.

More Details:

We migrated from a 7-year-old physical server running CentOS 6 VMs to a new server running CentOS 7 VMs. A single application (a Java service) accesses the database via a JDBC-based connection pool with a maximum of 12 connections.

With CentOS 6 on MySQL 5.6, a single vCPU, 1.5GB memory, and a 256MB buffer pool, mysqld memory usage is ~50% of available system memory with 0 swap space used. With CentOS 7 on MySQL 5.7, 2X vCPU, 2GB memory, and a 512MB buffer pool, mysqld memory usage is ~90% of available system memory with 250MB of swap space used.

Since the working set fits in 256MB I'm going to look into resizing the buffer pool from the current 512MB. Otherwise, perhaps there's some kind of multiplier effect with respect to the number of CPU cores? That and a bit more memory are the only resource differences between the CentOS 6 and CentOS 7 VMs.

It's fine that mysqld uses most of the available system memory, but given the server load and small working set, it should *not* force the OS to swap.

In a local 5.7 install, reducing table_definition_cache to 400 and disabling performance_schema have been the only config changes that have shown any reduction in mysqld's memory footprint, likely with a performance hit that would be nice to avoid.
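(For reference, a sketch of those two changes; whether they are advisable for a given workload is a separate question, and the credentials below are placeholders:)

# table_definition_cache can be lowered at runtime
mysql -u root -p -e "SET GLOBAL table_definition_cache = 400;"
# performance_schema can only be changed at startup, via my.cnf:
#   [mysqld]
#   performance_schema = OFF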
[30 Nov 2016 12:42] MySQL Verification Team
Hi,

I will look more into this. Just one more question: are you perhaps running DDL statements often on your database (ALTER TABLE, DROP TABLE, CREATE TABLE, ...), or did you set up the db structure once and are now only modifying/selecting data?

thanks
Bogdan Kecman
[30 Nov 2016 22:56] N.S. Cutler
Thanks.

Re: structural changes, the dump was imported on Nov. 24 and the database has been read/write since then. 97%/3% read/write if it helps.
[4 Dec 2016 11:50] N.S. Cutler
Resized the buffer pool to 256MB and restarted mysqld; after a couple of days running, same effect, OS swapping -- given the server load it looks like 5.7.16 leaks memory :\

No idea what mysqld is doing with the memory, but it's *all* consumed by this process (virtual memory already at 3GB, physical memory exhausted, and the OS eating up 10% of swap space per day).

For now scheduled reboots are the workaround, far from ideal...
[7 Dec 2016 3:35] MySQL Verification Team
Hi,
I was running 5.7.16 in a VM for a few days with the same config as yours without seeing any issues :(

can you get me the output of
SHOW GLOBAL VARIABLES\G
SHOW GLOBAL STATUS\G

Thanks
Bogdan
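(For reference, one way to capture those to files; the credentials are placeholders:)

mysql -u root -p -e "SHOW GLOBAL VARIABLES\G" > global-vars.txt
mysql -u root -p -e "SHOW GLOBAL STATUS\G" > global-status.txt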
[8 Dec 2016 13:52] N.S. Cutler
global vars

Attachment: global-vars.txt (text/plain), 60.30 KiB.

[8 Dec 2016 13:52] N.S. Cutler
global status

Attachment: global-status.txt (text/plain), 41.21 KiB.

[8 Dec 2016 13:56] N.S. Cutler
Depends: has your instance been sitting idle or under steady, sustained use? I suspect the former, as on an identical spare server that is mostly idle, memory usage remains constant and does not grow without bound.

Please see global variables and status files attached.

mysqld process has been running for 26 hours. Based on prior experience swapping will begin shortly and continue at a rate of 10% swap space per day.

Let me know if the variables/status files shed any light on the situation, would love to get this resolved!

Thanks
[12 Dec 2016 13:05] MySQL Verification Team
Hi,

If your server has been running since the last status & variables capture, fetching them again would give us a very usable delta to work with.

If not, then we'll deal with what we have. So far I don't see anything wrong.

thanks
Bogdan
[14 Dec 2016 2:38] N.S. Cutler
global vars2

Attachment: global-vars2.txt (text/plain), 14.21 KiB.

[14 Dec 2016 2:38] N.S. Cutler
global status2

Attachment: global-status2.txt (text/plain), 8.40 KiB.

[14 Dec 2016 2:45] N.S. Cutler
I can only get through about 48 hours before having to restart due to swap usage.

See the latest status and variables attached; maybe there's a clue in there as to what's causing the higher-than-expected memory usage.

One thing I might not have mentioned is that on the client side we use a connection pool. Connections are allocated on application startup and live for just under MySQL's default timeout of 8 hours. It's not uncommon for there to be 2-4K prepared statements in total spread across active threads. I'd think the connection overhead would be quite small, but thought I'd put that out there (though this was not an issue on 5.6).
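(For reference, the server-side prepared statement count and its configured ceiling can be checked like this; the credentials are placeholders:)

mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Prepared_stmt_count'; SHOW GLOBAL VARIABLES LIKE 'max_prepared_stmt_count';"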
[3 Jan 2017 22:45] Timothy Northrup
post-startup global variables

Attachment: tgn-global-vars.txt (text/plain), 59.45 KiB.

[3 Jan 2017 22:47] Timothy Northrup
Seeing the same problem on an Ubuntu server with 15 GB RAM (no swap) and MySQL configured for a 4 GB InnoDB buffer pool. It quickly consumes all available memory and is then killed due to OOM.

uname -a
Linux ip-10-109-165-5 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:50:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:        16.04
Codename:       xenial

Attaching SHOW GLOBAL VARIABLES / STATUS results following startup, and then again once the running task ramps up memory use.
[3 Jan 2017 22:48] Timothy Northrup
post-startup global status

Attachment: tgn-global-status.txt (text/plain), 40.70 KiB.

[3 Jan 2017 22:49] Timothy Northrup
global vars after app running - virtual memory up to 13GB+

Attachment: tgn-global-vars3.txt (text/plain), 59.45 KiB.

[3 Jan 2017 22:50] Timothy Northrup
global status after app running - virtual memory up to 13GB+

Attachment: tgn-global-status3.txt (text/plain), 40.75 KiB.

[4 Jan 2017 6:11] Timothy Northrup
A followup: after adding a swapfile to the server and re-running the problem task, mysqld memory consumption went above 18 GB virtual / 14 GB resident and then settled back down to 6.5 GB virtual / 4.5 GB resident and continued as normal. So it does not appear to be an actual "leak" but a temporary, massive consumption of memory.

I checked the temp table / sort buffer configurations - things seem within appropriate bounds for the size of the server.
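(For reference, a sketch of listing the per-session buffer and temp table settings mentioned here; appropriate values depend on the server, and the credentials are placeholders:)

mysql -u root -p -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('tmp_table_size','max_heap_table_size','sort_buffer_size','join_buffer_size','read_buffer_size','read_rnd_buffer_size');"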
[4 Jan 2017 6:40] MySQL Verification Team
Hi,

I can't reproduce that, and I don't see anything weird in the status/variables. You yourself see that it settles back down normally, so it's some procedure/query you have there that's consuming resources; obviously this is not a bug.

In order to optimize your procedures and queries, see what from your app is eating up resources, and work out how to sort that out, you can use the support services (offered by both Oracle and 3rd parties).

all best
Bogdan
[17 Jan 2017 18:13] N.S. Cutler
Just a followup: the solution was to double the RAM in the VM; swapping no longer occurs.

While technically this is not a bug, it's quite surprising to see a 2X increase in memory usage between 5.6 and 5.7.
[19 Jan 2017 13:53] N.S. Cutler
Re-opening: we had a spike in traffic yesterday -- back to swapping again. Keep in mind that in the 9 days of running since doubling the available memory in the VM, MySQL has been handling, on average, all of 3.5 queries per second.

The problem in 5.7 is this: once MySQL consumes memory it *never* releases it back to the OS!

As mentioned before, there is no leak on the client (the JVM app does not swap and easily handles the load with plenty of free memory). On the server (MySQL), however, there is either a leak, or by design MySQL 5.7 simply reserves memory without regard to the underlying system resources or load.

If MySQL does actually release memory to the OS, then the only thing I can think of is that there is a prepared statement leak on the server. For example, yesterday during the traffic spike there were 8K prepared statements held in memory; today there are about 300. While the overhead of 8K * 2048 bytes (max statement size) is minimal, if on each cache/release cycle memory is not released back to the OS, then this steady one-way street to OOM has a cause.

Regardless, something is badly broken. At this point quadrupling available memory would likely lead to swapping....
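(For reference, with memory instrumentation enabled, the memory the server attributes to prepared statements can be inspected directly; this is a sketch, the exact event names vary by version, and the credentials are placeholders:)

mysql -u root -p -e "SELECT event_name, current_count_used, current_number_of_bytes_used FROM performance_schema.memory_summary_global_by_event_name WHERE event_name LIKE 'memory/sql/Prepared%';"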
[24 Jan 2017 13:15] MySQL Verification Team
Hi,

I can't reproduce this, and I've been running a bunch of 5.7 instances inside VMs for testing for months. It might be a bug, but it might be that you are just using this much RAM.

What does this SELECT return on your system? (It is available from https://dev.mysql.com/doc/refman/5.7/en/memory-use.html, where you can find more -- stored procedures, shell scripts, etc. -- to calculate the expected maximum RAM usage.)

SELECT
( @@key_buffer_size + @@query_cache_size + @@tmp_table_size
+ @@innodb_buffer_pool_size + @@innodb_log_buffer_size
+ @@max_connections * (
@@read_buffer_size + @@read_rnd_buffer_size + @@sort_buffer_size
+ @@join_buffer_size + @@binlog_cache_size + @@thread_stack
) ) / (1024 * 1024 * 1024) AS MAX_MEMORY_GB;
-- (innodb_additional_mem_pool_size is omitted; it was removed in 5.7)

You can use the sys schema to monitor usage and see where the RAM is going ( https://github.com/mysql/mysql-sys ).

just make sure all memory instrumentation is enabled. To see what's "not enabled":
 CALL sys.ps_setup_show_disabled_instruments();

to enable "everything":
 CALL sys.ps_setup_enable_instrument('');

this will give you some summary:
 SELECT * FROM sys.host_summary\G
 SELECT * FROM sys.memory_global_by_current_bytes;
 SELECT * FROM sys.memory_by_user_by_current_bytes;
 SELECT * FROM sys.memory_by_thread_by_current_bytes;
 SELECT * FROM sys.memory_by_host_by_current_bytes;
[26 Jan 2017 13:24] N.S. Cutler
After digging around some more I finally found the cause: Linux's Transparent Huge Pages (THP), combined with the system memory allocator (it doesn't matter which -- malloc, jemalloc, etc.), cause mysqld's RSS to increase without bound.

It may be due to our long-lived connections, the thousands of prepared statements on those connections, and the inability of the memory allocator to clean up huge pages in a timely manner.

Regardless, disabling THP allows the allocator to eventually return memory to the OS. Basically, MySQL memory usage now falls within a normal range given the system load.
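(For reference, a sketch of checking and disabling THP on RHEL/CentOS 7; the runtime change below does not survive a reboot, so a persistent setup also needs a boot-time setting such as transparent_hugepage=never on the kernel command line or a tuned profile:)

# check the current setting; the bracketed value is the active one
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

# disable THP until the next reboot (run as root)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag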

Not a bug! Thanks for your patience :)

P.S. Interesting that on the other VMs (Apache and JVM) with THP enabled, no such memory usage issues occur.