Bug #96525 Huge malloc when open file limit is high
Submitted: 13 Aug 21:56 Modified: 14 Aug 13:36
Reporter: Andreas Hasenack
Status: Verified
Category:MySQL Server: Compiling Severity:S3 (Non-critical)
Version:5.7 OS:Ubuntu
Assigned to: CPU Architecture:Any

[13 Aug 21:56] Andreas Hasenack
Description:
Hello,

we had the following bug[1] filed in Ubuntu, even though the host was running Arch Linux.

When RLIMIT_NOFILE is set to a high value, MySQL makes an excessively large memory allocation at service startup, which can make a 16 GB RAM box swap or even fail with allocation errors. The MySQL systemd service file does have a LimitNOFILE=5000 setting, but because it also sets PermissionsStartOnly=true, that limit is not applied to the ExecStartPre command:

[Service]
Type=forking
User=mysql
Group=mysql
PIDFile=/run/mysqld/mysqld.pid
PermissionsStartOnly=true
ExecStartPre=/usr/share/mysql/mysql-systemd-start pre
ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/run/mysqld/mysqld.pid
TimeoutSec=600
Restart=on-failure
RuntimeDirectory=mysqld
RuntimeDirectoryMode=755
LimitNOFILE=5000

It turns out that Arch Linux sets a high open-file limit out of the box (RLIMIT_NOFILE = 1073741816), and this triggers the excessive memory allocation in MySQL. The same happens in MariaDB, where it was fixed[2]. At a glance, the corresponding MySQL code is unchanged.

Ubuntu doesn't trigger this behavior out of the box because our default limit for NOFILE is 1048576.
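To check which open-file limit a particular process actually inherits (for instance, to compare an ExecStartPre helper with the main service), a small C helper along these lines can be used. This is a sketch; the function name print_nofile_limits is hypothetical and not part of MySQL.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper (not MySQL code): print the soft (rlim_cur) and
 * hard (rlim_max) RLIMIT_NOFILE values seen by the calling process.
 * Returns 0 on success, -1 if getrlimit() fails. */
int print_nofile_limits(FILE *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    /* RLIM_INFINITY is a sentinel, not a real count, so print it by name. */
    if (rl.rlim_cur == RLIM_INFINITY)
        fprintf(out, "soft: unlimited\n");
    else
        fprintf(out, "soft: %llu\n", (unsigned long long) rl.rlim_cur);

    if (rl.rlim_max == RLIM_INFINITY)
        fprintf(out, "hard: unlimited\n");
    else
        fprintf(out, "hard: %llu\n", (unsigned long long) rl.rlim_max);

    return 0;
}
```

For a process that is already running, the same values appear in the "Max open files" row of /proc/<pid>/limits.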

One could argue that this is a local configuration issue and/or a Linux distribution packaging issue, but it does seem wrong for MySQL to allocate that much memory based on the open-file limit.

Others did the troubleshooting[3], but in short, set_max_open_files(max_file_limit) can return the current limit even if it is excessively large, as long as it is not equal to RLIM_INFINITY:
    if (rlimit.rlim_cur == (rlim_t) RLIM_INFINITY)
      rlimit.rlim_cur = max_file_limit;
    if (rlimit.rlim_cur >= max_file_limit)
      DBUG_RETURN(rlimit.rlim_cur);     /* purecov: inspected */

So if the current limit is larger than max_file_limit but not equal to RLIM_INFINITY, the current limit is returned and later used to size a malloc() call, which can become huge.
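The early-return path can be exercised in isolation with a small model of the quoted lines. This is a sketch, not MySQL code: model_current_logic is a hypothetical name, and the getrlimit()/setrlimit() calls of the real set_max_open_files() are left out, so the branch where the limit is below max_file_limit simply returns the current value instead of trying to raise it.

```c
#include <sys/resource.h>

/* Hypothetical model of the quoted early-return logic in set_max_open_files().
 * rlim_cur is the current soft RLIMIT_NOFILE; max_file_limit is the number
 * of files the server asked for. */
rlim_t model_current_logic(rlim_t rlim_cur, rlim_t max_file_limit)
{
    if (rlim_cur == RLIM_INFINITY)
        rlim_cur = max_file_limit;   /* only "unlimited" gets capped */
    if (rlim_cur >= max_file_limit)
        return rlim_cur;             /* any other large limit passes through as-is */
    /* Below the request: the real code would try setrlimit() here; not modeled. */
    return rlim_cur;
}
```

With a hypothetical request of 5000 files, an RLIM_INFINITY limit comes back as 5000, but a finite limit of 1073741816 comes back unchanged, and that returned value is what later sizes the allocation.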

1. https://bugs.launchpad.net/ubuntu/+source/mysql-5.7/+bug/1839527
2. https://jira.mariadb.org/browse/MDEV-18360
3. https://github.com/systemd/systemd/issues/11510#issuecomment-456999084

How to repeat:
These instructions jump through some hoops on Ubuntu to raise the NOFILE limit, because Ubuntu conservatively defaults that limit to 1024.

Using an Ubuntu 18.04 VM with 2 GB of RAM as a base, install mysql-server:
sudo apt update
sudo apt install mysql-server -y

Artificially increase the open-file limit by editing /lib/systemd/system/mysql.service, replacing the ExecStartPre line with the following and commenting out the LimitNOFILE line:
ExecStartPre=/bin/sh -c 'ulimit -n 1073741816; /usr/share/mysql/mysql-systemd-start pre'
#LimitNOFILE=5000
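The `ulimit -n 1073741816` in the modified ExecStartPre line raises the open-file limit of that shell and of the script it runs. A rough C equivalent is sketched below; set_nofile_limit is a hypothetical name, and the sketch assumes the shell sets the soft and hard limits together, which is what bash and dash do for `ulimit -n` when neither -S nor -H is given.

```c
#include <sys/resource.h>

/* Hypothetical helper: set both the soft and hard RLIMIT_NOFILE to n,
 * mirroring `ulimit -n n` without -S/-H. Raising the hard limit needs
 * CAP_SYS_RESOURCE, and on Linux the new hard limit may not exceed
 * fs.nr_open. Returns 0 on success, -1 on error (errno is set). */
int set_nofile_limit(rlim_t n)
{
    struct rlimit rl;

    rl.rlim_cur = n;
    rl.rlim_max = n;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

This is also why the step only works as root (ExecStartPre runs as root here because of PermissionsStartOnly=true) and only once fs.nr_open has been raised: 1073741816 is far above the kernel's default fs.nr_open of 1048576.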

Allow the new limit system-wide by editing /etc/systemd/system.conf and setting:
DefaultLimitNOFILE=1073741816

Issue this command:
sudo systemctl daemon-reload

 
And now restart mysql:
sudo systemctl restart mysql

/var/log/syslog should have something like this:
Aug 13 21:51:48 ubuntu mysqld[8100]: mysqld: Out of memory (Needed 4294967200 bytes)

Suggested fix:
If the current limit is higher than max_file_limit, return max_file_limit.
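A minimal sketch of the suggested clamp, modeling only the quoted early-return logic of set_max_open_files(). clamp_open_files is a hypothetical name, the getrlimit()/setrlimit() calls are not modeled, and this illustrates the suggestion above rather than the patch actually applied in MySQL or MariaDB.

```c
#include <sys/resource.h>

/* Hypothetical sketch of the suggested fix: never report more than
 * max_file_limit, whether the current limit is RLIM_INFINITY or just huge. */
rlim_t clamp_open_files(rlim_t rlim_cur, rlim_t max_file_limit)
{
    if (rlim_cur == RLIM_INFINITY || rlim_cur >= max_file_limit)
        return max_file_limit;   /* cap at what the server actually asked for */
    /* Below the request: the real code would try setrlimit() first; not modeled. */
    return rlim_cur;
}
```

Checking RLIM_INFINITY explicitly keeps the sketch correct without relying on how that sentinel compares with ordinary limit values.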
[14 Aug 13:36] Terje Røsten
Hi!

Thanks for report!

Verified by code inspection.
[14 Aug 13:41] Andreas Hasenack
Thanks for the reply!

It turns out this is not specific to Arch Linux; it was triggered by an upstream change in systemd 240:

https://github.com/systemd/systemd/commit/a8b627aaed409a15260c25988970c795bf963812

Ubuntu's systemd 240 (in the upcoming Eoan release) carries this change from Debian, though:
systemd (240-2) unstable; urgency=medium
...
  * Don't bump fs.nr_open in PID 1.
    In v240, systemd bumped fs.nr_open in PID 1 to the highest possible
    value. Processes that are spawned directly by systemd, will have
    RLIMIT_NOFILE be set to 512K (hard).

So even with systemd 240, Debian and Ubuntu are not affected by this.