Bug #94747 | 4GB Limit on large_pages shared memory set-up | ||
---|---|---|---|
Submitted: | 22 Mar 2019 10:06 | Modified: | 27 Mar 2019 16:00 |
Reporter: | Nikolai Ikhalainen | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
Version: | 5.7.25 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[22 Mar 2019 10:06]
Nikolai Ikhalainen
[22 Mar 2019 14:13]
MySQL Verification Team
Hi, Thank you for your report. Your report is very sparse on details. Just enabling large pages in MySQL is not enough. Please read the following page and confirm that you have done EVERYTHING that is recommended there: https://dev.mysql.com/doc/refman/8.0/en/large-page-support.html
[22 Mar 2019 15:59]
Nikolai Ikhalainen
Hi Sinisa,

The problem can be reproduced on different hosts.

CentOS 7.6 with kernel 4.17.4-1.el7.elrepo.x86_64

Limits for shm are even bigger:

cat /proc/sys/kernel/shmmax /proc/sys/kernel/shmall
18446744073692774399
18446744073692774399

large_pages allocation works fine with --innodb_buffer_pool_chunk_size=1G, --innodb_buffer_pool_chunk_size=2G and --innodb_buffer_pool_chunk_size=3G (correctly listed in ipcs -m and /proc/meminfo).

8.0.15 shows the same issue:

# bin/mysqld --no-defaults --user=root --datadir=$PWD/data --lc-messages-dir=$PWD/share/english --socket=$PWD/data/mysqld.sock --skip-networking --innodb_buffer_pool_instances=2 --large-pages --innodb_buffer_pool_chunk_size=2G --innodb_buffer_pool_size=8G
# ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 0 zabbix 600 1825056 6 dest
0x00000000 5013505 root 600 2199912448 1 dest
0x00000000 5046274 root 600 2199912448 1 dest
0x00000000 5079043 root 600 2199912448 1 dest
0x00000000 5111812 root 600 2199912448 1 dest

# bin/mysqld --no-defaults --user=root --datadir=$PWD/data --lc-messages-dir=$PWD/share/english --socket=$PWD/data/mysqld.sock --skip-networking --innodb_buffer_pool_instances=2 --large-pages --innodb_buffer_pool_chunk_size=4G --innodb_buffer_pool_size=8G
# ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 0 zabbix 600 1825056 6 dest
0x00000000 5144577 root 600 102760448 1 dest
0x00000000 5177346 root 600 102760448 1 dest

The problem is also easy to reproduce on Ubuntu 18.04 4.15.0-46-generic (20GB RAM), as root:

sync; echo 3 > /proc/sys/vm/drop_caches
echo 5120 > /proc/sys/vm/nr_hugepages # 10GB
ulimit -l unlimited
wget https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.15-linux-glibc2.12-x86_64.tar.xz
tar xaf mysql-8.0.15-linux-glibc2.12-x86_64.tar.xz
mv mysql-8.0.15-linux-glibc2.12-x86_64 m80
cd m80
bin/mysqld --no-defaults --user=root --datadir=$PWD/data --lc-messages-dir=$PWD/share/english --initialize-insecure --skip-networking
bin/mysqld --no-defaults --user=root --datadir=$PWD/data --lc-messages-dir=$PWD/share/english --socket=$PWD/data/mysqld.sock --skip-networking --innodb_buffer_pool_instances=2 --large-pages --innodb_buffer_pool_chunk_size=2G --innodb_buffer_pool_size=8G

# innodb_buffer_pool_chunk_size=2G allocates memory correctly:
ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 42598402 root 600 2199912448 1 dest
0x00000000 42631172 root 600 2199912448 1 dest
0x00000000 42663943 root 600 2199912448 1 dest
0x00000000 42696712 root 600 2199912448 1 dest

# The issue happens with 4G:
bin/mysqld --no-defaults --user=root --datadir=$PWD/data --lc-messages-dir=$PWD/share/english --socket=$PWD/data/mysqld.sock --skip-networking --innodb_buffer_pool_instances=2 --large-pages --innodb_buffer_pool_chunk_size=4G --innodb_buffer_pool_size=8G
ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 42729474 root 600 102760448 1 dest
0x00000000 42762244 root 600 102760448 1 dest

Instead of two segments of about 4.1GB each (8GB in total), mysql creates two 98MB segments.

shm limits are the same as on CentOS:

cat /proc/sys/kernel/shmmax /proc/sys/kernel/shmall
18446744073692774399
18446744073692774399
[26 Mar 2019 13:50]
MySQL Verification Team
Hi, First of all, you did not answer the question in my previous comment. Second, this seems to be strictly a Linux OS internal issue with pages and not related to MySQL at all! Please prove otherwise.
[27 Mar 2019 3:35]
Nikolai Ikhalainen
Hi Sinisa,

I've followed https://dev.mysql.com/doc/refman/8.0/en/large-page-support.html . Large pages support works as long as innodb_buffer_pool_chunk_size does not produce shared memory segments larger than 4GB.

I'm also able to create shared memory segments larger than 4GB on Linux:

// gcc test_large_pages.c -o test_large_pages
#include <stdlib.h>
#include <unistd.h>     /* sleep() */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main()
{
  int shmid;
  struct shmid_ds buf;
  void* ptr;
  //shmid = shmget(IPC_PRIVATE, 1073741824UL, SHM_HUGETLB | SHM_R | SHM_W);
  //shmid = shmget(IPC_PRIVATE, 8589934592UL, SHM_HUGETLB | SHM_R | SHM_W);
  shmid = shmget(IPC_PRIVATE, 8692695040UL, SHM_HUGETLB | SHM_R | SHM_W);
  ptr = shmat(shmid, NULL, 0);
  /* Mark the segment for removal; it stays listed (status "dest") while attached. */
  shmctl(shmid, IPC_RMID, &buf);
  sleep(30);
  return 0;
}

$ ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 6029313 nickolay.i 600 8692695040 1 dest

If I check mysql with strace:

strace -o large_pages_2G.strace.log bin/mysqld --no-defaults --datadir=$PWD/data --lc-messages-dir=$PWD/share/english --socket=$PWD/data/mysqld.sock --skip-networking --innodb_buffer_pool_instances=1 --large-pages --innodb_buffer_pool_chunk_size=2G --innodb_buffer_pool_size=8G
strace -o large_pages.strace.log bin/mysqld --no-defaults --datadir=$PWD/data --lc-messages-dir=$PWD/share/english --socket=$PWD/data/mysqld.sock --skip-networking --innodb_buffer_pool_instances=1 --large-pages --innodb_buffer_pool_chunk_size=4G --innodb_buffer_pool_size=8G
strace -o large_pages_test.strace.log ../test_large_pages

$ grep shmget large_pages*log
large_pages_2G.strace.log:shmget(IPC_PRIVATE, 8388608, SHM_HUGETLB|0600) = 5865473
large_pages_2G.strace.log:shmget(IPC_PRIVATE, 2199912448, SHM_HUGETLB|0600) = 5898242
large_pages_2G.strace.log:shmget(IPC_PRIVATE, 2199912448, SHM_HUGETLB|0600) = 5931011
large_pages_2G.strace.log:shmget(IPC_PRIVATE, 2199912448, SHM_HUGETLB|0600) = 5963780
large_pages_2G.strace.log:shmget(IPC_PRIVATE, 2199912448, SHM_HUGETLB|0600) = 5996549
large_pages.strace.log:shmget(IPC_PRIVATE, 8388608, SHM_HUGETLB|0600) = 5767169
large_pages.strace.log:shmget(IPC_PRIVATE, 102760448, SHM_HUGETLB|0600) = 5799938
large_pages.strace.log:shmget(IPC_PRIVATE, 102760448, SHM_HUGETLB|0600) = 5832707
large_pages_test.strace.log:shmget(IPC_PRIVATE, 8692695040, SHM_HUGETLB|0600) = 6062081

As you can see, mysqld is trying to allocate 102760448 bytes. At the same time, Linux 4.17.4-1.el7.elrepo.x86_64 accepts an 8GB+98MB shm size for my standalone program.

If I set a breakpoint at the shmget call: https://github.com/mysql/mysql-server/blob/5.7/storage/innobase/os/os0proc.cc#L91

(gdb) p *n
$4 = 4397727744
(gdb) p size
$5 = 102760448

size is calculated as:

size = ut_2pow_round(*n + (os_large_page_size - 1), os_large_page_size);

The rounding macro is defined as:

#define ut_2pow_round(n, m) ((n) & ~((m) - 1))

(gdb) p (( *n + (os_large_page_size - 1) ) & ~(( os_large_page_size ) - 1))
$6 = 102760448

The result becomes correct once ~((m) - 1) is evaluated as a 64-bit unsigned value (1UL instead of 1):

(gdb) p (( *n + (os_large_page_size - 1) ) & ~(( os_large_page_size ) - 1UL))
$7 = 4397727744
[27 Mar 2019 13:34]
MySQL Verification Team
Hi Nikolai, Thank you for your insight. That actually means that our documentation is incomplete and that we must add a note that the chunk size should be smaller than 4 GB. Verified as a documentation bug.
[27 Mar 2019 16:00]
Nikolai Ikhalainen
Hi Sinisa,

The same macro is used in a different place inside InnoDB, in the buf_chunk_init function: https://github.com/mysql/mysql-server/blob/5.7/storage/innobase/buf/buf0buf.cc#L1498

mem_size = ut_2pow_round(mem_size, UNIV_PAGE_SIZE);

It works correctly there, because UNIV_PAGE_SIZE is intentionally defined in storage/innobase/include/univ.i as a 64-bit unsigned integer:

#define UNIV_PAGE_SIZE ((ulint) srv_page_size)

The 4GB limitation on the mysqld side for large pages does not look intentional, especially because in https://bugs.mysql.com/bug.php?id=43606 exactly the same issue (it was not possible to create large memory segments bigger than 4GB) was marked as a defect and fixed.

Nowadays systems with several TB of RAM are not unusual, and an artificial 4GB limit (actually only 3GB can be used, because the BP allocation is slightly bigger than the chunk size) forces MySQL users to create many chunks with large pages (e.g. 1300 chunks for a 4TB BP). At the same time, the manual suggests keeping the number of chunks below 1000 (see https://dev.mysql.com/doc/refman/5.7/en/innodb-buffer-pool-resize.html ).
[28 Mar 2019 13:20]
MySQL Verification Team
Thank you. Your request will be taken into consideration when this bug is processed internally. I have left a comment in the internal bug so that your feature request can be considered as well.
[4 Apr 2019 19:55]
Mark Callaghan
I don't understand how this is a doc bug. AFAIK the code does the wrong thing and changing it to do the right thing doesn't seem like a major effort.
[5 Apr 2019 12:46]
MySQL Verification Team
Hi Mark, I agree. I am changing the category of this bug.
[8 Apr 2019 2:19]
Daniel Black
Was broken in https://github.com/mysql/mysql-server/commit/e5d9961b637f871b34d7741b9f3db336c59ddec4 when os_large_page_size changed from ulint -> uint.

Changing the type back also fixes it (this is 8.0.14):

diff --git a/storage/innobase/include/os0proc.h b/storage/innobase/include/os0proc.h
index 13633bb12d3..8a12f58e68f 100644
--- a/storage/innobase/include/os0proc.h
+++ b/storage/innobase/include/os0proc.h
@@ -52,7 +52,7 @@ extern ulint os_total_large_mem_allocated;
 extern bool os_use_large_pages;
 
 /** Large page size. This may be a boot-time option on some platforms */
-extern uint os_large_page_size;
+extern ulint os_large_page_size;
 
 /** Converts the current process id to a number.
 @return process id as a number */
diff --git a/storage/innobase/os/os0proc.cc b/storage/innobase/os/os0proc.cc
index ed466028590..9e1a3d2aea1 100644
--- a/storage/innobase/os/os0proc.cc
+++ b/storage/innobase/os/os0proc.cc
@@ -58,7 +58,7 @@ ulint os_total_large_mem_allocated = 0;
 bool os_use_large_pages;
 
 /** Large page size. This may be a boot-time option on some platforms */
-uint os_large_page_size;
+ulint os_large_page_size;
 
 /** Converts the current process id to a number.
 @return process id as a number */

$ runtime_output_directory/mysqld --version
/home/dan/repos/build-mysql-8.0/runtime_output_directory/mysqld Ver 8.0.15 for Linux on x86_64 (Source distribution)

gdb --args ./runtime_output_directory/mysqld --no-defaults --datadir=/tmp/mysqldata --innodb_buffer_pool_instances=1 --large-pages --innodb_buffer_pool_chunk_size=4G --innodb_buffer_pool_size=8G
(gdb) break os_mem_alloc_large(unsigned long*)
Breakpoint 1 at 0x1e54a20: file /home/dan/repos/mysql-server/storage/innobase/os/os0proc.cc, line 83.
(gdb) run
(gdb) p *n
$1 = 4397727744
(gdb) p os_large_page_size
$2 = 2097152
(gdb) n
89 size = ut_2pow_round(*n + (os_large_page_size - 1), os_large_page_size);
(gdb)
91 shmid = shmget(IPC_PRIVATE, (size_t)size, SHM_HUGETLB | SHM_R | SHM_W);
(gdb) p size
$3 = 4397727744
[8 Apr 2019 12:59]
MySQL Verification Team
Hi Daniel, Thank you very much for your contribution. I have added your comment to our internal bug database.
[18 Jul 2019 13:03]
MySQL Verification Team
This bug has a duplicate: https://bugs.mysql.com/bug.php?id=96197