Bug #56634 Large Page Memory not working
Submitted: 8 Sep 2010 3:56
Modified: 26 Oct 2010 21:24
Reporter: Andrig Miller
Status: Not a Bug
Category: MySQL Server: InnoDB storage engine
Severity: S2 (Serious)
Version: 5.5.6-rc
OS: Linux (Fedora 13 x86_64)
Assigned to: John Russell
CPU Architecture: Any

[8 Sep 2010 3:56] Andrig Miller
Description:
I was testing 5.5.5-m3 and realized that InnoDB was not using my large page memory for the buffer pool.  I saw a bug report about a typo in sys_vars.cc related to large page memory configuration, and I assumed that fix would resolve the actual issue with large page memory.  But after pulling the trunk from Bazaar, doing a local build, installing it, and verifying that the sys_vars.cc fix was there, InnoDB still does not create its buffer pool using large page memory.

I have verified my configuration, and I can use large page memory from my JVM, so I know things are configured correctly on the OS side of things.  In fact, I can test with 5.1.48, and it works as expected with the latest 5.1 code.

So, it appears that large page memory support for the InnoDB buffer pool is not working.

How to repeat:
Configure large page memory in Linux.

Add large-pages = true to my.cnf.

Start the database.

HugePages_Rsvd: 0 <- this should be non-zero if it worked.
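
The check above can be scripted. The following is a minimal sketch, assuming the counters are read from text in /proc/meminfo format; the function name meminfo_value is hypothetical and not part of MySQL.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the numeric value of `key` (for example "HugePages_Rsvd") from
 * text in /proc/meminfo format, or -1 if the key is absent or malformed.
 * Hypothetical helper for illustration only.
 */
long meminfo_value(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* A matching line looks like "Key:   <number> [unit]". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long value;
            if (sscanf(line + klen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}
```

To use it, read /proc/meminfo into a NUL-terminated buffer after starting the server and test whether meminfo_value(buf, "HugePages_Rsvd") is greater than zero.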

Suggested fix:
I'm not that familiar with the code, and in the digging I have done I cannot find an obvious issue.  The code for using large page memory is all there, and it looks correct to me.
[8 Sep 2010 3:57] Andrig Miller
Added that this is an x86_64 system (Intel Nehalem based).
[8 Sep 2010 5:52] Andrig Miller
Okay, I believe I have figured out the problem.  UNIV_LINUX is not being defined.  So the code inside #if defined HAVE_LARGE_PAGES && defined UNIV_LINUX, which does the proper shmget call to allocate large pages, is compiled away.

I see in the flags.make file generated by cmake the following:

# CMAKE generated file: DO NOT EDIT!
# Generated by "Unix Makefiles" Generator, CMake Version 2.8

# compile C with /usr/bin/gcc
# compile CXX with /usr/bin/c++
C_FLAGS =  -Wall -O2 -g -DDBUG_OFF -I/home/andrig/Build/mysql-server/bld/include -I/home/andrig/Build/mysql-server/storage/innobase/include -I/home/andrig/Build/mysql-server/storage/innobase/handler -I/home/andrig/Build/mysql-server/include -I/home/andrig/Build/mysql-server/sql -I/home/andrig/Build/mysql-server/regex -I/home/andrig/Build/mysql-server/zlib   -DUNIV_LINUX -D_GNU_SOURCE=1

C_DEFINES = -DHAVE_CONFIG_H -DHAVE_IB_GCC_ATOMIC_BUILTINS=1 -DHAVE_IB_ATOMIC_PTHREAD_T_GCC=1 -DSIZEOF_PTHREAD_T=8

CXX_FLAGS =  -Wall -Wno-unused-parameter -fno-implicit-templates -fno-exceptions -fno-rtti -O2 -g -DDBUG_OFF -I/home/andrig/Build/mysql-server/bld/include -I/home/andrig/Build/mysql-server/storage/innobase/include -I/home/andrig/Build/mysql-server/storage/innobase/handler -I/home/andrig/Build/mysql-server/include -I/home/andrig/Build/mysql-server/sql -I/home/andrig/Build/mysql-server/regex -I/home/andrig/Build/mysql-server/zlib   -DUNIV_LINUX -D_GNU_SOURCE=1

CXX_DEFINES = -DHAVE_CONFIG_H -DHAVE_IB_GCC_ATOMIC_BUILTINS=1 -DHAVE_IB_ATOMIC_PTHREAD_T_GCC=1 -DSIZEOF_PTHREAD_T=8

So, I need the -DUNIV_LINUX to be -DUNIV_LINUX=1, but for the life of me I cannot figure out how to get that to change from cmake.

Any pointers in the right direction, would be very much appreciated.
[8 Sep 2010 14:20] Andrig Miller
Okay, I'm an idiot.  My C is very rusty.  UNIV_LINUX just has to be defined, it doesn't need to be a specific value.

So, that's not the problem.

HAVE_LARGE_PAGES does seem to be missing, though.  It seems it should be getting defined, since cmake says that SHM_HUGETLB is found, which I thought would get HAVE_LARGE_PAGES defined.
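
To illustrate the point about UNIV_LINUX only needing to be defined, here is a small sketch. The macro names DEMO_EMPTY and DEMO_ONE are made up for the example; GCC's -DNAME gives the macro the value 1, and a bare #define gives it no value, and a "defined" test accepts both.

```c
/* Hypothetical stand-ins for macros such as UNIV_LINUX. */
#define DEMO_EMPTY      /* defined with no value at all */
#define DEMO_ONE 1      /* what -DDEMO_ONE and -DDEMO_ONE=1 both produce */

/*
 * Returns 1 if the guarded branch was compiled in, 0 otherwise.
 * "defined" only asks whether the name exists, not what it expands to.
 */
int guarded_branch_compiled(void)
{
#if defined DEMO_EMPTY && defined DEMO_ONE
    return 1;
#else
    return 0;
#endif
}
```

Removing either #define makes the function return 0, which is the situation where the guarded block would be compiled away.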
[9 Sep 2010 19:00] Vladislav Vaintroub
Andrig, I checked: HAVE_LARGE_PAGES is defined (at least in my case).
It is in include/config.h, and this part of the innobase source does actually compile (you can check by adding an #error foo line in the #ifdef'ed part).

The CMake code to check HAVE_SHM_HUGETLB and set HAVE_LARGE_PAGES is in cmake/os/Linux.cmake, and this will expand to 1 if defined due to
#cmakedefine HAVE_LARGE_PAGES in config.h.cmake

So, the defines should all be there. Alas, there seems to be other stuff required for large pages to run. I have not researched where os_use_large_pages and os_large_page_size come from, but they both need to be non-zero. And even if everything is defined correctly, if the OS fails to allocate large pages at runtime, it will fall back to normal allocation, though InnoDB would write a warning.
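
The fallback behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the InnoDB implementation: the function name alloc_large_or_fallback and the warning text are hypothetical, and malloc stands in for whatever conventional allocator the server uses.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* so <sys/shm.h> exposes SHM_HUGETLB on glibc */
#endif
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Try to get `size` bytes backed by large pages through a System V shared
 * memory segment. On failure, print a warning and fall back to malloc().
 * *used_large is set to 1 only when the large-page path succeeded.
 */
void *alloc_large_or_fallback(size_t size, int *used_large)
{
    *used_large = 0;
#ifdef SHM_HUGETLB
    int shmid = shmget(IPC_PRIVATE, size, SHM_HUGETLB | SHM_R | SHM_W);
    if (shmid >= 0) {
        void *ptr = shmat(shmid, NULL, 0);
        int saved_errno = errno;
        /* Mark the segment for removal once it is no longer attached. */
        shmctl(shmid, IPC_RMID, NULL);
        if (ptr != (void *) -1) {
            *used_large = 1;
            return ptr;
        }
        errno = saved_errno;
    }
    fprintf(stderr,
            "HugeTLB: warning: failed to allocate %zu bytes, errno %d; "
            "using conventional memory\n", size, errno);
#endif
    return malloc(size);
}
```

A caller has to remember which path was taken: memory from the large-page path is released with shmdt(), memory from the fallback path with free().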

Both os_use_large_pages and os_large_page_size have to be not 0 (I did not research where they, and even if they are not null, large page still can fail to allocate,
[9 Sep 2010 19:02] Vladislav Vaintroub
Ignore the last paragraph in the previous comment; I intended to remove it, but submitted too fast :)
[10 Sep 2010 19:24] Andrig Miller
After looking at this for another whole day, I also cannot see why it wouldn't work.  In the top-level configure.in, there are three flags that need to be defined: HAVE_LARGE_PAGES, HAVE_LARGE_PAGE_OPTION, and HUGETLB_USE_PROC_MEMINFO.  From looking at the cmake output, when I run cmake .. the check for SHM_HUGETLB succeeds, so those flags should be set.

These are the lines from configure.in

# For large pages support
if test "$TARGET_LINUX" = "true"
then
  # For SHM_HUGETLB on Linux
  AC_CHECK_DECLS(SHM_HUGETLB,
      AC_DEFINE([HAVE_LARGE_PAGES], [1],
                [Define if you have large pages support])
      AC_DEFINE([HAVE_LARGE_PAGE_OPTION], [1],
                [Define if you have large page option])
      AC_DEFINE([HUGETLB_USE_PROC_MEMINFO], [1],
                [Define if /proc/meminfo shows the huge page size (Linux only)])
      , ,
      [
#include <sys/shm.h>
      ]
  )

The other thing that needs to be defined is UNIV_LINUX, which is being defined based on this (in the storage/innobase directory):

CMakeLists.txt

# OS tests
IF(UNIX)
  IF(CMAKE_SYSTEM_NAME STREQUAL "Linux")
    CHECK_INCLUDE_FILES (libaio.h HAVE_LIBAIO_H)
    CHECK_LIBRARY_EXISTS(aio io_queue_init "" HAVE_LIBAIO)
    ADD_DEFINITIONS("-DUNIV_LINUX -D_GNU_SOURCE=1")

So, this test does succeed because I see this in the flags.make in the bld/storage/innobase/CMakeFiles/innobase.dir:

# CMAKE generated file: DO NOT EDIT!
# Generated by "Unix Makefiles" Generator, CMake Version 2.8

# compile C with /usr/bin/gcc
# compile CXX with /usr/bin/c++
C_FLAGS =  -Wall -O2 -g -DDBUG_OFF -I/home/andrig/Build/mysql-server/bld/include -I/home/andrig/Build/mysql-server/storage/innobase/include -I/home/andrig/Build/mysql-server/storage/innobase/handler -I/home/andrig/Build/mysql-server/include -I/home/andrig/Build/mysql-server/sql -I/home/andrig/Build/mysql-server/regex -I/home/andrig/Build/mysql-server/zlib   -DUNIV_LINUX -D_GNU_SOURCE=1

C_DEFINES = -DHAVE_CONFIG_H -DHAVE_IB_GCC_ATOMIC_BUILTINS=1 -DHAVE_IB_ATOMIC_PTHREAD_T_GCC=1 -DSIZEOF_PTHREAD_T=8

CXX_FLAGS =  -Wall -Wno-unused-parameter -fno-implicit-templates -fno-exceptions -fno-rtti -O2 -g -DDBUG_OFF -I/home/andrig/Build/mysql-server/bld/include -I/home/andrig/Build/mysql-server/storage/innobase/include -I/home/andrig/Build/mysql-server/storage/innobase/handler -I/home/andrig/Build/mysql-server/include -I/home/andrig/Build/mysql-server/sql -I/home/andrig/Build/mysql-server/regex -I/home/andrig/Build/mysql-server/zlib   -DUNIV_LINUX -D_GNU_SOURCE=1

CXX_DEFINES = -DHAVE_CONFIG_H -DHAVE_IB_GCC_ATOMIC_BUILTINS=1 -DHAVE_IB_ATOMIC_PTHREAD_T_GCC=1 -DSIZEOF_PTHREAD_T=8

What I thought I would see here, though, is the combination of UNIV_LINUX and _GNU_SOURCE along with HAVE_LARGE_PAGES, HAVE_LARGE_PAGE_OPTION, and HUGETLB_USE_PROC_MEMINFO.

But, alas, I don't.

As for it failing even if it's built correctly: InnoDB is not logging a warning, so if it is failing, it's failing silently.  I have used this server extensively with MySQL in the past, and always with large page memory.  It has 16 GB of large page memory set aside after boot.  I know this is working just fine.  There is either something wrong with the build, or some non-obvious code problem somewhere causing it to fail.
[10 Sep 2010 19:29] Andrig Miller
strace file output

Attachment: stracemysql.out (application/octet-stream, text), 410.35 KiB.

[10 Sep 2010 19:31] Andrig Miller
I attached a capture of strace from mysqld starting up, and it doesn't show a single shmget call, which I would expect to see.  I'm not sure it tells us anything, but for what it's worth.
[21 Sep 2010 21:32] Andrig Miller
I changed the version to the latest official community release, instead of the one I built from source.  I originally discovered this in 5.5.5-m3, and I have confirmed that Large Page Memory (HugeTLB) is not working in this release either.  Since it's an RC, hopefully this can get fixed before it's finalized.
[30 Sep 2010 15:41] Inaam Rana
I just tried this with mysql-trunk-innodb. Here is how I build and run:

cmake -DCMAKE_INSTALL_PREFIX:PATH=/home/inaam/install -DWITH_INNOBASE_STORAGE_ENGINE:BOOL=ON

in .my.cnf add:

large_pages

bling:/proc/sys/vm# uname -a
Linux bling 2.6.26-2-686 #1 SMP Wed May 12 21:56:10 UTC 2010 i686 GNU/Linux
bling:/proc/sys/vm# 
bling:/proc/sys/vm# cat /proc/meminfo | grep Huge
HugePages_Total:   356
HugePages_Free:    356
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     4096 kB
bling:/proc/sys/vm# 

When I start mysqld in gdb I see the following:
Breakpoint 1, os_mem_alloc_large (n=0x8ca1f60)
    at /home/inaam/w/mysql-trunk-innodb/storage/innobase/os/os0proc.c:80
80              if (!os_use_large_pages || !os_large_page_size) {
(gdb) p os_large_page_size
$17 = 4194304
(gdb) p/x os_large_page_size
$18 = 0x400000
(gdb) p os_use_large_pages 
$19 = 1
...
(gdb) next
89              shmid = shmget(IPC_PRIVATE, (size_t)size, SHM_HUGETLB | SHM_R | SHM_W);
(gdb) p/x size
$22 = 0x4400000
(gdb) p/x n
$23 = 0x8ca1f60

In the error log I see the following message:
100930 11:34:23  InnoDB: Initializing buffer pool, size = 64.0M
InnoDB: HugeTLB: Warning: Failed to allocate 71303168 bytes. errno 12
InnoDB HugeTLB: Warning: Using conventional memory pool
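
The 71303168 bytes in the warning is the 0x4400000 seen in gdb, which is exactly 17 pages of 0x400000 bytes. A plausible reading, not shown in the gdb session itself, is that a request slightly above 64 MiB was rounded up to a whole number of large pages. A minimal sketch of that arithmetic, with hypothetical function names:

```c
#include <stddef.h>

/* Round n up to a multiple of page_size; page_size must be a power of two. */
size_t round_up_to_page(size_t n, size_t page_size)
{
    return (n + page_size - 1) & ~(page_size - 1);
}

/* Number of large pages that a request of n bytes occupies after rounding. */
size_t pages_needed(size_t n, size_t page_size)
{
    return round_up_to_page(n, page_size) / page_size;
}
```

Comparing pages_needed() for the buffer pool against HugePages_Free in /proc/meminfo gives a quick indication of whether enough large pages are configured.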

So what I see is that the code is executed. It fails (errno 12 is ENOMEM) perhaps because I set /proc/sys/vm/nr_hugepages on the fly and there is not enough memory in the system for large pages, or maybe I need some kind of permission to use large pages. In any case, the reported bug, that the large page allocation is not triggered at all, is not reproducible for me. Please explain the exact steps taken to reproduce this bug.
[30 Sep 2010 16:21] Inaam Rana
I was able to get the same message on an x86_64 system from latest mysql-trunk:

irana@dscczz02:~/install$ uname -a
Linux dscczz02.us.oracle.com 2.6.18-164.0.0.0.1.el5 #1 SMP Thu Sep 3 00:21:28 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
irana@dscczz02:~/install$ 

InnoDB: HugeTLB: Warning: Failed to allocate 69206016 bytes. errno 12
InnoDB HugeTLB: Warning: Using conventional memory pool
[30 Sep 2010 17:24] Inaam Rana
After doing the basic setup for hugepages I was able to do the allocation and start the mysqld with large page support.

root@bling:/proc/sys/vm# cat /proc/meminfo | grep Huge
HugePages_Total:   709
HugePages_Free:    708
HugePages_Rsvd:     31
HugePages_Surp:      0
Hugepagesize:     4096 kB

I am going to set this to "Can't repeat" for now. If you have the exact steps required to reproduce this, please feel free to reopen.
[4 Oct 2010 16:33] Vasil Dimov
Hello Andrig,

to rule out any problem with macros not being defined, please apply this patch and try to compile:

--- storage/innobase/os/os0proc.c	revid:vasil.dimov@oracle.com-20101004115320-eryimp7m94fyssph
+++ storage/innobase/os/os0proc.c	2010-10-04 16:31:42 +0000
@@ -71,12 +71,13 @@ os_mem_alloc_large(
 /*===============*/
 	ulint*	n)			/*!< in/out: number of bytes */
 {
 	void*	ptr;
 	ulint	size;
 #if defined HAVE_LARGE_PAGES && defined UNIV_LINUX
+#error 123
 	int shmid;
 	struct shmid_ds buf;

If you get #error 123 when compiling, then all the necessary macros are defined; if not, then at least one of them is not defined.

Thanks!
[11 Oct 2010 15:54] Andrig Miller
Okay, I re-checked out the code, and added the #error 123, and it does get the compile error you would expect as follows:

/home/andrig/Build/mysql-server/storage/innobase/os/os0proc.c:77:2: error: #error 123
make[2]: *** [storage/innobase/CMakeFiles/innobase.dir/os/os0proc.c.o] Error 1
make[1]: *** [storage/innobase/CMakeFiles/innobase.dir/all] Error 2
make: *** [all] Error 2

That's a good sign.  The release build from the generic RPM for 5.5.6-rc does not work, though.  I will build from source without the #error 123, see whether that works, and update this again with the results.
[11 Oct 2010 16:17] Andrig Miller
Okay, so I found the actual problem.  I was able to successfully get large page memory working, even with the release RPM build of 5.5.6-rc.

If I edit the mysql init script and add --large_pages directly to the command line, as in mysqld_safe --large_pages, then it works.

If I remove that and count on the large pages coming from the my.cnf file, which works fine with MySQL 5.1.xx, it doesn't work.

This is the line I have in my.cnf:

large-pages = true

So, it appears the code, and build are just fine, but that the problem lies in the reading, parsing, and setting of the configuration from the my.cnf file.
[11 Oct 2010 16:22] Andrig Miller
Okay, I'm closing this, as it is simply that the option in my.cnf has changed from large-pages = true to large-pages without anything else.

I changed the option in my.cnf from large-pages = true to large-pages, and it works as expected.

This should be documented somewhere, as the old format doesn't work, so anyone carrying over their my.cnf from an older release may have this problem.
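
For anyone carrying over an older my.cnf, a small checker along these lines could flag the form that did not work here. It is only a sketch: the function and enum names are hypothetical, it looks at one line at a time, and it treats large_pages and large-pages as the same option name, as both spellings are used in this report.

```c
#include <ctype.h>
#include <stddef.h>

enum lp_form { LP_NOT_PRESENT = 0, LP_BARE = 1, LP_WITH_VALUE = 2 };

/*
 * Classify one my.cnf line:
 *   LP_BARE        "large-pages"            (the form that worked)
 *   LP_WITH_VALUE  "large-pages = <value>"  (the form that did not)
 *   LP_NOT_PRESENT anything else, including comments
 */
enum lp_form classify_large_pages_line(const char *line)
{
    static const char name[] = "large-pages";
    size_t i;

    while (isspace((unsigned char) *line))
        line++;

    /* Match the option name, treating '_' and '-' as the same character. */
    for (i = 0; name[i] != '\0'; i++) {
        char c = (line[i] == '_') ? '-' : line[i];
        if (c != name[i])
            return LP_NOT_PRESENT;
    }
    line += i;

    while (isspace((unsigned char) *line))
        line++;

    if (*line == '\0' || *line == '#')
        return LP_BARE;
    if (*line == '=')
        return LP_WITH_VALUE;
    return LP_NOT_PRESENT;  /* some longer option name */
}
```

Running this over each line of the [mysqld] section and reporting LP_WITH_VALUE would point out the line to change.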
[11 Oct 2010 16:26] Calvin Sun
Thanks, Andrig! I have changed it to Documenting to make sure it is documented.
[11 Oct 2010 17:14] Andrig Miller
You're very welcome.  It's my pleasure to help out where I can.
[26 Oct 2010 4:45] Paul DuBois
http://dev.mysql.com/doc/refman/5.1/en/large-page-support.html says this:

"
Large page support in MySQL is disabled by default. To enable it, start the server with the --large-pages option. For example, you can use the following lines in your server's my.cnf file:

[mysqld]
large-pages
"

Not:

[mysqld]
large-pages = true
[26 Oct 2010 21:24] John Russell
The documentation reflects the correct syntax already. Don't see an opportunity to help people avoid such an error, other than in the place that already shows the no-argument form of the option.