Bug #56358 My cluster used so many swap even if they have enough memory.
Submitted: 30 Aug 2010 9:40
Modified: 2 May 2012 11:53
Reporter: Sean Lee
Status: Not a Bug
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S5 (Performance)
Version: 5.1.44-ndb-7.1.4b-cluster-gpl
OS: Linux (ubuntu 8.04 64bit 2.6.24-24-server edition)
Assigned to:
CPU Architecture: Any
Tags: Memory, swap, SwapCached

[30 Aug 2010 9:40] Sean Lee
Description:
I have deployed a MySQL Cluster on 3 servers in my production
environment. The two ndb node servers have 8G RAM and a 4-core CPU. After
running for some days, I found that the ndb servers used almost 7G of swap, but
the "so" value in the vmstat output is almost always 0. I then checked the
/proc/meminfo output, and the SwapCached value is almost 7G.

I tried swapoff and then swapon on the server, and found the swap usage
slowly increasing to 7G again. As I understand it, SwapCached means the
pages are in both swap and memory. So does this mean I have enough memory?
I cannot reproduce the issue in my development test environment
with the same data and the same web application. The only differences
are that the development environment uses 2-core CPUs with a
MaxNoOfExecutionThreads value of 2, and the production servers use
RAID 1+0. I don't think these cause the issue.

The top output is:
top - 20:37:21 up 14 days,  4:03,  1 user,  load average: 0.51, 0.51, 0.50
Tasks:  84 total,   1 running,  83 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3%us,  0.1%sy,  0.0%ni, 99.2%id,  0.2%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   8191248k total,  8143524k used,    47724k free,    54972k buffers
Swap: 15623096k total,  7450752k used,  8172344k free,   549788k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8171 root      20   0 6953m 6.7g 6256 S    2 85.9 279:09.76 ndbmtd
 4883 mysql     20   0  604m 378m  10m S    0  4.7  23:59.94 mysqld

The /proc/meminfo output is:
MemTotal:      8191248 kB
MemFree:         46632 kB
Buffers:         56692 kB
Cached:         549812 kB
SwapCached:    7421452 kB
Active:        4002424 kB
Inactive:      4032764 kB
SwapTotal:    15623096 kB
SwapFree:      8172344 kB
Dirty:               0 kB
Writeback:          68 kB
Mapped:          20992 kB
SReclaimable:    18456 kB
SUnreclaim:      30536 kB
PageTables:      16888 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  19718720 kB
Committed_AS:  7482056 kB
VmallocTotal: 34359738367 kB
VmallocUsed:     31848 kB
VmallocChunk: 34359706279 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     2048 kB

How to repeat:
I have no idea how to repeat it.
[30 Aug 2010 10:12] Gustaf Thorslund
Sean,

> 8171 root      20   0 6953m 6.7g 6256 S    2 85.9 279:09.76 ndbmtd

So your ndbmtd is using almost 7G. Your config.ini would be needed to explain why that's the case.

If you have other applications running on the same host, those could have caused the swapping. From the line above it doesn't appear to be ndbmtd that got swapped out (and that's good).

/Gustaf
[30 Aug 2010 10:40] Sean Lee
Attached the ndb_error_reporter data file bug-data-56358.tar.gz
[30 Aug 2010 18:07] Daniel Smythe
Looking through the configuration of this cluster, it appears that ndbmtd is using an appropriate amount of memory. Using the memory calculation found here:

http://forums.mysql.com/read.php?25,382163,382218#msg-382218

And the defaults for missing values here:

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-params-ndbd.html

I come up with the following:

DataMemory = 4800 M
IndexMemory = 600 M
BackupDataBufferSize = 16 M
BackupLogBufferSize = 4 M
DiskPagebufferMemory = 384 M
SharedGlobalMemory = 384 M

(MaxNoOfConcurrentIndexOperations + MaxNoOfConcurrentOperations + MaxNoOfConcurrentTransactions + MaxNoOfOrderedIndexes + MaxNoOfTables + MaxNoOfUniqueHashIndexes) * 1k 
( 8000 + 100000 + 4096 + 2048 + 4096 + 512 ) * 1k === 118 M

RedoBuffer = 48 M
TotalSendBufferMemory = 256 K
UndoDataBuffer = 16 M
UndoIndexBuffer = 2 M

Total == 6.372 G

This appears to be very close to the current memory usage of ndbmtd.
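
As a cross-check of the arithmetic above, here is a minimal sketch that adds up the same figures. It assumes decimal units (1 M = 1000 K, 1 G = 1000 M) and about 1 K per configured object, as in the calculation above; the variable names are only illustrative. Without rounding the per-object term down to 118 M, the sum comes out within about a megabyte of the total quoted above.

```python
# Values in MB, copied from the estimate above (decimal units assumed).
fixed_mb = {
    "DataMemory": 4800,
    "IndexMemory": 600,
    "BackupDataBufferSize": 16,
    "BackupLogBufferSize": 4,
    "DiskPagebufferMemory": 384,
    "SharedGlobalMemory": 384,
    "RedoBuffer": 48,
    "TotalSendBufferMemory": 0.256,  # 256 K
    "UndoDataBuffer": 16,
    "UndoIndexBuffer": 2,
}

# Object counts from the estimate above; each is assumed to cost about 1 K.
object_counts = {
    "MaxNoOfConcurrentIndexOperations": 8000,
    "MaxNoOfConcurrentOperations": 100000,
    "MaxNoOfConcurrentTransactions": 4096,
    "MaxNoOfOrderedIndexes": 2048,
    "MaxNoOfTables": 4096,
    "MaxNoOfUniqueHashIndexes": 512,
}

per_object_mb = sum(object_counts.values()) * 1 / 1000  # 1 K each, converted to MB
total_mb = sum(fixed_mb.values()) + per_object_mb

print(f"per-object overhead: {per_object_mb:.3f} MB")
print(f"estimated total:     {total_mb / 1000:.3f} GB")
```

Comparing this estimate with the RES column of top for ndbmtd over several days is one way to check that the process is not growing.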

I would recommend monitoring the memory usage to be sure it's not growing, but so far this does not appear to be a bug. Also, you may want to look into LockPagesInMainMemory:

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-lo...

But it appears you are already using it.
[31 Aug 2010 8:34] Alex ldp
We have the same problem, and it has not been resolved so far.

The key point is that the total memory used by all applications has not reached the size of physical memory.

Why are these pages swapped out?

They are kept in both physical memory and the swap file.
[31 Aug 2010 8:51] Sean Lee
Hi,

Thanks for your reply. The memory usage and LockPagesInMainMemory=1 sound right for this environment. That's OK.

But my point is: why is so much swap space used? As you know, the "SwapCached" value in /proc/meminfo means:
      Memory that once was swapped out, is swapped back in but
      still also is in the swapfile (if memory is needed it
      doesn't need to be swapped out AGAIN because it is already
      in the swapfile. This saves I/O)

The archive is here: http://lwn.net/Articles/28345/

So SwapCached:    7421452 kB
means there are almost 7G of memory pages in both memory and swap space. And ndbmtd is the only process on this server that uses that much memory.

The "ps aux | sort -k6,6nr | head -n 5" output is:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     28847  3.6 85.9 7055164 7039940 ?     SLl  Aug24 366:40 /usr/local/mysql/bin//ndbmtd --ndb-nodeid=2 --ndb-connectstring=192.168.12.110:1186
mysql     4877  2.6  4.8 643056 398692 ?       SLl  Aug06 955:41 /usr/local//mysql/bin//mysqld --defaults-file=/usr/local/mysql/etc/my.cnf
root      5003  0.0  0.1  66812 13628 ?        SLs  Aug06  11:36 heartbeat: master control process
nobody    5053  0.0  0.0  60208  7024 ?        SL   Aug06   1:16 heartbeat: write: bcast eth1
nobody    5054  0.0  0.0  60208  7024 ?        SL   Aug06   1:16 heartbeat: read: bcast eth1 

Only the ndbmtd process uses a lot of memory; the second-largest process uses just approx 390M.

So can we conclude that the ndbmtd process alone caused the swapping? Also, when I shut down an ndb node in the cluster, the host's used swap drops to almost 0; when I start the ndb node again, it slowly increases. Is that proof that ndbmtd causes this?

So what confuses me is: why is there swapping, and why is approx 7G used? It's abnormal even with LockPagesInMainMemory set to 1, isn't it?
[31 Aug 2010 19:56] Daniel Smythe
Even though the host may be swapping, LockPagesInMainMemory is doing as it should and locking ndbmtd in memory. Having a large amount of memory locked will cause the OS to have to potentially swap other things that wouldn't normally have been swapped. I'd recommend looking into OS level options for swap/memory management and tuning, or how the OS manages its memory.

Marking this as not a bug.
[1 Sep 2010 7:50] Sean Lee
Hi Daniel,

Thanks for your explanation, but it confused me more. You said: "LockPagesInMainMemory is doing as it should and locking ndbmtd in memory. Having a large amount of memory locked will cause the OS to have to potentially swap other things that wouldn't normally have been swapped."

As you know, Linux can't tell us which process uses the swap or how much it is using: the nswap and cnswap fields in /proc/pid/stat are not maintained, and their values are always zero.
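
For reference, kernels from 2.6.34 onward (so not the 2.6.24 kernel in this report) add a per-process VmSwap line to /proc/<pid>/status. A minimal sketch of reading it, assuming such a kernel; the helper names vm_swap_kb and read_vm_swap_kb are hypothetical:

```python
def vm_swap_kb(status_text):
    """Return the VmSwap value in kB from /proc/<pid>/status text, or None if the line is absent."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return None


def read_vm_swap_kb(pid):
    """Read VmSwap for a live process; returns None on kernels without the field."""
    with open(f"/proc/{pid}/status") as f:
        return vm_swap_kb(f.read())
```

On such a kernel, something like read_vm_swap_kb(28847) would report how much of that process is currently swapped out; on older kernels the function returns None.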

So I can only deduce it from other information, namely the SwapCached value I mentioned, which is very large and close to ndbmtd's memory usage. I cannot accept the guess that other processes caused the swapping, because the other processes do not have enough virtual memory address space to hold that many pages in both memory and swap space.

I did study some OS-level options for swap/memory management and tuning, and how the OS manages its memory, but I can't find any clues that explain the issue. So I insist that this may be a bug in MySQL Cluster, or else please explain it clearly for me.

Thanks for your patience again.
[13 Sep 2010 19:37] MySQL Verification Team
This is still "not a bug".

Unless you are able to show that VSZ for the ndb(mt)d process is growing over time I don't see any evidence that there is a memory leak involved.

The SwapCached value shows the amount of memory that has been moved out of physical memory and back in, but for which Linux leaves the pages cached in swap. This space is left in swap so that, in the event that those pages need to be swapped back out to disk, Linux can simply free the physical memory pages without having to perform the costly disk IO of writing them back to swap. Having a very large SwapCached value means that most of swap is in fact now available, but one or more processes had been swapped out at some point in the past. Your /proc/meminfo indicates only about 28M of "active" swap usage.
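
The "about 28M" figure can be reproduced from the reporter's /proc/meminfo values. A minimal sketch, following the reasoning above that pages counted in SwapCached are also resident in RAM; the function names are illustrative:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_breakdown_kb(info):
    """Split used swap into the part that is also cached in RAM and the remainder."""
    used = info["SwapTotal"] - info["SwapFree"]
    cached = info["SwapCached"]
    return {"used": used, "swap_cached": cached, "swapped_out_only": used - cached}


# Values copied from the /proc/meminfo output in this report.
sample = """\
SwapCached:    7421452 kB
SwapTotal:    15623096 kB
SwapFree:      8172344 kB
"""

breakdown = swap_breakdown_kb(parse_meminfo(sample))
print(breakdown)
# {'used': 7450752, 'swap_cached': 7421452, 'swapped_out_only': 29300}
```

The remaining 29300 kB is roughly 28.6 MiB, which matches the figure above.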

There are two possible explanations for your previous swap usage:

It is possible that some other process's memory got swapped out to disk, either during ndbd startup or at some later point, and ndbd never touched swap.

-or-

Since you are using the option LockPagesInMainMemory = 1; ndbd will allocate all the memory it is going to up front before it is locked to physical memory. It is possible that some ndbd pages got swapped to disk during startup then moved back to memory when the pages were locked.  These could remain in SwapCached but are untouched after startup completes.

Using LockPagesInMainMemory = 2 would require ndbd to lock itself to physical memory before performing any allocations and prevent it from having any swap usage at any point in the process lifecycle.
[16 Sep 2010 9:22] Sean Lee
Hi Matthew,

Thanks very much for your reply. But I still think this is a "performance" bug, sorry.

If I can prove there is a memory leak in ndb(mt)d, that would be a critical bug, don't you think?

Thanks for explaining the "SwapCached" term. I realize that you are right: if many processes each swap out a few pages and then exit, this can also produce a very large SwapCached value, because the OS reclaims swap lazily.

But in my environment, I checked your guesses with some tests.

First I tried LockPagesInMainMemory = 2. The swap usage is zero at the start, then it slowly increases to approx 5G (not 7G) and stops growing.

So I assumed that some other process was doing the swapping. I stopped mysqld on one of the two hosts and ran swapoff then swapon so that swap usage was zero again. The swap usage then grew again. I then decreased the DataMemory value from 4800M to 4000M to free more memory; the swap usage still increased, to approx 2G, and then stopped growing.

The host is a dedicated server only for MySQL Cluster. Apart from mysqld and ndbmtd, the other processes are trivial daemons such as webmin, heartbeat and sshd, which are common and necessary on a Linux server. After I stopped mysqld, ndb(mt)d was the only non-trivial process, so I don't think the other processes caused such large swap usage, even with an additional 800M of memory freed for the test.

Can anybody explain this weird issue?
[27 Sep 2010 9:00] Gustaf Thorslund
Hi Sean,

If you think this is a bug, could you please provide a test case showing how to reproduce it and showing what impact it has on performance?

From what I can tell so far, this looks like a bug in your way of looking at SwapCached. Matthew has already made a serious attempt at working around this little bug. If you have further concerns about this, I would suggest you seek advice in forums, mailing lists, IRC, or open a support issue.

/Gustaf
[29 Sep 2010 14:20] Sean Lee
Hi Gustaf,

Thanks for your reply. Maybe I misunderstood what SwapCached means. OK, that's all right.

As you know, I can't reproduce the issue on my test environment servers. The differences between the test and production servers are RAID, CPU and OS version. The production servers have a 4-core CPU, RAID 1+0 and Ubuntu 8.04; the test servers have a 2-core CPU, no RAID, and Ubuntu 10.04. I still have no idea what makes the server swap so much.

As you asked, I tested the impact of swap on performance. First I ran sysbench like this:

sysbench --mysql-user=root --test=oltp --mysql-host=host --mysql-password=pass --oltp-test-mode=complex --mysql-table-engine=ndbcluster --oltp-table-size=20000000 --mysql-db=ndb --num-threads=100 --max-requests=0 --max-time=60 --mysql-create-options="TABLESPACE ts_1 STORAGE DISK DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci" run

I ran it 10 times on the normal production servers with swappiness = 60 (the default). In this test the swap usage reached approx 5.0G, even though I had run "swapoff -a ; swapon -a" beforehand.

Then I set swappiness = 0, ran swapoff and swapon again, and ran sysbench 10 times too. In these tests only 152K of swap was used.

Finally, I reduced DataMemory and IndexMemory to approx half their current values and ran the benchmark 10 times again. In these tests no swap was used.

I took the read/write requests per sec value as the sample and averaged it over the 10 runs. The results are:

swappiness 60 (default) with the current size: 706.794 read/write requests per sec.

swappiness 0 with the current size: 793.553 read/write requests per sec.

swappiness 60 (default) with the half size:  788.341 read/write requests per sec.

This suggests that swap may reduce performance by approx 10-15%?
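
The relative difference can be computed directly from the three averages above. A minimal sketch; the dictionary keys are just labels for the three runs, and the helper name slowdown_pct is illustrative:

```python
# Average read/write requests per sec over 10 runs, as reported above.
results = {
    "swappiness 60, current size": 706.794,
    "swappiness 0, current size": 793.553,
    "swappiness 60, half size": 788.341,
}


def slowdown_pct(swapping, reference):
    """Throughput lost by the swapping run, as a percentage of the reference run."""
    return (reference - swapping) / reference * 100


swapping = results["swappiness 60, current size"]
for label, value in results.items():
    if value != swapping:
        print(f"vs {label}: {slowdown_pct(swapping, value):.1f}% slower")
```

With these numbers the swapping configuration comes out roughly 10 to 11% slower than either run without swap, which sits at the low end of the estimate above. Averages over 10 runs say nothing about run-to-run variance, so the per-run values in the attached output files are worth checking as well.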

Gustaf, thanks to you and Matthew for your work on this little bug. I don't think I am a biased man. The issue just confuses me so much, and maybe somebody can give me an explanation or some help. Of course, you have the right to decide whether or not the issue is a bug. I am just doing what I can to seek help and to give some information to make MySQL better.

One more question: if setting swappiness = 0 reduces the swap used (from 5G to 152K), does it mean ndb(mt)d swaps too aggressively? Or does LockPagesInMainMemory=2 not work for me with the default swappiness (60)?

Thanks for your patience again.
[29 Sep 2010 14:21] Sean Lee
sysbench output on swappiness = 0

Attachment: no_swap.out (application/octet-stream, text), 15.38 KiB.

[29 Sep 2010 14:23] Sean Lee
sysbench output on swappiness is default 60

Attachment: has_swap.out (application/octet-stream, text), 15.38 KiB.

[29 Sep 2010 14:24] Sean Lee
sysbench output when swappiness is 60 and half ndb size

Attachment: half.out (application/octet-stream, text), 15.38 KiB.