Bug #27092 NDBD not freeing up memory after stopping
Submitted: 13 Mar 2007 15:39
Modified: 18 Jul 2007 16:37
Reporter: Jeremy Kusnetz
Status: No Feedback
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 5.1.16
OS: Linux (Linux x86_64)
Assigned to: Assigned Account
CPU Architecture: Any

[13 Mar 2007 15:39] Jeremy Kusnetz
Description:
Something is broken here: NDBD doesn't seem to free its memory after it is stopped or restarted.

Here is the error I get when trying to restart:

2007-03-13 15:13:22 [ndbd] INFO -- Ndb has terminated (pid 29701) restarting
2007-03-13 15:13:22 [ndbd] INFO -- Angel pid: 29696 ndb pid: 29705
2007-03-13 15:13:22 [ndbd] INFO -- NDB Cluster -- DB node 3
2007-03-13 15:13:22 [ndbd] INFO -- Version 5.1.16 (beta) --
2007-03-13 15:13:22 [ndbd] INFO -- Configuration fetched at ndb_mgmd1 port 1186
2007-03-13 15:13:23 [ndbd] INFO -- Start initiated (version 5.1.16)
2007-03-13 15:13:23 [ndbd] INFO -- Ndbd_mem_manager::init(1) min: 20Mb initial: 20Mb
2007-03-13 15:13:23 [ndbd] INFO -- Error handler startup restarting system
2007-03-13 15:13:24 [ndbd] INFO -- Error handler shutdown completed - exiting
2007-03-13 15:13:24 [ndbd] INFO -- Angel received ndbd startup failure count 3.
2007-03-13 15:13:24 [ndbd] ALERT -- Ndbd has failed 3 consecutive startups. Not restarting
2007-03-13 15:13:24 [ndbd] ALERT -- Node 3: Forced node shutdown completed. Occured during startphase 0. Caused by error 2327: 'Memory allocation failure, please decrease some configuration parameters(Configuration error). Permanent error, external action needed'.

I have 10 GB configured for data memory and 2 GB for index memory. The machine has 16 GB of RAM.

But free shows 10 GB of memory in use with no NDBD running:

[root@sqlc2 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:            15         10          5          0          0          1
-/+ buffers/cache:          7          7
Swap:            1          0          1
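
The arithmetic behind the startup failure can be checked mechanically: the data and index pools need 12 GB, while the "-/+ buffers/cache" line shows only about 7 GB available. The following is a minimal sketch, assuming the Linux /proc/meminfo format ("Name: value kB"). The function names and the sample values are illustrative and were not captured from this machine.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def reclaimable_kb(info):
    """Free memory plus buffers and page cache, like free's '-/+ buffers/cache' row."""
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


def can_fit(info, data_gb, index_gb):
    """True if the configured data + index memory fits in reclaimable memory."""
    required_kb = (data_gb + index_gb) * 1024 * 1024
    return required_kb <= reclaimable_kb(info)


# Hypothetical sample, chosen to resemble the report: roughly 7 GB available.
SAMPLE = """MemTotal: 16432000 kB
MemFree: 5800000 kB
Buffers: 200000 kB
Cached: 1300000 kB
"""

info = parse_meminfo(SAMPLE)
print("12 GB fits:", can_fit(info, data_gb=10, index_gb=2))
```

On a live host the same functions could be fed the contents of /proc/meminfo. The check is only a lower bound, since ndbd allocates memory beyond these two pools.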

Here is the ps output sorted by size; no process accounts for that memory:

[root@sqlc2 ~]# ps axww --sort -size -o pid -o size -o etime -o stat -o command
PID SZ ELAPSED STAT COMMAND
20305 542604 2-18:04:49 Sl /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --pid-file=/var/lib/mysql/sqlc2.isp.pid --log-error=/var/lib/mysql/sqlc2.isp.err --socket=/var/lib/mysql/mysql.sock --port=3306
4115 279788 10-20:11:23 Sl /opt/hp/hpsmh/sbin/hpsmhd -DSSL -f /opt/hp/hpsmh/conf/smhpd.conf
6931 82280 10-20:10:52 Ssl hpasmxld -f /dev/ipmi0
4726 61664 10-20:11:17 Sl cmahealthd -p 30 -s OK -t OK -i
5165 48340 10-20:11:11 Sl cmanicd
4861 41168 10-20:11:17 Ssl /sbin/cpqriisd -F
4938 31008 10-20:11:11 Sl cmarackd -p 120
520 16616 2-11:18:20 Sl /usr/sbin/snmpd -Lsd -Lf /dev/null -p /var/run/snmpd -a
6198 11168 10-20:11:08 Sl /opt/hp/vcagent/bin/vcagentd
4670 10892 10-20:11:17 Sl cmapeerd
4705 10596 10-20:11:17 S cmahostd -p 15 -s OK
4721 10520 10-20:11:17 S cmastdeqd -p 30
5118 10480 10-20:11:11 S cmafcad -p 15 -s OK
5120 10436 10-20:11:11 S cmaided -p 15 -s OK
4996 10432 10-20:11:11 Sl cmaeventd -p 15
6044 6304 10-20:11:08 Ss hald
4092 3200 10-20:11:23 Ss /opt/hp/hpsmh/sbin/hpsmhd -DSSL -f /opt/hp/hpsmh/conf/smhpd.conf
491 1736 2-11:18:20 Ss cupsd
3848 1592 10-20:11:27 Ss sendmail: accepting connections
3856 1448 10-20:11:27 Ss sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
4103 940 10-20:11:23 Ss crond
4154 892 10-20:11:21 Ss xfs -droppriv -daemon
1430 880 00:00 R+ ps axww --sort -size -o pid -o size -o etime -o stat -o command
28462 724 09:07 Ss sshd: root@pts/0
4658 620 10-20:11:18 S cmathreshd -p 5 -s OK
4741 592 10-20:11:17 S cmaperfd -p 30 -s OK
795 564 00:52 Ss sshd: root@pts/1
5016 472 10-20:11:11 S cmaidad -p 15 -s OK
3815 456 10-20:11:28 Ss /usr/sbin/sshd
3606 400 10-20:11:30 Ss rpc.idmapd
3829 376 10-20:11:28 Ss xinetd -stayalive -pidfile /var/run/xinetd.pid
28464 348 09:07 Ss+ -bash
4093 344 10-20:11:23 S /opt/hp/hpsmh/bin/rotatelogs /var/spool/opt/hp/hpsmh/logs/error_log 5M
4094 344 10-20:11:23 S /opt/hp/hpsmh/bin/rotatelogs /var/spool/opt/hp/hpsmh/logs/access_log 5M
797 340 00:52 Ss -bash
4918 220 10-20:11:11 S cmasm2d -p 30
1 212 10-20:12:43 S init [3]
4923 212 10-20:11:11 S cmasm2d -p 30
6035 200 10-20:11:09 Ss dbus-daemon-1 --system
20240 200 2-18:04:49 S /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/var/lib/mysql/sqlc2.isp.pid
3534 196 10-20:11:32 Ss irqbalance
3552 196 10-20:11:32 Ss portmap
3571 196 10-20:11:31 Ss rpc.statd
3520 184 10-20:11:32 Ss syslogd -m 0
3866 184 10-20:11:27 Ss gpm -m /dev/input/mice -t imps2
3704 180 10-20:11:29 Ss /usr/sbin/acpid
6026 180 10-20:11:09 Ss /usr/sbin/atd
3524 176 10-20:11:32 Ss klogd -x
1560 172 10-20:12:32 S<s udevd
6342 172 10-20:11:06 Ss+ /sbin/mingetty tty1
6343 172 10-20:11:06 Ss+ /sbin/mingetty tty2
6344 172 10-20:11:06 Ss+ /sbin/mingetty tty3
6345 172 10-20:11:06 Ss+ /sbin/mingetty tty4
6346 172 10-20:11:06 Ss+ /sbin/mingetty tty5
6347 172 10-20:11:06 Ss+ /sbin/mingetty tty6
2 0 10-20:12:43 S [migration/0]
3 0 10-20:12:43 SN [ksoftirqd/0]
4 0 10-20:12:43 S [migration/1]
5 0 10-20:12:43 SN [ksoftirqd/1]
6 0 10-20:12:43 S [migration/2]
7 0 10-20:12:43 SN [ksoftirqd/2]
8 0 10-20:12:43 S [migration/3]
9 0 10-20:12:43 SN [ksoftirqd/3]
10 0 10-20:12:43 S [migration/4]
11 0 10-20:12:43 SN [ksoftirqd/4]
12 0 10-20:12:43 S [migration/5]
13 0 10-20:12:43 SN [ksoftirqd/5]
14 0 10-20:12:43 S [migration/6]
15 0 10-20:12:43 SN [ksoftirqd/6]
16 0 10-20:12:43 S [migration/7]
17 0 10-20:12:43 SN [ksoftirqd/7]
18 0 10-20:12:43 S< [events/0]
19 0 10-20:12:43 S< [events/1]
20 0 10-20:12:43 S< [events/2]
21 0 10-20:12:43 S< [events/3]
22 0 10-20:12:43 S< [events/4]
23 0 10-20:12:43 S< [events/5]
24 0 10-20:12:43 S< [events/6]
25 0 10-20:12:43 S< [events/7]
26 0 10-20:12:43 S< [khelper]
27 0 10-20:12:43 S< [kacpid]
83 0 10-20:12:43 S< [kblockd/0]
84 0 10-20:12:43 S< [kblockd/1]
85 0 10-20:12:43 S< [kblockd/2]
86 0 10-20:12:43 S< [kblockd/3]
87 0 10-20:12:43 S< [kblockd/4]
88 0 10-20:12:43 S< [kblockd/5]
89 0 10-20:12:43 S< [kblockd/6]
90 0 10-20:12:43 S< [kblockd/7]
91 0 10-20:12:43 S [khubd]
126 0 10-20:12:43 S [pdflush]
127 0 10-20:12:43 S [pdflush]
129 0 10-20:12:43 S< [aio/0]
128 0 10-20:12:43 S [kswapd0]
130 0 10-20:12:43 S< [aio/1]
131 0 10-20:12:43 S< [aio/2]
132 0 10-20:12:43 S< [aio/3]
133 0 10-20:12:43 S< [aio/4]
134 0 10-20:12:43 S< [aio/5]
135 0 10-20:12:43 S< [aio/6]
136 0 10-20:12:43 S< [aio/7]
280 0 10-20:12:42 S [kseriod]
431 0 10-20:12:39 S [kjournald]
2313 0 10-20:12:30 S< [kauditd]
2530 0 10-20:12:24 S< [kmirrord]
2548 0 10-20:12:24 S [kjournald]
[root@sqlc2 ~]#
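
To quantify the gap, the SZ column can be totalled and compared with the "used" figure from free. The following is a minimal sketch, assuming the SZ values are in KiB and that each row starts with a numeric PID followed by SZ, as in the listing above. `sum_ps_size_kb` is a hypothetical helper name, and the sample holds only the first three rows with the command lines abbreviated.

```python
def sum_ps_size_kb(ps_output):
    """Sum the SZ column (second field) of `ps -o pid -o size ...` output."""
    total = 0
    for line in ps_output.splitlines():
        fields = line.split()
        # Keep only rows with a numeric PID and SZ; this skips the header,
        # shell prompts, and blank lines.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            total += int(fields[1])
    return total


# First three rows of the listing above (command lines abbreviated).
SAMPLE = """PID SZ ELAPSED STAT COMMAND
20305 542604 2-18:04:49 Sl /usr/sbin/mysqld
4115 279788 10-20:11:23 Sl /opt/hp/hpsmh/sbin/hpsmhd
6931 82280 10-20:10:52 Ssl hpasmxld -f /dev/ipmi0
"""

total_kb = sum_ps_size_kb(SAMPLE)
print(f"{total_kb / 1024 / 1024:.2f} GiB attributed to processes")
```

Summed over the whole listing, the SZ column comes to a little over 1 GB under that assumption, far below the 10 GB reported as used. When the totals diverge this much, the memory is held somewhere ps does not attribute to a process. Places that may be worth checking, offered as suggestions rather than a diagnosis, include System V shared memory segments (ipcs -m) and kernel slab usage (/proc/slabinfo).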

BTW, a reboot of the server fixed the problem.

How to repeat:
Stop and start ndbd a few times. Eventually it will fail to start because there is not enough memory left to allocate the data and index memory pools.
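
The reproduction steps can be turned into a measurement by recording free memory each time ndbd is stopped and looking at the trend. The following is a minimal sketch under stated assumptions: `stop`, `start`, and `read_free_kb` are hypothetical callables supplied by the tester (wrapping whatever commands stop and start ndbd on the host, and a reader of free memory in kB), and the demo uses made-up numbers in place of a real cluster.

```python
def free_kb_after_cycles(stop, start, read_free_kb, cycles):
    """Run stop/start `cycles` times, sampling free memory while ndbd is stopped."""
    samples = []
    for _ in range(cycles):
        stop()
        samples.append(read_free_kb())
        start()
    return samples


def lost_per_cycle_kb(samples):
    """Average drop in free memory between consecutive stopped states."""
    if len(samples) < 2:
        return 0.0
    return (samples[0] - samples[-1]) / (len(samples) - 1)


# Demo with stand-in callables and invented readings (not measured data).
fake_readings = iter([7000000, 5000000, 3000000])
samples = free_kb_after_cycles(
    stop=lambda: None,
    start=lambda: None,
    read_free_kb=lambda: next(fake_readings),
    cycles=3,
)
print("kB lost per cycle:", lost_per_cycle_kb(samples))
```

A value that stays near zero would suggest memory is being returned on each stop, while a steady positive value would match the behaviour described in this report.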
[18 Jun 2007 16:37] Sveta Smirnova
Thank you for the report.

I cannot repeat the described behaviour in my own environment. Please upgrade to the current version, 5.1.19, and try again. If you can still repeat the error, please indicate the exact version of the operating system you use.
[18 Jul 2007 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".