Description:
Something is broken here: ndbd doesn't seem to free its memory after it restarts.
Here is the error logged when it tries to restart:
2007-03-13 15:13:22 [ndbd] INFO -- Ndb has terminated (pid 29701) restarting
2007-03-13 15:13:22 [ndbd] INFO -- Angel pid: 29696 ndb pid: 29705
2007-03-13 15:13:22 [ndbd] INFO -- NDB Cluster -- DB node 3
2007-03-13 15:13:22 [ndbd] INFO -- Version 5.1.16 (beta) --
2007-03-13 15:13:22 [ndbd] INFO -- Configuration fetched at ndb_mgmd1 port 1186
2007-03-13 15:13:23 [ndbd] INFO -- Start initiated (version 5.1.16)
2007-03-13 15:13:23 [ndbd] INFO -- Ndbd_mem_manager::init(1) min: 20Mb initial: 20Mb
2007-03-13 15:13:23 [ndbd] INFO -- Error handler startup restarting system
2007-03-13 15:13:24 [ndbd] INFO -- Error handler shutdown completed - exiting
2007-03-13 15:13:24 [ndbd] INFO -- Angel received ndbd startup failure count 3.
2007-03-13 15:13:24 [ndbd] ALERT -- Ndbd has failed 3 consecutive startups. Not restarting
2007-03-13 15:13:24 [ndbd] ALERT -- Node 3: Forced node shutdown completed. Occured during startphase 0. Caused by error 2327: 'Memory allocation failure, please decrease some configuration parameters(Configuration error). Permanent error, external action needed'.
I have 10 GB configured for data memory and 2 GB for index memory, and the machine has 16 GB of RAM.
However, free shows 10 GB of memory in use (7 GB excluding buffers/cache) with no ndbd running:
[root@sqlc2 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:            15         10          5          0          0          1
-/+ buffers/cache:          7          7
Swap:            1          0          1
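As a rough cross-check, the sketch below compares the 12 GB the two pools need with the memory free reports as available. The values are copied by hand from the free -g output, which only shows whole gigabytes. The fits helper is hypothetical, and the sketch counts only data and index memory, so ndbd's real requirement is likely somewhat higher.

```python
# Rough check: can the configured NDB pools fit in the memory that
# `free -g` reports as available? Values copied from the output above (GB).
DATA_MEMORY_GB = 10   # data memory, as configured in this report
INDEX_MEMORY_GB = 2   # index memory, as configured in this report

FREE_GB = 5              # "Mem:" free column
FREE_INCL_CACHE_GB = 7   # "-/+ buffers/cache" free column (cache is reclaimable)


def fits(required_gb: int, available_gb: int) -> bool:
    """Hypothetical helper: True if the requested pools fit in available memory."""
    return required_gb <= available_gb


# Lower bound only: ndbd allocates other buffers besides these two pools.
required = DATA_MEMORY_GB + INDEX_MEMORY_GB

print(f"required: {required} GB")
print(f"free: {FREE_GB} GB, fits: {fits(required, FREE_GB)}")
print(f"free incl. cache: {FREE_INCL_CACHE_GB} GB, fits: {fits(required, FREE_INCL_CACHE_GB)}")
```

Neither figure leaves room for 12 GB. This is consistent with error 2327 (memory allocation failure) in the log above.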
Here is the ps output sorted by memory size. No process comes anywhere near using that much memory:
[root@sqlc2 ~]# ps axww --sort -size -o pid -o size -o etime -o stat -o command
PID SZ ELAPSED STAT COMMAND
20305 542604 2-18:04:49 Sl /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --pid-file=/var/lib/mysql/sqlc2.isp.pid --log-error=/var/lib/mysql/sqlc2.isp.err --socket=/var/lib/mysql/mysql.sock --port=3306
4115 279788 10-20:11:23 Sl /opt/hp/hpsmh/sbin/hpsmhd -DSSL -f /opt/hp/hpsmh/conf/smhpd.conf
6931 82280 10-20:10:52 Ssl hpasmxld -f /dev/ipmi0
4726 61664 10-20:11:17 Sl cmahealthd -p 30 -s OK -t OK -i
5165 48340 10-20:11:11 Sl cmanicd
4861 41168 10-20:11:17 Ssl /sbin/cpqriisd -F
4938 31008 10-20:11:11 Sl cmarackd -p 120
520 16616 2-11:18:20 Sl /usr/sbin/snmpd -Lsd -Lf /dev/null -p /var/run/snmpd -a
6198 11168 10-20:11:08 Sl /opt/hp/vcagent/bin/vcagentd
4670 10892 10-20:11:17 Sl cmapeerd
4705 10596 10-20:11:17 S cmahostd -p 15 -s OK
4721 10520 10-20:11:17 S cmastdeqd -p 30
5118 10480 10-20:11:11 S cmafcad -p 15 -s OK
5120 10436 10-20:11:11 S cmaided -p 15 -s OK
4996 10432 10-20:11:11 Sl cmaeventd -p 15
6044 6304 10-20:11:08 Ss hald
4092 3200 10-20:11:23 Ss /opt/hp/hpsmh/sbin/hpsmhd -DSSL -f /opt/hp/hpsmh/conf/smhpd.conf
491 1736 2-11:18:20 Ss cupsd
3848 1592 10-20:11:27 Ss sendmail: accepting connections
3856 1448 10-20:11:27 Ss sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
4103 940 10-20:11:23 Ss crond
4154 892 10-20:11:21 Ss xfs -droppriv -daemon
1430 880 00:00 R+ ps axww --sort -size -o pid -o size -o etime -o stat -o command
28462 724 09:07 Ss sshd: root@pts/0
4658 620 10-20:11:18 S cmathreshd -p 5 -s OK
4741 592 10-20:11:17 S cmaperfd -p 30 -s OK
795 564 00:52 Ss sshd: root@pts/1
5016 472 10-20:11:11 S cmaidad -p 15 -s OK
3815 456 10-20:11:28 Ss /usr/sbin/sshd
3606 400 10-20:11:30 Ss rpc.idmapd
3829 376 10-20:11:28 Ss xinetd -stayalive -pidfile /var/run/xinetd.pid
28464 348 09:07 Ss+ -bash
4093 344 10-20:11:23 S /opt/hp/hpsmh/bin/rotatelogs /var/spool/opt/hp/hpsmh/logs/error_log 5M
4094 344 10-20:11:23 S /opt/hp/hpsmh/bin/rotatelogs /var/spool/opt/hp/hpsmh/logs/access_log 5M
797 340 00:52 Ss -bash
4918 220 10-20:11:11 S cmasm2d -p 30
1 212 10-20:12:43 S init [3]
4923 212 10-20:11:11 S cmasm2d -p 30
6035 200 10-20:11:09 Ss dbus-daemon-1 --system
20240 200 2-18:04:49 S /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/var/lib/mysql/sqlc2.isp.pid
3534 196 10-20:11:32 Ss irqbalance
3552 196 10-20:11:32 Ss portmap
3571 196 10-20:11:31 Ss rpc.statd
3520 184 10-20:11:32 Ss syslogd -m 0
3866 184 10-20:11:27 Ss gpm -m /dev/input/mice -t imps2
3704 180 10-20:11:29 Ss /usr/sbin/acpid
6026 180 10-20:11:09 Ss /usr/sbin/atd
3524 176 10-20:11:32 Ss klogd -x
1560 172 10-20:12:32 S<s udevd
6342 172 10-20:11:06 Ss+ /sbin/mingetty tty1
6343 172 10-20:11:06 Ss+ /sbin/mingetty tty2
6344 172 10-20:11:06 Ss+ /sbin/mingetty tty3
6345 172 10-20:11:06 Ss+ /sbin/mingetty tty4
6346 172 10-20:11:06 Ss+ /sbin/mingetty tty5
6347 172 10-20:11:06 Ss+ /sbin/mingetty tty6
2 0 10-20:12:43 S [migration/0]
3 0 10-20:12:43 SN [ksoftirqd/0]
4 0 10-20:12:43 S [migration/1]
5 0 10-20:12:43 SN [ksoftirqd/1]
6 0 10-20:12:43 S [migration/2]
7 0 10-20:12:43 SN [ksoftirqd/2]
8 0 10-20:12:43 S [migration/3]
9 0 10-20:12:43 SN [ksoftirqd/3]
10 0 10-20:12:43 S [migration/4]
11 0 10-20:12:43 SN [ksoftirqd/4]
12 0 10-20:12:43 S [migration/5]
13 0 10-20:12:43 SN [ksoftirqd/5]
14 0 10-20:12:43 S [migration/6]
15 0 10-20:12:43 SN [ksoftirqd/6]
16 0 10-20:12:43 S [migration/7]
17 0 10-20:12:43 SN [ksoftirqd/7]
18 0 10-20:12:43 S< [events/0]
19 0 10-20:12:43 S< [events/1]
20 0 10-20:12:43 S< [events/2]
21 0 10-20:12:43 S< [events/3]
22 0 10-20:12:43 S< [events/4]
23 0 10-20:12:43 S< [events/5]
24 0 10-20:12:43 S< [events/6]
25 0 10-20:12:43 S< [events/7]
26 0 10-20:12:43 S< [khelper]
27 0 10-20:12:43 S< [kacpid]
83 0 10-20:12:43 S< [kblockd/0]
84 0 10-20:12:43 S< [kblockd/1]
85 0 10-20:12:43 S< [kblockd/2]
86 0 10-20:12:43 S< [kblockd/3]
87 0 10-20:12:43 S< [kblockd/4]
88 0 10-20:12:43 S< [kblockd/5]
89 0 10-20:12:43 S< [kblockd/6]
90 0 10-20:12:43 S< [kblockd/7]
91 0 10-20:12:43 S [khubd]
126 0 10-20:12:43 S [pdflush]
127 0 10-20:12:43 S [pdflush]
129 0 10-20:12:43 S< [aio/0]
128 0 10-20:12:43 S [kswapd0]
130 0 10-20:12:43 S< [aio/1]
131 0 10-20:12:43 S< [aio/2]
132 0 10-20:12:43 S< [aio/3]
133 0 10-20:12:43 S< [aio/4]
134 0 10-20:12:43 S< [aio/5]
135 0 10-20:12:43 S< [aio/6]
136 0 10-20:12:43 S< [aio/7]
280 0 10-20:12:42 S [kseriod]
431 0 10-20:12:39 S [kjournald]
2313 0 10-20:12:30 S< [kauditd]
2530 0 10-20:12:24 S< [kmirrord]
2548 0 10-20:12:24 S [kjournald]
[root@sqlc2 ~]#
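To quantify "nothing is using up that memory", the sketch below adds up the SZ column of the listing above. It assumes the column order used there (PID first, SZ second), and that procps reports SZ in kilobytes. SZ is only a rough per-process estimate. The sample holds the first five entries from the listing, with the mysqld command line shortened; total_size_kb is a hypothetical helper name.

```python
# Sum the SZ column of `ps axww -o pid -o size ...` output to estimate how
# much memory the listed processes account for (SZ is in kilobytes).

# First five entries from the listing above (mysqld command line shortened).
SAMPLE = """\
PID SZ ELAPSED STAT COMMAND
20305 542604 2-18:04:49 Sl /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql
4115 279788 10-20:11:23 Sl /opt/hp/hpsmh/sbin/hpsmhd -DSSL -f /opt/hp/hpsmh/conf/smhpd.conf
6931 82280 10-20:10:52 Ssl hpasmxld -f /dev/ipmi0
4726 61664 10-20:11:17 Sl cmahealthd -p 30 -s OK -t OK -i
5165 48340 10-20:11:11 Sl cmanicd
[root@sqlc2 ~]#
"""


def total_size_kb(ps_output: str) -> int:
    """Return the sum of the SZ column, skipping the header and prompt lines."""
    total = 0
    for line in ps_output.splitlines():
        fields = line.split(None, 4)
        # Data rows start with a numeric PID followed by a numeric SZ.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            total += int(fields[1])
    return total


if __name__ == "__main__":
    kb = total_size_kb(SAMPLE)
    print(f"{kb} kB (~{kb / 1024 / 1024:.2f} GB)")
```

Running the same function over the full listing gives a little over 1 GB. That is far short of the 7 to 10 GB free reports as used.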
By the way, rebooting the server fixed the problem.
How to repeat:
Stop and start ndbd a few times. Eventually it can no longer start, because there isn't enough memory left to allocate the data and index memory pools.
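To make the repro measurable, record the figure that free prints on its "-/+ buffers/cache" line before the first start and again after each stop. The sketch below does this and assumes a Linux /proc/meminfo. The names parse_meminfo and used_excluding_cache_kb are hypothetical. The ndbd stop and start commands are left out because they depend on how the node is managed.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into a {field: value} dict (values in kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def used_excluding_cache_kb(info: dict) -> int:
    """Memory in use minus buffers and page cache, like free's -/+ line."""
    return (info["MemTotal"] - info["MemFree"]
            - info.get("Buffers", 0) - info.get("Cached", 0))


if __name__ == "__main__":
    # Run before the first ndbd start and after each stop (no ndbd running),
    # then compare the printed values across runs.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print(f"used excluding buffers/cache: {used_excluding_cache_kb(info) // 1024} MB")
```

If the value keeps climbing from one cycle to the next while no ndbd process is running, that would match the behaviour described above.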