MySQL Bugs: #35901: ndbd gives malloc error when starting up with SharedGlobalMemory

Bug #35901	ndbd gives malloc error when starting up with SharedGlobalMemory > 1012M
Submitted:	8 Apr 2008 12:43	Modified:	22 Oct 2014 16:08
Reporter:	iain smith
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	mysql-5.1	OS:	Linux (redhat enterprise 4 x86_64)
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	community-5.1.23-0.rhel4

Description:
When starting up ndbd cluster storage nodes with the SharedGlobalMemory parameter (in config.ini on the ndb_mgmd server) set larger than 1012M, the following is logged in the ndb_x_out.log:
----
2008-04-08 12:32:04 [ndbd] INFO     -- Angel pid: 7337 ndb pid: 7338
2008-04-08 12:32:04 [ndbd] INFO     -- NDB Cluster -- DB node 3
2008-04-08 12:32:04 [ndbd] INFO     -- Version 5.1.23 (rc) --
2008-04-08 12:32:04 [ndbd] INFO     -- Configuration fetched at localhost port 1186
2008-04-08 12:32:04 [ndbd] INFO     -- Start initiated (version 5.1.23)
2008-04-08 12:32:04 [ndbd] INFO     -- Ndbd_mem_manager::init(1) min: 1013Mb initial: 1013Mb
2008-04-08 12:32:04 [ndbd] INFO     -- sbrk(1048576) failed, trying malloc
ndbd_malloc_impl.cpp:330:grow(5583481, 31) 5583481!=5505024 - Unable to use due to bitmap pages missaligned!!
2008-04-08 12:32:04 [ndbd] ERROR    -- ndbd_malloc_impl.cpp:333:grow(5583481, 31) - Unable to use due to bitmap pages missaligned!!
WOPool::init(61, 9)
RWPool::init(82, 13)
RWPool::init(a2, 18)
RWPool::init(c2, 13)
RWPool::init(122, 18)
RWPool::init(142, 18)
WOPool::init(41, 12)
RWPool::init(e2, 12)
RWPool::init(102, 52)
WOPool::init(21, 10)
m_active_buckets.set(0)
----
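
A note on the numbers in the grow() message, offered as an observation and not as an explanation of the ndbd code: the second value, 5505024, is exactly 21 x 2^18, which is the first value (5583481) rounded down to a multiple of 2^18. The short Python sketch below parses the logged line and checks that arithmetic. The shift width of 18 is inferred from the two logged values only; whether the check at ndbd_malloc_impl.cpp:330 actually computes this is an assumption, and the helper names are hypothetical.

```python
import re

# Line copied from the ndb_x_out.log excerpt above.
LOG_LINE = ("ndbd_malloc_impl.cpp:330:grow(5583481, 31) 5583481!=5505024 "
            "- Unable to use due to bitmap pages missaligned!!")

# ASSUMPTION: 18 is inferred from the logged values, not read from the ndbd source.
ASSUMED_SHIFT = 18


def parse_grow(line):
    """Return (start, count, expected) from a grow() mismatch message, or None."""
    m = re.search(r"grow\((\d+), (\d+)\) \d+!=(\d+)", line)
    if m is None:
        return None
    start, count, expected = (int(g) for g in m.groups())
    return start, count, expected


def round_down(value, shift=ASSUMED_SHIFT):
    """Round value down to the nearest multiple of 2**shift."""
    return (value >> shift) << shift


start, count, expected = parse_grow(LOG_LINE)
# Compare the logged "expected" value with the rounded-down start value.
print(start, count, expected, round_down(start) == expected)
```

If the relationship holds for other failing values of SharedGlobalMemory as well, that would be a useful data point to add to this report.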

- When ndbd is started with the SharedGlobalMemory parameter set below 1013M, the above errors do not appear.
- Apologies if the 'critical' severity is inappropriate; I wasn't sure how to categorize this. I'm running a PoC MySQL Cluster and migrating a very large database from Oracle RAC into it, and I'm not yet sure whether this will turn out to be a blocker for the project.

How to repeat:
HARDWARE: IBM System X x3950 server, 64 GB RAM, 4 x dual-core Intel Xeon 3.16 GHz CPUs

OS: 64-bit Redhat Enterprise Linux 4 update 5 for x86_64, kernel version 2.6.9-55.0.6.ELsmp

SOFTWARE: installed the following RPMs from mysql.com:

MySQL-clusterextra-community-5.1.23-0.rhel4.x86_64.rpm
MySQL-clusterstorage-community-5.1.23-0.rhel4.x86_64.rpm
MySQL-clustermanagement-community-5.1.23-0.rhel4.x86_64.rpm
MySQL-clustertools-community-5.1.23-0.rhel4.x86_64.rpm

CONFIG: create a minimal cluster management config.ini file with all settings at their defaults except for a large SharedGlobalMemory (any value over 1012M, e.g. 1G), as follows:

----
[ndbd default]
NoOfReplicas= 1
DataDir= /var/lib/mysql-cluster
SharedGlobalMemory= 1013M

[ndb_mgmd default]
DataDir= /var/lib/mysql-cluster

[ndb_mgmd]
Id=1
HostName= <hostname of server>

[ndbd]
Id= 3
HostName= <hostname of server>

[mysqld]
Id= 4

[tcp default]

----
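
Since the repro involves switching SharedGlobalMemory between a failing and a working value, it can help to generate config.ini from a template. The following Python sketch does that using the exact settings shown above. The function name render_config and the example hostname are hypothetical; substitute the real hostname of the server.

```python
# Template mirroring the config.ini in this report; only two values vary.
CONFIG_TEMPLATE = """\
[ndbd default]
NoOfReplicas= 1
DataDir= /var/lib/mysql-cluster
SharedGlobalMemory= {shared_global_memory}

[ndb_mgmd default]
DataDir= /var/lib/mysql-cluster

[ndb_mgmd]
Id=1
HostName= {hostname}

[ndbd]
Id= 3
HostName= {hostname}

[mysqld]
Id= 4

[tcp default]
"""


def render_config(shared_global_memory, hostname):
    """Return config.ini text for the given SharedGlobalMemory value and host."""
    return CONFIG_TEMPLATE.format(
        shared_global_memory=shared_global_memory,
        hostname=hostname,
    )


# Example: "db-host" is a placeholder hostname, not taken from the report.
print(render_config("1013M", "db-host"))
```

Writing the output once with 1013M and once with 1012M gives the two configurations compared in this report.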

- start up ndb_mgmd
- start up ndbd --initial

- note the malloc error in ndb_x_out.log
- run ndb_mgm -e shutdown, edit config.ini to reduce SharedGlobalMemory to 1012M or lower, and repeat; the error no longer occurs.
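
The "note malloc error" step can be scripted when comparing several SharedGlobalMemory values. Below is a minimal Python sketch that picks out the relevant lines from the node's out log. The marker strings are copied from the log excerpt in this report; the function name is hypothetical, and the sample text stands in for the real log file, whose path (for example ndb_3_out.log under the configured DataDir) is an assumption based on the node Id in the config above.

```python
# Substrings taken verbatim from the log excerpt in this report.
ERROR_MARKERS = ("failed, trying malloc", "bitmap pages missaligned")


def find_alloc_errors(log_text):
    """Return the log lines that contain any of the allocation error markers."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in ERROR_MARKERS)
    ]


# Sample input: four lines copied from the report's log excerpt.
SAMPLE = """\
2008-04-08 12:32:04 [ndbd] INFO     -- Ndbd_mem_manager::init(1) min: 1013Mb initial: 1013Mb
2008-04-08 12:32:04 [ndbd] INFO     -- sbrk(1048576) failed, trying malloc
ndbd_malloc_impl.cpp:330:grow(5583481, 31) 5583481!=5505024 - Unable to use due to bitmap pages missaligned!!
2008-04-08 12:32:04 [ndbd] ERROR    -- ndbd_malloc_impl.cpp:333:grow(5583481, 31) - Unable to use due to bitmap pages missaligned!!
"""

for hit in find_alloc_errors(SAMPLE):
    print(hit)
```

To use it on a real run, read the log file's contents into a string and pass that string to find_alloc_errors; an empty result corresponds to the "error does not occur" case.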

Thank you for the report.

I cannot repeat the described behavior. Please provide the output of ulimit -a.

Hi, here is the output of 'ulimit -a' on the ndbd node:
----
# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 1024
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16384
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
----

Maybe you want to set DiskPageBufferMemory instead?
Unlike Oracle's SGA, our "SGA" (SharedGlobalMemory) does not include the disk page buffer.