Bug #34438 ndb_mgm process takes 100% cpu
Submitted: 8 Feb 2008 23:39
Modified: 20 May 2009 13:27
Reporter: Jeff Wang
Status: Verified
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: mysql-5.1-telco-6.3
OS: Mac OS X
Assigned to:
CPU Architecture: Any
Tags: 5.1.24
Triage: Triaged: D2 (Serious) / R6 (Needs Assessment) / E6 (Needs Assessment)

[8 Feb 2008 23:39] Jeff Wang
I compiled mysql from the BK source for 64 bit on x86_64.  Everything seems to work, except that when I log into the mgm shell (ndb_mgm), it takes up 100% CPU after I issue a few commands.

How to repeat:
Compiled as follows:

 CFLAGS='  -isysroot /Developer/SDKs/MacOSX10.4u.sdk  -mmacosx-version-min=10.4  -arch x86_64 -m64    -Wall -Wconversion -O3  -fno-omit-frame-pointer' 
 LDFLAGS='  -isysroot /Developer/SDKs/MacOSX10.4u.sdk   -mmacosx-version-min=10.4  -arch x86_64 -m64     -Wall -Wconversion -O3  -fno-omit-frame-pointer'
 CXXFLAGS='  -isysroot /Developer/SDKs/MacOSX10.4u.sdk -mmacosx-version-min=10.4 -arch x86_64 -m64    -Wall -Wconversion -O3  -fno-omit-frame-pointer '

 export CFLAGS
 export LDFLAGS
 export CXXFLAGS
 export CC
 export CXX

 ./configure --disable-dependency-tracking --prefix=/usr/local/mysql --enable-local-infile --disable-shared --enable-thread-safe-client --with-ndbcluster --with-big-tables  --with-extra-charsets=complex 

Then I started ndb_mgmd and 2 ndb nodes, all on the same server.  After I log into ndb_mgm and issue a few commands, it starts to take 100% CPU.
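
One way to confirm the symptom while repeating the steps above is to sample the CPU share of the ndb_mgm and ndb_mgmd processes with ps. The following is a minimal sketch, not part of the original report: the helper names parse_ps and cpu_usage are hypothetical, the sample text is made up for illustration, and ps reports an averaged figure, so treat the numbers as a rough check only.

```python
import os
import subprocess


def parse_ps(output, name):
    """Return (pid, cpu_percent) pairs for processes whose command is `name`."""
    matches = []
    for line in output.splitlines():
        fields = line.split(None, 2)  # pid, %cpu, command (may contain spaces)
        if len(fields) != 3:
            continue
        pid, cpu, comm = fields
        if os.path.basename(comm.strip()) == name:
            try:
                matches.append((int(pid), float(cpu)))
            except ValueError:
                continue  # skip lines whose numbers do not parse
    return matches


def cpu_usage(name):
    """Run ps once and report the CPU share of every process called `name`."""
    out = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "pcpu=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out, name)


# Made-up ps output, used only to show the parsing step.
sample = "  101   0.0 /sbin/launchd\n  202  99.8 /usr/local/mysql/bin/ndb_mgm\n"
print(parse_ps(sample, "ndb_mgm"))  # [(202, 99.8)]
```

On a live system, calling cpu_usage("ndb_mgm") or cpu_usage("ndb_mgmd") before and after issuing "show" should make a jump in CPU usage visible.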
[1 Apr 2008 20:18] Sveta Smirnova
Thank you for the report.

Please provide your Cluster and mysqld configuration files.
[2 Apr 2008 18:05] Sveta Smirnova
Please also provide the command you use to start ndb_mgm.
[9 Apr 2008 23:13] Jeff Wang
The commands I used in the mgm shell are "show", "1 status" and "2 status".

---Master config file---

# Options affecting ndbd processes on all data nodes:
[ndbd default]    
NoOfReplicas=2      # Number of replicas
DataMemory=6000M    # How much memory to allocate for data storage
IndexMemory=1000M    # How much memory to allocate for index storage
StringMemory=10000000     #expressed as a percentage, 100%=5 MB, values > 99 interpreted as bytes

#increase this number if the number of inserts/updates/delete is large.
#Each Log file is 4 * 16 MB = 64 MB. So 1024 Log files can accommodate 512*64 = 32 GB
#There should be enough log files to accommodate at least the time to do 6 LCPs.
#Thus, if each LCP is 300 seconds, we need to support 6*300 seconds=1800 seconds of REDO Logs.
#So, if you are writing 10 MB/second of insert/update/del info, you need to have 18 GB of Redo logs.

#The following params control LCP speed.  A Local Checkpoint (LCP) flushes the contents of
#Data Memory to disk.  The default values should be good enough for
#a machine w/ 2 GB RAM.  LCPs should occur at ~ 5 minute intervals leading to a 2-3 minute node restart time.
#These parameters need to be adjusted for machines with larger RAM.
#A rough estimate for CheckpointSpeed is DataMemory/(seconds to checkpoint) (ie: 2000 MB/300 seconds = 6.7 MB/sec)


#Each attribute uses 200 bytes of storage/node
#Should be at least 3 times the size of all attributes you expect, because ALTER TABLE statements use them
#Also take into account attributes in hidden tables (ie: unique index table, ordered index,  blob tables, index trigger)

#should be at least 2 times number of expected tables.  A hidden table is created for each ordered index
#Each table object consumes 20KB/node

#should be at least 2 times number of expected tables.  2 indexes created for each unique index (hash + ordered)
#Each index uses 10KB/node

#amount of time to elapse before aborting the transaction 
#and assuming deadlock on other node.
TransactionDeadlockDetectionTimeout=15000 #in ms

#amount of time between operations in the same transaction
#0 indicates no timeout
#units in ms

#number of simultaneous updates (or selects using locks) that occur at once.
#Lookups on unique indexes require 2 records (due to a look up in a hidden index table), blobs do as well (?).
#number should be maxNumSimultaneousUpdates/ # nodes.
#Each record requires 1 KB so watch your memory usage.

#Logging, values can be 0 to 15 where 15 is the most verbose

# TCP/IP options:
[tcp default]     
Checksum=1                        #detect corrupted messages

# Management process options:
hostname=x.x.com           # Hostname or IP address of MGM node
datadir=/Users/x/work/cluster/mgm_data  # Directory for MGM node log files

# Options for data node "A":

datadir=/Users/x/work/cluster/data   # Directory for this data node's data files

# Options for data node "B":
#hostname=x.x.com          # Hostname or IP address of MGM node
datadir=/Users/x/work/cluster/data   # Directory for this data node's data files

# SQL node options:
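
The sizing rules in the config comments above can be checked with a few lines of arithmetic. This is only a sketch of those comments, not an official formula: the function names are hypothetical, and the constants (4 files of 16 MB per log file set, 6 LCPs, 300-second LCPs, 10 MB/second of writes) are taken from the comments themselves. The comments mention both 1024 and 512 log files, so both are shown.

```python
FILES_PER_SET = 4    # files per redo log set, as stated in the config comment
FILE_SIZE_MB = 16    # size of each file in MB, as stated in the config comment


def redo_capacity_mb(log_sets):
    """Total redo log space for the given number of log file sets."""
    return log_sets * FILES_PER_SET * FILE_SIZE_MB


def redo_needed_mb(write_mb_per_s, lcp_seconds, lcps=6):
    """Redo log space needed to cover `lcps` local checkpoints."""
    return write_mb_per_s * lcp_seconds * lcps


def checkpoint_speed(data_memory_mb, lcp_seconds):
    """Rough checkpoint speed in MB/s: DataMemory / seconds per checkpoint."""
    return data_memory_mb / lcp_seconds


print(redo_capacity_mb(512))                  # 32768 MB, i.e. 32 GB
print(redo_capacity_mb(1024))                 # 65536 MB, i.e. 64 GB
print(redo_needed_mb(10, 300))                # 18000 MB, roughly 18 GB
print(round(checkpoint_speed(2000, 300), 1))  # 6.7
print(round(checkpoint_speed(6000, 300), 1))  # 20.0 for DataMemory=6000M
```

By this arithmetic, the 32 GB figure corresponds to 512 log file sets, and the DataMemory=6000M setting in this config would call for about 20 MB/sec to finish a checkpoint in 300 seconds.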

[10 Apr 2008 22:19] Sveta Smirnova
Thank you for the feedback.

By 'command you use to start ndb_mgm' I meant the options you pass to ndb_mgm, for example `bin/ndb_mgm --ndb-mgmd-host=`.
[11 Apr 2008 17:48] Jeff Wang

I don't provide any option on the command line.  So, I just use 'ndb_mgm' to start the shell.  This reads the my.cnf file which only has this:


Also, when starting the daemon, I do not provide any options. So I just used 'ndb_mgmd' and the only option in the my.cnf file is the absolute path to the config file (which I provided above).

[20 May 2009 13:27] Geert Vanderkelen
Verified, this is indeed a problem on Mac. ndb_mgm is unusable. MySQL Cluster 7.0.5 has the same problem.
When running ndb_mgm from the binaries available on dev.mysql.com, it just hangs, causing a huge load.
After a while a warning appears.

ndb_mgm> show
Warning, event thread startup failed, degraded printouts as result, errno=36
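
The meaning of errno=36 depends on the platform, so it helps to decode it on the machine that printed the warning. The sketch below uses only the Python standard library; describe_errno is a hypothetical helper name. As far as I know, 36 maps to EINPROGRESS on Mac OS X and to ENAMETOOLONG on Linux, but check the output on the affected system rather than relying on that.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for `code` on the current platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The result differs between operating systems, so no fixed output is shown.
print(describe_errno(36))
```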

S2/D2, because basically MySQL Cluster doesn't really work on Mac right now.
[16 Nov 2009 12:46] Geert Vanderkelen
Still a problem in 7.0.9b (Mac OS X 10.6.2, _no_ MacPorts)
 shell> ndb_mgmd -f /path/to/config.ini
 shell> ndbd

Once the data node is started, run:
 shell> ndb_mgm
 ndb_mgm> show

ndb_mgmd CPU usage then skyrockets (I saw 190%).

Workaround: run ndb_mgmd with the --nodaemon option.

See also #47214 (maybe they are related, or maybe 2 problems with similar effect).

W3 : Workaround available, not perfect, but good for testing.