MySQL Bugs: #34438: ndb_mgm process takes 100% cpu

Bug #34438	ndb_mgm process takes 100% cpu
Submitted:	8 Feb 2008 23:39	Modified:	20 May 2009 13:27
Reporter:	Jeff Wang
Status:	Verified
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	mysql-5.1-telco-6.3	OS:	MacOS
Assigned to:		CPU Architecture:	Any
Tags:	5.1.24

Description:
I compiled MySQL from the BK source as a 64-bit build on x86_64.  Everything seems to work, except that the mgm shell (ndb_mgm) takes up 100% CPU after I log in and issue a few commands.

How to repeat:
Compiled as follows:

 MACOSX_DEPLOYMENT_TARGET=10.4
 CFLAGS='  -isysroot /Developer/SDKs/MacOSX10.4u.sdk  -mmacosx-version-min=10.4  -arch x86_64 -m64    -Wall -Wconversion -O3  -fno-omit-frame-pointer' 
 LDFLAGS='  -isysroot /Developer/SDKs/MacOSX10.4u.sdk   -mmacosx-version-min=10.4  -arch x86_64 -m64     -Wall -Wconversion -O3  -fno-omit-frame-pointer'
 CXXFLAGS='  -isysroot /Developer/SDKs/MacOSX10.4u.sdk -mmacosx-version-min=10.4 -arch x86_64 -m64    -Wall -Wconversion -O3  -fno-omit-frame-pointer '
 CC=gcc
 CXX=gcc

 export MACOSX_DEPLOYMENT_TARGET
 export CFLAGS
 export LDFLAGS
 export CXXFLAGS
 export CC
 export CXX

 ./configure --disable-dependency-tracking --prefix=/usr/local/mysql --enable-local-infile --disable-shared --enable-thread-safe-client --with-ndbcluster --with-big-tables  --with-extra-charsets=complex 

Then start ndb_mgmd and 2 ndbd data nodes, all on the same server.  Log into ndb_mgm, issue a few commands, and it starts to take 100% CPU.
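
To put a number on "takes 100% cpu" while repeating the steps above, CPU usage can be sampled from outside the process. The following is only a minimal sketch: it assumes a `ps` that accepts `-o pcpu= -p PID` (as the stock Mac OS X and Linux `ps` do), and the helper names `parse_ps_cpu`, `sample_cpu`, and `looks_pegged` are hypothetical, not part of MySQL Cluster.

```python
import subprocess
import time


def parse_ps_cpu(ps_output: str) -> float:
    """Parse the single number printed by `ps -o pcpu= -p PID`."""
    # Some locales print a decimal comma; normalise it before converting.
    text = ps_output.strip().replace(",", ".")
    if not text:
        raise ValueError("ps printed nothing; has the process exited?")
    return float(text)


def sample_cpu(pid: int) -> float:
    """Return the CPU percentage that ps reports for one process."""
    result = subprocess.run(
        ["ps", "-o", "pcpu=", "-p", str(pid)],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the PID does not exist
    )
    return parse_ps_cpu(result.stdout)


def looks_pegged(samples, threshold=90.0):
    """True if every sample is at or above the threshold (an assumed cut-off)."""
    samples = list(samples)
    return bool(samples) and all(s >= threshold for s in samples)


def watch(pid: int, count: int = 5, interval: float = 1.0):
    """Collect `count` samples, `interval` seconds apart."""
    samples = []
    for _ in range(count):
        samples.append(sample_cpu(pid))
        time.sleep(interval)
    return samples
```

Calling `looks_pegged(watch(pid))` with the PID of the running ndb_mgm (or ndb_mgmd) process gives a rough yes/no answer that can be attached to a report.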

Thank you for the report.

Please provide your Cluster and mysqld configuration files.

Please also provide the command you use to start ndb_mgm.

The command I used in the mgm shell is "show" or "1 status" or "2 status".

---Master config file---

		
# Options affecting ndbd processes on all data nodes:
[ndbd default]    
NoOfReplicas=2      # Number of replicas
DataMemory=6000M    # How much memory to allocate for data storage
IndexMemory=1000M    # How much memory to allocate for index storage
StringMemory=10000000     #expressed as a percentage, 100%=5 MB, values > 99 interpreted as bytes

#increase this number if the number of inserts/updates/delete is large.
#Each log file is 4 * 16 MB = 64 MB. So 512 log files can accommodate 512*64 MB = 32 GB
#There should be enough log files to accommodate at least the time to do 6 LCPs.
#Thus, if each LCP is 300 seconds, we need to support 6*300 seconds=1800 seconds of REDO Logs.
#So, if you are writing 10 MB/second of insert/update/del info, you need to have 18 GB of Redo logs.
NoOfFragmentLogFiles=96
#FragmentLogFileSize=96M

#The following params control LCP speed.  A Local Checkpoint (LCP) flushes the contents of
#Data Memory to disk.  The default values should be good enough for
#a machine w/ 2 GB RAM.  LCPs should occur at ~ 5 minute intervals, leading to a 2-3 minute node restart time.
#These parameters need to be adjusted for machines with larger RAM.
#A rough estimate for CheckpointSpeed is DataMemory/(seconds to checkpoint) (i.e. 2000 MB/300 seconds = 6.7 MB/sec)
DiskSyncSize=4M
DiskCheckpointSpeed=10M
DiskCheckpointSpeedInRestart=100M

RedoBuffer=32M 

#Each attribute uses 200 bytes of storage/node
#Should be at least 3 times the total number of attributes you expect, because ALTER TABLE statements use them
#Also take into account attributes in hidden tables (i.e. unique index table, ordered index, blob tables, index trigger)
MaxNoOfAttributes=16000

#should be at least 2 times number of expected tables.  A hidden table is created for each ordered index
#Each table object consumes 20KB/node
MaxNoOfTables=8000

#should be at least 2 times number of expected tables.  2 indexes created for each unique index (hash + ordered)
#Each index uses 10KB/node
MaxNoOfOrderedIndexes=8000

#amount of time to elapse before aborting the transaction 
#and assuming deadlock on other node.
TransactionDeadlockDetectionTimeout=15000 #in ms

#amount of time between operations in the same transaction
#0 indicates no timeout
#units in ms
#TransactionInactiveTimeout=0 

#number of simultaneous updates (or selects using locks) that occur at once.
#Lookups on unique indexes require 2 records (due to a lookup in a hidden index table), blobs do as well (?).
#number should be maxNumSimultaneousUpdates / number of nodes.
#Each record requires 1 KB, so watch your memory usage.
MaxNoOfConcurrentOperations=50000

#Logging, values can be 0 to 15 where 15 is the most verbose
LogLevelCheckpoint=10
LogLevelCongestion=10
LogLevelConnection=10
LogLevelError=10
LogLevelInfo=10
LogLevelNodeRestart=10
LogLevelShutdown=10
LogLevelStartup=10
LogLevelStatistic=10
#MemReportFrequency=0

# TCP/IP options:
[tcp default]     
SendBufferMemory=2M 
ReceiveBufferMemory=1M 
Checksum=1                        #detect corrupted messages

# Management process options:
[ndb_mgmd]                      
id=1
hostname=x.x.com           # Hostname or IP address of MGM node
datadir=/Users/x/work/cluster/mgm_data  # Directory for MGM node log files

# Options for data node "A":
[ndbd]                          
id=2

hostname=x.x.com
datadir=/Users/x/work/cluster/data   # Directory for this data node's data files

# Options for data node "B":
[ndbd]                          
id=3
#hostname=x.x.com          # Hostname or IP address of MGM node
hostname=x.x.com
datadir=/Users/x/work/cluster/data   # Directory for this data node's data files

# SQL node options:
[mysqld]                        
id=4
[mysqld]                        
id=5

[mysqld]
[mysqld]
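
The sizing rules of thumb in the config comments above can be checked with a little arithmetic. The sketch below simply restates those comments in Python; the per-object sizes (200 bytes per attribute, 20 KB per table, 10 KB per index, 1 KB per operation record), the 4 x 16 MB log-file layout, and the 10 MB/second write rate are the reporter's own estimates taken from the comments, not authoritative figures, and the function names are hypothetical.

```python
KB = 1024
MB = 1024 * KB


def redo_log_capacity_bytes(no_of_fragment_log_files, fragment_log_file_size=16 * MB):
    """Redo capacity per the comment: each log 'file' is a set of 4 files."""
    return no_of_fragment_log_files * 4 * fragment_log_file_size


def redo_seconds_covered(capacity_bytes, write_rate_bytes_per_sec):
    """How many seconds of writes fit in the redo log at a given write rate."""
    return capacity_bytes / write_rate_bytes_per_sec


def schema_memory_bytes(attributes, tables, ordered_indexes, concurrent_ops):
    """Per-data-node memory estimate using the sizes quoted in the comments."""
    return {
        "MaxNoOfAttributes": attributes * 200,
        "MaxNoOfTables": tables * 20 * KB,
        "MaxNoOfOrderedIndexes": ordered_indexes * 10 * KB,
        "MaxNoOfConcurrentOperations": concurrent_ops * 1 * KB,
    }


if __name__ == "__main__":
    # Values taken from the [ndbd default] section above.
    capacity = redo_log_capacity_bytes(96)
    print("redo capacity (MB):", capacity // MB)  # 6144
    # Assumed write rate of 10 MB/second, as in the config comment.
    print("seconds covered:", redo_seconds_covered(capacity, 10 * MB))
    for name, size in schema_memory_bytes(16000, 8000, 8000, 50000).items():
        print(f"{name}: {size / MB:.1f} MB per node")
```

Under these assumptions, NoOfFragmentLogFiles=96 gives about 6 GB of redo log, which at 10 MB/second covers roughly 614 seconds, less than the 1800 seconds the comments recommend. That is a sizing observation only; nothing in this report ties it to the CPU problem.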

Thank you for the feedback.

By the command used to start ndb_mgm, I meant which options you pass to ndb_mgm, for example `bin/ndb_mgm --ndb-mgmd-host=127.0.0.1:35118`.

Hello,

I don't provide any option on the command line.  So, I just use 'ndb_mgm' to start the shell.  This reads the my.cnf file which only has this:

[ndb_mgm]
ndb-connectstring=127.0.0.1:1186

Also, when starting the daemon, I do not provide any options. So I just used 'ndb_mgmd' and the only option in the my.cnf file is the absolute path to the config file (which I provided above).

thanks

Verified, this is indeed a problem on Mac: ndb_mgm is unusable. MySQL Cluster 7.0.5 has the same problem.
When running ndb_mgm from the binaries available on dev.mysql.com, it just hangs, causing a huge load.
After a while a warning appears:

ndb_mgm> show
Warning, event thread startup failed, degraded printouts as result, errno=36
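
Note that errno numbers are platform-specific, so the `errno=36` in the warning above should be decoded on the machine that printed it. On BSD-derived systems such as Mac OS X, 36 is conventionally EINPROGRESS, whereas on Linux the same number is ENAMETOOLONG. A minimal sketch for looking it up locally (the helper name `describe_errno` is hypothetical):

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for an errno on THIS platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"errno {code}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    # Run this on the affected Mac; the output differs between operating systems.
    print(describe_errno(36))
```

This only names the error code; it does not by itself explain why the event thread failed to start.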

S2/D2, because basically MySQL Cluster doesn't really work on Mac right now.

Still a problem in 7.0.9b (Mac OS X 10.6.2, _no_ MacPorts):
 shell> ndb_mgmd -f /path/to/config.ini
 shell> ndbd

Once the data node is started, run:
 shell> ndb_mgm
 ndb_mgm> show

ndb_mgmd then skyrockets on CPU (I saw 190%).

Workaround: run ndb_mgmd with the --nodaemon option.

See also Bug #47214 (maybe they are related, or maybe they are 2 problems with a similar effect).

W3: Workaround available, not perfect, but good for testing.