Bug #34438 | ndb_mgm process takes 100% cpu | | |
---|---|---|---|
Submitted: | 8 Feb 2008 23:39 | Modified: | 20 May 2009 13:27 |
Reporter: | Jeff Wang | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | mysql-5.1-telco-6.3 | OS: | MacOS |
Assigned to: | | CPU Architecture: | Any |
Tags: | 5.1.24 | | |
[8 Feb 2008 23:39]
Jeff Wang
[1 Apr 2008 20:18]
Sveta Smirnova
Thank you for the report. Please provide your Cluster and mysqld configuration files.
[2 Apr 2008 18:05]
Sveta Smirnova
Please also provide the command you use when starting ndb_mgm.
[9 Apr 2008 23:13]
Jeff Wang
The command I used in the mgm shell is "show" or "1 status" or "2 status".

---Master config file---

    # Options affecting ndbd processes on all data nodes:
    [ndbd default]
    NoOfReplicas=2 # Number of replicas
    DataMemory=6000M # How much memory to allocate for data storage
    IndexMemory=1000M # How much memory to allocate for index storage
    StringMemory=10000000 #expressed as a percentage, 100%=5 MB, values > 99 interpreted as bytes

    #increase this number if the number of inserts/updates/delete is large.
    #Each Log file is 4 * 16 MB = 64 MB. So 1024 Log files can accommodate 512*64 = 32 GB
    #There should be enough log files to accommodate at least the time to do 6 LCPs.
    #Thus, if each LCP is 300 seconds, we need to support 6*300 seconds=1800 seconds of REDO Logs.
    #So, if you are writing 10 MB/second of insert/update/del info, you need to have 18 GB of Redo logs.
    NoOfFragmentLogFiles=96
    #FragmentLogFileSize=96M

    #the following params control LCP speed. A Local Checkpoint is how often the contents of
    #Data Memory are flushed to disk. The default values should be good enough for
    #a machine w/ 2 GB RAM. LCPs should occur at ~ 5 minute intervals leading to a 2-3 minute node restart time.
    #These parameters need to be adjusted for machines with larger RAM.
    #A rough estimation for CheckpointSpeed is DataMemory/(seconds to checkpoint) (ie: 2000 MB/300 seconds = 6.8 MB/sec)
    DiskSyncSize=4M
    DiskCheckpointSpeed=10M
    DiskCheckpointSpeedInRestart=100M
    RedoBuffer=32M

    #Each attribute uses 200 bytes of storage/node
    #Should be at least 3 times the size of all attributes you expect because Alter table statements use them
    #Also take into account attributes in hidden tables (ie: unique index table, ordered index, blob tables, index trigger)
    MaxNoOfAttributes=16000

    #should be at least 2 times number of expected tables. A hidden table is created for each ordered index
    #Each table object consumes 20KB/node
    MaxNoOfTables=8000

    #should be at least 2 times number of expected tables. 2 indexes created for each unique index (hash + ordered)
    #Each index uses 10KB/node
    MaxNoOfOrderedIndexes=8000

    #amount of time to elapse before aborting the transaction
    #and assuming deadlock on other node.
    TransactionDeadlockDetectionTimeout=15000 #in ms

    #amount of time between operations in the same transaction
    #0 indicates no timeout
    #units in ms
    #TransactionInactiveTimeout=0

    #number of simultaneous updates (or selects using locks) that occur at once.
    #Lookups on unique indexes require 2 records (due to a look up in a hidden index table), blobs do as well (?).
    #number should be maxNumSimultaneousUpdates/ # nodes.
    #Each record requires 1 KB so watch your memory usage.
    MaxNoOfConcurrentOperations=50000

    #Logging, values can be 0 to 15 where 15 is the most verbose
    LogLevelCheckpoint=10
    LogLevelCongestion=10
    LogLevelConnection=10
    LogLevelError=10
    LogLevelInfo=10
    LogLevelNodeRestart=10
    LogLevelShutdown=10
    LogLevelStartup=10
    LogLevelStatistic=10
    #MemReportFrequency=0

    # TCP/IP options:
    [tcp default]
    SendBufferMemory=2M
    ReceiveBufferMemory=1M
    Checksum=1 #detect corrupted messages

    # Management process options:
    [ndb_mgmd]
    id=1
    hostname=x.x.com # Hostname or IP address of MGM node
    datadir=/Users/x/work/cluster/mgm_data # Directory for MGM node log files

    # Options for data node "A":
    [ndbd]
    id=2
    hostname=x.x.com
    datadir=/Users/x/work/cluster/data # Directory for this data node's data files

    # Options for data node "B":
    [ndbd]
    id=3
    #hostname=x.x.com # Hostname or IP address of MGM node
    hostname=x.x.com
    datadir=/Users/x/work/cluster/data # Directory for this data node's data files

    # SQL node options:
    [mysqld]
    id=4
    [mysqld]
    id=5
    [mysqld]
    [mysqld]
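The redo log sizing rule described in the config comments above can be checked with a short calculation. This is only a rough sketch: it assumes each `NoOfFragmentLogFiles` unit is a set of 4 files of 16 MB (as the comment states, with `FragmentLogFileSize` left commented out), and the 10 MB/second write rate and 300-second LCP are the example figures from the comments, not measured values. The function names are hypothetical helpers, not part of any MySQL tool.

```python
# Rough redo log sizing check based on the rules in the config comments.
# Assumptions (from the comments, not measured): 4 files per set, 16 MB per
# file, 6 LCPs to cover, 300 seconds per LCP, 10 MB/second write rate.

FILES_PER_SET = 4    # each NoOfFragmentLogFiles unit is a set of 4 files
FILE_SIZE_MB = 16    # assumed size of each file in the set


def redo_log_capacity_mb(no_of_fragment_log_files, file_size_mb=FILE_SIZE_MB):
    """Total redo log space configured, in MB."""
    return no_of_fragment_log_files * FILES_PER_SET * file_size_mb


def redo_log_needed_mb(write_rate_mb_per_s, lcp_seconds, lcps_to_cover=6):
    """Redo log space needed to cover the given number of LCPs, in MB."""
    return write_rate_mb_per_s * lcp_seconds * lcps_to_cover


if __name__ == "__main__":
    capacity = redo_log_capacity_mb(96)   # NoOfFragmentLogFiles=96
    needed = redo_log_needed_mb(10, 300)  # example figures from the comments
    print(f"configured: {capacity} MB, needed at 10 MB/s: {needed} MB")
```

Under these assumptions, `NoOfFragmentLogFiles=96` provides about 6 GB of redo log, which would cover the 18 GB rule of thumb only at write rates well below the 10 MB/second example.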
[10 Apr 2008 22:19]
Sveta Smirnova
Thank you for the feedback. By the command used to start ndb_mgm, I meant the options you pass to ndb_mgm, for example `bin/ndb_mgm --ndb-mgmd-host=127.0.0.1:35118`.
[11 Apr 2008 17:48]
Jeff Wang
Hello,

I don't provide any options on the command line, so I just use 'ndb_mgm' to start the shell. This reads the my.cnf file, which only has this:

    [ndb_mgm]
    ndb-connectstring=127.0.0.1:1186

Also, when starting the daemon, I do not provide any options. So I just use 'ndb_mgmd', and the only option in the my.cnf file is the absolute path to the config file (which I provided above).

Thanks
[20 May 2009 13:27]
Geert Vanderkelen
Verified: this is indeed a problem on Mac, and ndb_mgm is unusable. MySQL Cluster 7.0.5 has the same problem.

When running ndb_mgm from the binaries available on dev.mysql.com, it just hangs, causing huge load. After a while a warning appears:

    ndb_mgm> show
    Warning, event thread startup failed, degraded printouts as result, errno=36

S2/D2, because basically MySQL Cluster doesn't really work on Mac right now.
[16 Nov 2009 12:46]
Geert Vanderkelen
Still a problem in 7.0.9b (Mac OS X 10.6.2, _no_ MacPorts):

    shell> ndb_mgmd -f /path/to/config.ini
    shell> ndbd

When the data node is started, doing:

    shell> ndb_mgm
    ndb_mgm> show

sends ndb_mgmd skyrocketing on CPU (I saw 190%).

Workaround: run ndb_mgmd with the --nodaemon option.

See also #47214 (maybe they are related, or maybe they are two problems with a similar effect).

W3: Workaround available, not perfect, but good for testing.