Bug #43709 | Node will not restart after upgrade | ||
---|---|---|---|
Submitted: | 17 Mar 2009 17:09 | Modified: | 4 Sep 2009 10:51 |
Reporter: | Phil Bayfield | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
Version: | mysql-5.1-telco-6.3 | OS: | Linux |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
Tags: | mysql-5.1-telco-6.3.23 |
[17 Mar 2009 17:09]
Phil Bayfield
[17 Mar 2009 17:45]
Phil Bayfield
Here is my config.ini [TCP DEFAULT] SendBufferMemory=2M ReceiveBufferMemory=2M [NDB_MGMD DEFAULT] PortNumber=1186 Datadir=/var/lib/mysql-cluster [NDB_MGMD] Id=1 Hostname=192.168.10.2 [NDB_MGMD] Id=2 Hostname=192.168.10.3 [NDBD DEFAULT] NoOfReplicas=2 Datadir=/var/lib/mysql-cluster DataMemory=3584M IndexMemory=256M LockPagesInMainMemory=1 MaxNoOfConcurrentOperations=200000 StringMemory=25 MaxNoOfTables=4096 MaxNoOfOrderedIndexes=2048 MaxNoOfUniqueHashIndexes=512 MaxNoOfAttributes=24576 DiskCheckpointSpeedInRestart=100M FragmentLogFileSize=256M NoOfFragmentLogFiles=24 RedoBuffer=32M TimeBetweenLocalCheckpoints=20 TimeBetweenGlobalCheckpoints=1000 TimeBetweenEpochs=100 MemReportFrequency=3600 BackupReportFrequency=2 ### Realtime Extensions RealTimeScheduler=1 SchedulerExecutionTimer=80 SchedulerSpinTimer=40 ### Timeouts TransactionDeadlockDetectionTimeout=5000 HeartbeatIntervalDbDb=3000 HeartbeatIntervalDbApi=3000 ### Disk data related DiskPageBufferMemory=64M SharedGlobalMemory=256M [NDBD] Id=3 Hostname=192.168.10.6 LockExecuteThreadToCPU=2 LockMaintThreadsToCPU=3 [NDBD] Id=4 Hostname=192.168.10.7 LockExecuteThreadToCPU=2 LockMaintThreadsToCPU=3 [MYSQLD] Hostname=192.168.10.4 [MYSQLD] Hostname=192.168.10.5 [MYSQLD] Hostname=192.168.10.4 [MYSQLD] Hostname=192.168.10.6 [MYSQLD] Hostname=192.168.10.7
[18 Mar 2009 1:41]
Phil Bayfield
I have managed to recover the node but only by performing a complete cluster shutdown and restart causing over 90 minutes of downtime to my websites and apps. The purge stale sessions command had no effect in recovering the stale connection that appeared to be stopping node 4 from starting. I restarted all nodes except node 3 (the working NDBD node) to no avail. It appeared that the stale connection was something to do with the working NDB node, after restarting it the problem was resolved.
[18 Mar 2009 7:48]
Sveta Smirnova
Thank you for the report. What is the version number you upgrade from?
[18 Mar 2009 11:48]
Phil Bayfield
I upgraded from 6.3.21
[19 Mar 2009 8:57]
Bernd Ocklin
Could be maybe duplicate of http://bugs.mysql.com/bug.php?id=43224. Can you please upload all your logs? mysql_error_reporter is a good tool to collect all logs. Did you try to reproduce this?
[19 Mar 2009 11:03]
Phil Bayfield
This is the management server log (same messages repeated): 2009-03-17 17:04:25 [MgmSrvr] WARNING -- Allocate nodeid (0) failed. Connection from ip 192.168.10.7. Returned error string "No free node id found for ndbd(NDB)." 2009-03-17 17:04:25 [MgmSrvr] INFO -- Mgmt server state: node id's 2 3 5 6 7 connected but not reserved Or when I've tried to start ndbd with --ndb-nodeid=4 it would display Allocate nodeid (4) failed. I've got multiple logs on the node itself as well which have the following: Current byte-offset of file-pointer is: 568 Time: Tuesday 17 March 2009 - 23:17:25 Status: Permanent error, external action needed Message: Invalid configuration received from Management Server (Configuration error) Error: 2350 Error data: Unable to alloc node id Error object: Error : Could not alloc node id at 192.168.10.2 port 1186: No free node id found for ndbd(NDB). Program: ndbd Pid: 8368 Trace: <no tracefile> Version: mysql-5.1.32 ndb-6.3.23-GA ***EOM*** I dont have the mysql_error_reporter but these are all I've found searching the logs manually. My issue does look very much like http://bugs.mysql.com/bug.php?id=43224
[25 Mar 2009 10:29]
Jonathan Miller
Please try version 6.3.24 and see if this address issue.
[25 Mar 2009 10:32]
Phil Bayfield
Is there a release date for 6.3.24?
[10 Aug 2009 12:26]
Jonas Oreland
6.3.24 (and 25) has been release, please retest
[4 Sep 2009 10:32]
Phil Bayfield
Yep, this was fixed with 6.3.24 :)