Bug #44427 | NDB_MGMD does not reconnect to cluster following configuration change | | 
---|---|---|---
Submitted: | 23 Apr 2009 10:50 | Modified: | 30 Apr 2009 7:29 |
Reporter: | Phil Bayfield | Email Updates: | |
Status: | Won't fix | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 6.3.24 | OS: | Linux |
Assigned to: | Magnus Blåudd | CPU Architecture: | Any |
Tags: | ndb_mgmd | | 
[23 Apr 2009 10:50]
Phil Bayfield
[28 Apr 2009 8:35]
Magnus Blåudd
One new feature in 7.0 is that the ndb_mgmd's always have the same configuration. Both of the two ndb_mgmd's need to be started in order to commit a new configuration version (you can see in their log files that they are "waiting for other nodes").
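For reference, a cluster with two management servers like the one being discussed here is described by two [ndb_mgmd] sections in config.ini. This is only an illustrative sketch; the node IDs and hostnames are assumptions, not taken from this report:

```ini
# Illustrative config.ini fragment -- node IDs and hostnames are assumed
[ndb_mgmd]
NodeId=1
HostName=mgm1.example.com

[ndb_mgmd]
NodeId=2
HostName=mgm2.example.com

# In 7.0, both of these management nodes must be up before a changed
# config.ini is committed as a new configuration version; until then
# each ndb_mgmd logs that it is "waiting for other nodes".
```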
[29 Apr 2009 13:42]
Phil Bayfield
All the other nodes were still running; however, the management servers reported otherwise. The NDBD and MYSQLD nodes continued to function as expected. Following reversal of the configuration change, the NDB_MGMD nodes reconnected to the other nodes and showed all nodes as connected.
[29 Apr 2009 17:51]
Magnus Blåudd
Ok, I see you are using 6.3.24, which does not have the new features of 7.0. Sorry! Have you restarted both management servers with the same config.ini? And all nodes one by one? If you remove nodes in a way that causes nodeid's to change, I think you can see this kind of problem; I recommend using fixed nodeid's, at least for the ndbd(s) and ndb_mgmd(s).
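To illustrate the fixed-nodeid recommendation, here is a hedged config.ini sketch; the hostnames and ID values are assumptions, only the NodeId/HostName parameters and the empty-slot behaviour are standard:

```ini
# Illustrative only -- hostnames and ID values are assumed
[ndbd]
NodeId=3                     # fixed id: this data node keeps id 3 across restarts
HostName=data1.example.com

[ndbd]
NodeId=4
HostName=data2.example.com

[mysqld]
NodeId=11                    # fixed id for a known SQL node
HostName=sql1.example.com

[mysqld]                     # empty slot: no NodeId/HostName, so any API client
                             # may take it and is assigned an id dynamically
```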
[29 Apr 2009 17:54]
Magnus Blåudd
I don't think any of my advice is good... the mysqld's in your config all have higher nodeid's than the ndbd and ndb_mgmd nodes. Have you checked the cluster log(s) from both ndb_mgmd's? Maybe they can give you a hint. Otherwise, upload the old and new config.ini and we can take a look. But it's not a big problem for you to keep running with too many mysqld(s), I hope?
[29 Apr 2009 18:16]
Phil Bayfield
Hi Magnus,

Basically what happened was I was looking at Johan Andersson's configurator (http://www.severalnines.com/config/index.php) to see what changes it came out with for the 7.0 series. I noticed the bit about connection pooling, had a read up on this, and found that it may improve performance. The configurator suggested using 5 separate API slots per MySQL server, so I modified my existing 6.3 config to that extent. I modified the configs, restarted the data nodes and finally the MySQL servers, and it all worked fine.

Then through some further reading in the manual I noticed it said we shouldn't have more connections than processors/processor cores, or it could considerably degrade performance. I then decreased the MySQL servers to 3 connections and restarted them (obviously no problem; they reconnected to the cluster with 3 connections instead of 5). I was attempting to apply the changes to the management nodes in a similar fashion: change the configs and restart both servers. (I realise there is no harm in having the empty slots there.)

After restarting the management nodes, my first reaction was that the cluster had crashed, as I've never seen this happen before; even with a config change the management nodes had always immediately reconnected to the other nodes. I checked the cluster logs and they simply showed the normal startup messages for the management node. I then checked my sites and they were still up, then checked the other nodes in the cluster and they were also still up. At this point I was somewhat confused as to what was going on; I went through all the logs and saw nothing, no error messages etc. I figured I would have to shut the cluster down but had no idea if it was even possible without the management server running! On the off chance, I reverted the config on both management servers back to their original settings and restarted them, and then everything reappeared as it should be, as it was prior to the config change.

I've since done a full restart of the cluster for another reason and made the config changes in the process, so I no longer have any issue. The main reason for reporting the bug was so that others could avoid possibly taking more drastic action unnecessarily.
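For anyone following the connection-pooling part of this, the change being described would look roughly like the following on each SQL node; the connect-string hosts are assumptions, while the option names are the standard NDB ones:

```ini
# my.cnf on each SQL node (illustrative; host names are assumed)
[mysqld]
ndbcluster
ndb-connectstring=mgm1.example.com,mgm2.example.com
# Each mysqld opens this many connections to the cluster, so config.ini
# must provide at least this many [mysqld]/[api] slots per SQL node.
ndb-cluster-connection-pool=3
```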
[30 Apr 2009 7:29]
Magnus Blåudd
Thanks for that nice explanation. I interpret it as: when you restarted the second ndb_mgmd, the problem disappeared. We could do some experiments, but I'd rather focus on 7.0, where we have added new functionality to make sure that both management servers always use the same configuration and reconfigure themselves without the need to restart. This concept will then be applied to the ndbapi/mysqld as well as the ndbd nodes; although not everything will be possible to reconfigure without a restart, at least the process should know that it's not running with the latest configuration.