Bug #19619 Cluster can not extend NoOfFragmentLogFiles w/o initial the file system
Submitted: 8 May 2006 16:32 Modified: 13 May 2006 2:02
Reporter: Jonathan Miller
Status: Closed
Category: MySQL Server: Documentation Severity: S2 (Serious)
Version: 5.1.11 OS: Linux (Linux 32 Bit OS)
Assigned to: Jon Stephens CPU Architecture: Any

[8 May 2006 16:32] Jonathan Miller
Description:
I was trying to extend NoOfFragmentLogFiles from 50 to 150. On data node restart, the node did a forced shutdown with the following error:

Time: Monday 8 May 2006 - 18:14:03
Status: Ndbd file system error, restart node initial
Message: File not found (Ndbd file system inconsistency error, please report a bug)
Error: 2815
Error data: DBLQH: File system open failed. OS errno: 2
Error object: DBLQH (Line: 1827) 0x0000000a
Program: /home/ndbdev/jmiller/builds/libexec/ndbd
Pid: 25748
Trace: /space/run/ndb_2_trace.log.1
Version: Version 5.1.10 (beta)

Notice that this asks for a restart with initial because of a file system error, but the only error is that the node now expects 150 files and finds only 50. The program should be smart enough to extend the log and create the missing 100 files.

So here is the way it currently works. The user reads the docs and figures he needs to extend NoOfFragmentLogFiles. The documents don't tell the customer which values need a restart and which need a restart -i, so the customer changes his config.ini file and, to be safe, restarts the management server, then begins to restart the data nodes. The restart fails and the data nodes crash because the customer did not know that he needed to include -i in the restart command. He probably does not even know what the -i or -n options are, since they are not documented by the help function.

The quickest fix would be to correct the documents and give a table of settings that require -i. The other option is to make the cluster smart enough to auto-extend once it runs out of the current NoOfFragmentLogFiles, until it gets to the new number.

How to repeat:
See above

Suggested fix:
Two suggestions above (:-o)
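
The "table of settings" suggestion could be sketched as a small lookup, shown here in Python. This is only an illustration: the table name, the function, and the "node" default for unlisted parameters are assumptions, and the only entry taken from this report is NoOfFragmentLogFiles.

```python
# Hypothetical lookup table: parameter name -> restart type needed after a change.
# Only the NoOfFragmentLogFiles entry comes from this report; any other entry
# would have to be filled in from the manual.
RESTART_REQUIRED = {
    "NoOfFragmentLogFiles": "initial",
}


def restart_type(old_config, new_config, default="node"):
    """Return the strongest restart type needed for the changed parameters.

    Unlisted parameters fall back to `default` (an assumption, not a
    documented rule).
    """
    changed = [k for k in new_config if old_config.get(k) != new_config[k]]
    if not changed:
        return "none"
    kinds = {RESTART_REQUIRED.get(k, default) for k in changed}
    return "initial" if "initial" in kinds else default


# The change described in this report: 50 -> 150 fragment log files.
print(restart_type({"NoOfFragmentLogFiles": 50}, {"NoOfFragmentLogFiles": 150}))
```

With a table like this in the manual, a user could see before restarting that this particular change needs an initial restart.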
[9 May 2006 8:58] Jonas Oreland
Changing the implemented behavior in this area is very time consuming.
My personal idea on how this should be accomplished is to have SQL commands
  (like alter logfile group add undofile, but for redofile instead).

But it should be documented that in order to change NoOfFragmentLogFiles
one needs to restart the node with initial.

So if one wants to do it for the entire cluster, one can do a rolling node restart.

Therefore moving this to docs...
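
A rolling initial restart as described above might look like the following sketch. It only builds the ndb_mgm command lines and does not run them. The node ids 2 and 3 and the function name are hypothetical, and it assumes the management client's "<id> RESTART -i" form is used for the initial restart.

```python
def rolling_initial_restart_commands(node_ids):
    """Build one ndb_mgm invocation per data node, in the order given.

    Each command asks the management client to restart a single data node
    with the initial option (-i). Nothing is executed here.
    """
    return ['ndb_mgm -e "{} RESTART -i"'.format(n) for n in node_ids]


# Hypothetical data node ids; use the ids from your own config.ini.
for cmd in rolling_initial_restart_commands([2, 3]):
    print(cmd)
```

The commands should be run one at a time, waiting until the restarted node reports that it has started (for example via "<id> STATUS" in ndb_mgm) before moving on to the next node, so that the cluster stays available throughout.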
[13 May 2006 0:46] Jon Stephens
Thank you for your bug report. This issue has been addressed in the
documentation. The updated documentation will appear on our website
shortly, and will be included in the next release of the relevant
product(s).

Additional info:

Updated description of parameter in Manual as suggested.
[13 May 2006 1:05] Jonathan Miller
<emphasis role="bold">Important</emphasis>: This parameter
+              cannot be changed <quote>on the fly</quote>; you must
+              restart the node using <option>--initial</option>. If you
+              iwsh to change this value for a running cluster, you can
+              do so via a rolling node restart.

iwsh should be "wish"

Also, the command for restart in ndb_mgm is "<id> restart". This is where most users will go to do rolling restarts. The option there is -i, which should be documented.

In addition, there is another option, -n, and I have no idea what it does.

Moreover, we should have a table of which value changes need a restart (I would think all of them) and which need a restart with -i or --initial.

Thanks
/jeb
[13 May 2006 2:02] Jon Stephens
Thank you for your bug report. This issue has been addressed in the
documentation. The updated documentation will appear on our website
shortly, and will be included in the next release of the relevant
product(s).

Additional info:

Fixed typo. Other commentary is not relevant to *this* bug; we'll discuss via other channels. Please don't reopen again. Thanks!