Bug #92825 ndb_import docs and option parsing are not in sync
Submitted: 17 Oct 2018 11:57 Modified: 22 Nov 2018 13:53
Reporter: Daniël van Eeden (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Documentation Severity:S3 (Non-critical)
Version:7.6+ OS:Any
Assigned to: Jon Stephens CPU Architecture:Any

[17 Oct 2018 11:57] Daniël van Eeden
Description:
page: https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-programs-ndb-import.html

--verbose and -v are on the docs, but my binary has --log-level instead
--input-workers=1 is not accepted while the docs say that's the minimum
--rejects and docs for a few other options seem to suggest that ndb_import should stop on the first reject, but that's not the case.

How to repeat:
Compare options of ndb_import with docs
[17 Oct 2018 14:06] Daniël van Eeden
Looks like the defaults for --db-workers differ between docs and help output
[17 Oct 2018 14:11] MySQL Verification Team
Hi,

Yes, documentation is not fully up to date, thanks for the report.

Bogdan
[21 Nov 2018 14:46] Jon Stephens
Note: --verbose is accepted syntax in both 7.6 and 8.0 binaries. If this is a valid option, that means it should be listed in the help output; if it's not, then it should be rejected. Either way, this particular item is a software bug.
[21 Nov 2018 15:02] Jon Stephens
Regarding --input-workers: ndb_import.cpp has

  { "input-workers", NDB_OPT_NOSHORT,
    "Number of threads processing input",
    &g_opt.m_input_workers, &g_opt.m_input_workers, 0,
    GET_UINT, REQUIRED_ARG, g_opt.m_input_workers, 0, 0, 0, 0, 0 },

NdbImport.cpp has

    if (opt.m_input_workers < 1)
    {
      util.set_error_usage(util.c_error, __LINE__,
                           "number of input workers must be >= 1");
      return -1;
    }

AND

    if (strcmp(opt.m_input_type, "csv") == 0 &&
        opt.m_input_workers < 2)
    {
      util.set_error_usage(util.c_error, __LINE__,
                           "number of csv input workers must be >= 2");
      return -1;

So unless this is the error being encountered, I think this must be a code bug somewhere. Otherwise it's expected behaviour.

BTW, the fact that many of the defaults are hard-coded in NdbImport::Opt::Opt() and so override the values set in my_long_options[] in ndb_import.cpp is also IMO a huge bug.
[21 Nov 2018 15:17] Jon Stephens
And now for --db-workers: 

ndb_import.cpp:

  { "db-workers", NDB_OPT_NOSHORT,
    "Number of threads PER datanode executing db ops",
    &g_opt.m_db_workers, &g_opt.m_db_workers, 0,
    GET_UINT, REQUIRED_ARG, g_opt.m_db_workers, 0, 0, 0, 0, 0 },

NdbImport::Opt::Opt()

  m_db_workers = 4;

NdbImport::set_opt():

    if (strcmp(opt.m_output_type, "ndb") == 0 &&
        opt.m_db_workers < 1)
    {
      util.set_error_usage(util.c_error, __LINE__,
                           "number of db workers must be >= 1");
      return -1;
    }

Since the help output shows this as 4, I've updated the default value shown in the docs.
[21 Nov 2018 17:29] Jon Stephens
According to developer notes, --verbose enables (extra) user output to stdout.
[21 Nov 2018 19:38] Jon Stephens
I think I've fixed/otherwise attended to all the outstanding issues relating to ndb_import options in mysqldoc rev 60096.

I'll leave this In-Progress while I look for anything else I can address.
[22 Nov 2018 13:53] Jon Stephens
{ "rejects", NDB_OPT_NOSHORT,
    "Limit number of rejected rows (rows with permanent error) in data load."
    " Default is 0 which means that any rejected row causes a fatal error."
    " The row(s) exceeding the limit are also added to *.rej."
    " The limit is per current run (not all --resume'd runs)",
    &g_opt.m_rejects, &g_opt.m_rejects, 0,
    GET_UINT, REQUIRED_ARG, g_opt.m_rejects, 0, 0, 0, 0, 0 },

NdbImport::Opt::Opt()
{
  m_connections = 1;
  m_database = 0;
  m_state_dir = ".";

...
  m_checkloop = 100;
  m_alloc_chunk = 20;
  m_rejects = 0;
...
}

I'm not sure I really understand what's being asked for with regard to --rejects; the code and comments therein seem pretty clear about what the default value (0) and behaviour should be. I've revised the description for this option and tried to eliminate any ambiguity that shouldn't be there.

Closed.