Bug #32025 ndb_waiter does too many roundtrips to ndb_mgmd
Submitted: 1 Nov 2007 10:30
Modified: 20 Feb 2008 21:53
Reporter: Magnus Blåudd
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S3 (Non-critical)
Version:
OS: Any
Assigned to: Magnus Blåudd
CPU Architecture: Any

[1 Nov 2007 10:30] Magnus Blåudd
Description:
1. The 'ndb_waiter' tool contacts 'ndb_mgmd' too many times to get the status of nodes in the cluster. Only one call to 'ndb_mgm_get_status' per loop should be enough.

2. There is also too much extra processing that puts the status of the nodes into vectors. This can be simplified a lot: just get the status and process it directly.

3. There is also a variable '_timeout' with the same name as the argument '_timeout' of the function 'waitClusterStatus', so it is unclear which one is used. Remove the function argument '_timeout' and use the '_timeout' variable directly; this is a small program!

How to repeat:
This became painfully noticeable when running against a management server affected by Bug#32023: it took at least 8 seconds to wait for all NDB nodes to enter the NO_CONTACT state.

Manual code inspection.

Suggested fix:
See above
[1 Nov 2007 13:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/36870

ChangeSet@1.2540, 2007-11-01 14:32:15+01:00, msvensson@pilot.mysql.com +1 -0
  Bug#32025 ndb_waiter does too many roundtrips to ndb_mgmd
[2 Nov 2007 4:20] Stewart Smith
Looks fine.

Could possibly skip 5.0 though and go straight to 5.1 and telco.

To get rid of the Sleep(1), ndb_waiter could subscribe to NDB_MGM_EVENT_CATEGORY_CONNECTION events and wait for one of them with a ~1 second timeout (5.1+ only; the timeout support doesn't exist in 5.0). This could help speed up some tests too.

If you're feeling really keen... I have an (old) patch that also implements waiting only for a particular node (quite useful for add node testing)... you're welcome to make that work too :)
[2 Nov 2007 8:21] Magnus Blåudd
Yes, using 'ndb_mgm_create_logevent_handle' is a great way to avoid polling.
[7 Feb 2008 7:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/41847

ChangeSet@1.2540, 2008-02-07 08:08:43+01:00, msvensson@pilot.mysql.com +1 -0
  Bug#32025 ndb_waiter does too many roundtrips to ndb_mgmd
[12 Feb 2008 14:16] Jon Stephens
Documented in the 5.1.23-ndb-6.3.9 changelog as follows:

        The ndb_waiter utility polled ndb_mgmd too many times when
        obtaining the status of cluster data nodes.

Left in PQ status pending additional merges.
[12 Feb 2008 16:03] Jon Stephens
Also documented for 5.1.23-ndb-6.2.12.
[20 Feb 2008 16:03] Bugs System
Pushed into 5.1.24-rc
[20 Feb 2008 16:03] Bugs System
Pushed into 6.0.5-alpha
[20 Feb 2008 21:53] Jon Stephens
Also documented for 5.1.24 and 6.0.5.
[21 Feb 2008 12:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/42746

ChangeSet@1.2584, 2008-02-21 13:23:58+01:00, msvensson@pilot.mysql.com +3 -0
  Bug#32025 ndb_waiter does too many roundtrips to ndb_mgmd
   - fix test failures that were already there but now occur more
     consistently since the 1 second sleep has been removed from
     ndb_waiter
[25 Feb 2008 15:58] Bugs System
Pushed into 5.1.24-rc
[25 Feb 2008 16:04] Bugs System
Pushed into 6.0.5-alpha
[25 Feb 2008 16:05] Bugs System
Pushed into 5.0.58