Bug #26293 cluster mgmt node sometimes doesn't receive events from all nodes on restart
Submitted: 12 Feb 2007 18:28 Modified: 26 Feb 2007 3:17
Reporter: Hartmut Holzgraefe
Status: Closed
Category:Server: Cluster Severity:S3 (Non-critical)
Version:5.1.14-ndb-6.1.0 OS:Linux (linux)
Assigned to: Tomas Ulin Target Version:

[12 Feb 2007 18:28] Hartmut Holzgraefe
Description:
Sometimes, when a management node is restarted, all data nodes connect to it (logged in the
cluster log and visible using netstat), but some nodes do not actually log events to the
management node. A second management node logs events from all nodes just fine at the same
time. Restarting a data node seems to resolve the situation once the node is down.

How to repeat:
will be added soon ...
[12 Feb 2007 19:18] Hartmut Holzgraefe
additional information: stopping a data node does seem to resolve this,
starting with the

  Node x: Node shutdown completed.

INFO message. After this all nodes log to the management node just fine.
[13 Feb 2007 5:49] Tomas Ulin
patch for bug

Attachment: bug26293.patch (text/x-patch), 5.35 KiB.

[13 Feb 2007 7:36] Tomas Ulin
new patch against 5.0, with fixes also for other send signal

Attachment: bug26293_5.0_2.patch (text/x-patch), 6.91 KiB.

[13 Feb 2007 22:00] Jonas Oreland
review.
1) okToSend (unCond = true, should check m_api_regconf)
            (unCond = false, should check alive)

   this needs to be fixed...

2) the only mgm function that I *know* needs unCond=false (i.e. alive) is backup

3) please install diff-p helper, so that diff gets easier to read.

4) the state stuff is very very bad for a user (such as ndb_mgmd)
   and the interface you proposed instead sounded very good...
   we should document it somewhere so we don't forget it..
   (maybe comment in code)

5) please add an assert in TransporterFacade that
   checks that only GSN_APIREGREQ is allowed to be sent if m_api_regconf = false

   this assert would find this bug directly...

/Jonas
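
Review items 1 and 5 above can be illustrated with a minimal sketch. This is not the committed patch: the NodeState struct, the sendAllowed and checkedSend helpers, the GSN_OTHER placeholder and the numeric signal values are all hypothetical, chosen only to show the rule being asked for. The names okToSend, unCond, m_api_regconf, alive and GSN_APIREGREQ come from the review itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical signal identifiers. The numeric values are placeholders,
// not the real NDB global signal numbers.
enum SignalId : std::uint32_t {
    GSN_APIREGREQ = 1,  // API registration request
    GSN_OTHER     = 2   // stands in for any other signal
};

// Hypothetical per-node state as seen by the sender.
struct NodeState {
    bool m_api_regconf = false;  // registration has been confirmed
    bool alive         = false;  // node is reported alive
};

// Item 1: unCond = true checks m_api_regconf, unCond = false checks alive.
inline bool okToSend(const NodeState& node, bool unCond) {
    return unCond ? node.m_api_regconf : node.alive;
}

// Item 5: until registration is confirmed, only GSN_APIREGREQ may be sent.
inline bool sendAllowed(const NodeState& node, SignalId gsn) {
    return node.m_api_regconf || gsn == GSN_APIREGREQ;
}

// Assumed send wrapper: the assert fires as soon as any other signal is
// sent to a node that has not confirmed registration.
inline void checkedSend(const NodeState& node, SignalId gsn) {
    assert(sendAllowed(node, gsn));
    (void)node;
    (void)gsn;
    // the actual transport call would go here
}
```

With a freshly connected node (both flags false), checkedSend(node, GSN_APIREGREQ) passes, while checkedSend(node, GSN_OTHER) would trip the assert in a debug build, which is the kind of early failure item 5 is after.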
[14 Feb 2007 4:01] Tomas Ulin
yet another patch

Attachment: bug26293_5.0_3.patch (text/x-patch), 7.23 KiB.

[14 Feb 2007 5:07] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19822
[14 Feb 2007 8:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19842
[21 Feb 2007 16:09] Tomas Ulin
5.0.37, 5.1.16, ndb-6.1.3
[26 Feb 2007 3:17] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of
that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available
version, including the bug fix. More information about accessing the source trees is
available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented bugfix in 5.0.38, 5.1.16, and ndb-6.1.3 changelogs.