Bug #52182 Heartbeat DB-DB should be ordered to avoid losing a nodegroup in some situations
Submitted: 18 Mar 2010 15:19
Modified: 7 Jul 2010 0:51
Reporter: Pekka Nousiainen
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: mysql-5.1-telco-6.3
OS: Any
Assigned to: Pekka Nousiainen
CPU Architecture: Any
Tags: Any version

[18 Mar 2010 15:19] Pekka Nousiainen
Description:
Background: DB nodes send heartbeats (HB) to each other in a circle.
Each node monitors the previous one.  If HBs are missed, the node
declares the previous node dead globally.

HBs between nodes on different hosts can be too slow, and yet the
global declaration will still succeed.  This can be due to a very
low HB interval or a temporary connection problem.

In such a situation the order in which HBs are sent may cause an
unnecessary loss of a nodegroup (NG) and a cluster crash.

For example, 2 hosts and 4 nodes:

host1 host2
A  -  B  (NG)
C  -  D  (NG)

Suppose the HB circle is A->B->C->D->A.  Then HB loss between the
hosts causes B to declare A dead and C to declare B dead, which
results in the loss of NG A-B and a cluster crash.

On the other hand, the HB circle A->B->D->C->A in the same situation
kills A and D, one node from each NG, and the cluster survives.
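
The two circles above can be compared with a small model. This is only a sketch under simplifying assumptions, not NDB code: every HB that crosses hosts is treated as lost, every node whose monitor sits on the other host is counted as "at risk" of being declared dead, and an NG counts as lost when all of its members are at risk. The names at_risk_nodes and lost_nodegroups are hypothetical. In the first circle every link crosses hosts, so all four nodes are at risk; the description above follows only the first two declarations, which already lose NG A-B.

```python
# Hypothetical model of the 2-host, 4-node example; names are illustrative.
HOST = {"A": "host1", "B": "host2", "C": "host1", "D": "host2"}
NODEGROUPS = [{"A", "B"}, {"C", "D"}]


def at_risk_nodes(ring, host=HOST):
    """Return nodes whose monitor is on a different host.

    ring lists the nodes in HB order; each node monitors the one before it.
    Assumption: every cross-host HB is lost, same-host HBs always arrive.
    """
    risky = set()
    for i, monitor in enumerate(ring):
        previous = ring[i - 1]  # index -1 closes the circle
        if host[monitor] != host[previous]:
            risky.add(previous)
    return risky


def lost_nodegroups(ring, nodegroups=NODEGROUPS, host=HOST):
    """Return the nodegroups in which every member is at risk."""
    risky = at_risk_nodes(ring, host)
    return [ng for ng in nodegroups if ng <= risky]


if __name__ == "__main__":
    for ring in (["A", "B", "C", "D"], ["A", "B", "D", "C"]):
        print("->".join(ring),
              "at risk:", sorted(at_risk_nodes(ring)),
              "lost NGs:", [sorted(ng) for ng in lost_nodegroups(ring)])
```

In this model the second circle puts only A and D at risk, so each NG keeps one node.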

The bug: the cluster should be able to order HBs to avoid the loss
of an NG in this situation.

How to repeat:
Not easy.

There is an old, unmaintained transproxy.cpp which intercepts
all traffic and can be used to simulate network problems.

Suggested fix:
Order the HBs at some point during startup and keep them ordered
across node failures and restarts.
[12 Apr 2010 9:49] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105398

3177 Pekka Nousiainen	2010-04-12
      bug#52182 c01_hborder.diff
      configure HB order
[12 Apr 2010 9:49] Bugs System
  http://lists.mysql.com/commits/105399

3178 Pekka Nousiainen	2010-04-12
      bug#52182 c02_hborder.diff
      upgrade HB order
[12 Apr 2010 9:50] Bugs System
  http://lists.mysql.com/commits/105401

3179 Pekka Nousiainen	2010-04-12
      bug#52182 c03_hborder.diff
      after review
[7 Jun 2010 11:52] Jonas Oreland
Comments:
1) versions in ndb_version.h.in must "obviously change"
2) we should also check 7.1 versions
3) "if (x >= NDB_VERSION_D) return 1" does not work, since it's not
   added to the highest GA version

Otherwise OK to push.
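
Point 3 can be illustrated with a per-series check. This is a hypothetical sketch, not the code in ndb_version.h.in: the function name supports_hb_order and the tuple representation are assumptions, and the minimum builds are the releases named later in this report (6.3.35, 7.0.16, 7.1.5). A single ">=" threshold would wrongly accept, for example, a 7.0.15 node when the threshold is the 6.3 release.

```python
# Hypothetical sketch; the real check in ndb_version.h.in may differ.
# First build in each release series that carries the feature.
HB_ORDER_MIN_BUILD = {(6, 3): 35, (7, 0): 16, (7, 1): 5}


def supports_hb_order(version):
    """Return True if a (major, minor, build) version has HB ordering."""
    major, minor, build = version
    series = (major, minor)
    if series in HB_ORDER_MIN_BUILD:
        # Compare only within the node's own release series.
        return build >= HB_ORDER_MIN_BUILD[series]
    # Assumption: series newer than all listed ones have the feature;
    # older or unlisted in-between series do not.
    return series > max(HB_ORDER_MIN_BUILD)
```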
[16 Jun 2010 7:07] Bugs System
  http://lists.mysql.com/commits/111211

3217 Pekka Nousiainen	2010-06-16
      bug#52182 c01_hborder.diff
      configure HB order [re-commit]
[16 Jun 2010 7:15] Bugs System
  http://lists.mysql.com/commits/111212

3218 Pekka Nousiainen	2010-06-16
      bug#52182 c02_hborder.diff
      upgrade HB order [re-commit]
[16 Jun 2010 7:16] Bugs System
  http://lists.mysql.com/commits/111213

3219 Pekka Nousiainen	2010-06-16
      bug#52182 c03_hborder.diff
      after review [re-commit]
[16 Jun 2010 7:16] Bugs System
  http://lists.mysql.com/commits/111214

3220 Pekka Nousiainen	2010-06-16
      bug#52182 c04_hborder.diff
      update version numbers
[16 Jun 2010 19:57] Bugs System
Pushed into 5.1.44-ndb-7.0.16 (revid:pekka@mysql.com-20100616192912-b3wa72brovf5rycq) (version source revid:pekka@mysql.com-20100616165634-t4e9k32qlapnrl11) (merge vers: 5.1.44-ndb-7.0.16) (pib:16)
[16 Jun 2010 19:59] Bugs System
Pushed into 5.1.44-ndb-6.3.35 (revid:pekka@mysql.com-20100616071601-5m203v4ms6gz213t) (version source revid:pekka@mysql.com-20100616071601-5m203v4ms6gz213t) (merge vers: 5.1.44-ndb-6.3.35) (pib:16)
[17 Jun 2010 7:51] Pekka Nousiainen
Doc team:

Pushed to
mysql-5.1.44 ndb-6.3.35
mysql-5.1.44 ndb-7.0.16
mysql-5.1.44 ndb-7.1.5

To use this feature, first upgrade to one of the above versions
or higher, and then set the HeartbeatOrder values.

The HeartbeatOrder values must be either all zero
(the default, normal case) or all non-zero and distinct
(this is the only useful alternative).  The values need
not be consecutive; they could be, for example, 10, 20, ...

For the new order to take effect, a full restart or 2 (two)
rolling restarts in the same order are required.  The effect
can be seen in detail in ndb_*_out.log via DUMP 908.
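
The rule for HeartbeatOrder values can be checked before a restart with a few lines of script. This is a sketch only: validate_heartbeat_order is a hypothetical helper, not part of any NDB tool, and it assumes the values have already been collected per data node ID from the configuration.

```python
def validate_heartbeat_order(values_by_node):
    """Check HeartbeatOrder values given as {node_id: value}.

    Valid settings are either all zero (the default) or all non-zero
    and distinct. Hypothetical helper, not part of NDB.
    """
    values = list(values_by_node.values())
    if all(v == 0 for v in values):
        return True  # default: no explicit order configured
    if any(v == 0 for v in values):
        return False  # mixing zero and non-zero values is not allowed
    return len(set(values)) == len(values)  # non-zero values must be distinct


if __name__ == "__main__":
    print(validate_heartbeat_order({1: 10, 2: 20, 3: 30, 4: 40}))
    print(validate_heartbeat_order({1: 10, 2: 0, 3: 30, 4: 40}))
```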
[17 Jun 2010 11:42] Bugs System
  http://lists.mysql.com/commits/111422

3616 Martin Skold	2010-06-17 [merge]
      Merge
      modified:
        storage/ndb/include/mgmapi/mgmapi_config_parameters.h
        storage/ndb/include/ndb_version.h.in
        storage/ndb/src/common/debugger/EventLogger.cpp
        storage/ndb/src/kernel/blocks/dbdih/Dbdih.hpp
        storage/ndb/src/kernel/blocks/dbdih/DbdihMain.cpp
        storage/ndb/src/kernel/blocks/qmgr/Qmgr.hpp
        storage/ndb/src/kernel/blocks/qmgr/QmgrInit.cpp
        storage/ndb/src/kernel/blocks/qmgr/QmgrMain.cpp
        storage/ndb/src/mgmsrv/ConfigInfo.cpp
        storage/ndb/test/ndbapi/testNdbApi.cpp
        storage/ndb/test/ndbapi/test_event.cpp
        storage/ndb/test/src/HugoCalculator.cpp
[17 Jun 2010 11:48] Bugs System
  http://lists.mysql.com/commits/111424

3220 Martin Skold	2010-06-17 [merge]
      Merge
      modified:
        storage/ndb/include/mgmapi/mgmapi_config_parameters.h
        storage/ndb/include/ndb_version.h.in
        storage/ndb/src/common/debugger/EventLogger.cpp
        storage/ndb/src/kernel/blocks/qmgr/Qmgr.hpp
        storage/ndb/src/kernel/blocks/qmgr/QmgrInit.cpp
        storage/ndb/src/kernel/blocks/qmgr/QmgrMain.cpp
        storage/ndb/src/mgmsrv/ConfigInfo.cpp
        storage/ndb/test/ndbapi/testNdbApi.cpp
        storage/ndb/test/ndbapi/test_event.cpp
        storage/ndb/test/src/HugoCalculator.cpp
[7 Jul 2010 0:51] Jon Stephens
Documented feature addition in the NDB-6.3.35, 7.0.16, and 7.1.5 changelogs. 

For additions to documentation (including changelog entry), see 

http://lists.mysql.com/commits/112983