Bug #50433 Rolling upgrade to ndb-7.0.10 not possible ...
Submitted: 19 Jan 2010 8:41
Modified: 20 Jan 2010 9:32
Reporter: Hartmut Holzgraefe
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: mysql-cluster-7.0.10
OS: Linux
Assigned to: Jonas Oreland
CPU Architecture: Any

[19 Jan 2010 8:41] Hartmut Holzgraefe
Description:
When trying a rolling upgrade to ndb-7.0.10 on a two data node cluster, the new 7.0.10 data node starts up fine at first. At some point in start phase 5, however, the other node (still running the older version) suddenly fails with "Internal program error", which brings the whole cluster down.

How to repeat:
* Create a two node, two replica cluster (using default values wherever possible) using ndb-7.0.9b
* Stop one data node
* Restart the data node using ndb-7.0.10
* See the remaining data node fail while the upgraded node is in start phase 5

Tried with both ndbd and ndbmtd setups, used both regular and --initial node restarts, to no avail ...

Suggested fix:
Allow rolling upgrades to 7.0.10
[19 Jan 2010 8:53] Hartmut Holzgraefe
logs

Attachment: ndb_error_report_20100119094854.tar.bz2 (application/x-bzip, text), 202.69 KiB.

[19 Jan 2010 8:54] Hartmut Holzgraefe
Time: Tuesday 19 January 2010 - 09:48:12
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: dbdih/DbdihMain.cpp
Error object: DBDIH (Line: 12650) 0x0000000a
Program: /data1/mysql/ndb-7.0.9b/libexec/ndbmtd
Pid: 28230 thr: 0
Version: mysql-5.1.39 ndb-7.0.9b
Trace: /data2/csc/43804/cluster/ndb_3_trace.log.1 /data2/csc/43804/cluster/ndb_3_trace.log.1_t1 /
[19 Jan 2010 9:08] Hartmut Holzgraefe
The failing 7.0.9b node's output log has

  2010-01-19 09:48:12 [ndbd] INFO     -- Fragment Replica(node=2) not found
  2010-01-19 09:48:12 [ndbd] INFO     -- ...And wasn't found in oldStoredReplicas
  2010-01-19 09:48:12 [ndbd] INFO     -- dbdih/DbdihMain.cpp
  2010-01-19 09:48:12 [ndbd] INFO     -- DBDIH (Line: 12650) 0x0000000a

and the code leading to the failure is

  void Dbdih::findReplica(ReplicaRecordPtr& replicaPtr,
                        Fragmentstore* fragPtrP,
                        Uint32 nodeId,
                        bool old)
  {
    replicaPtr.i = old ? fragPtrP->oldStoredReplicas : fragPtrP->storedReplicas;
    while(replicaPtr.i != RNIL){
      ptrCheckGuard(replicaPtr, creplicaFileSize, replicaRecord);
      if (replicaPtr.p->procNode == nodeId) { 
        jam();
        return;
      } else {
        jam();
        replicaPtr.i = replicaPtr.p->nextReplica;
      }//if
    };
    
  #ifdef VM_TRACE
    g_eventLogger->info("Fragment Replica(node=%d) not found", nodeId);
    replicaPtr.i = fragPtrP->oldStoredReplicas;
    while(replicaPtr.i != RNIL){
      ptrCheckGuard(replicaPtr, creplicaFileSize, replicaRecord);
      if (replicaPtr.p->procNode == nodeId) {
        jam();
        break;
      } else {
        jam();
        replicaPtr.i = replicaPtr.p->nextReplica;
      }//if
    };
    if(replicaPtr.i != RNIL){
      g_eventLogger->info("...But was found in oldStoredReplicas");
    } else {
      g_eventLogger->info("...And wasn't found in oldStoredReplicas");
    } 
  #endif
>   ndbrequire(false);
  }//Dbdih::findReplica()
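
The lookup above can be modeled outside the NDB kernel. The following is a minimal standalone sketch, not the NDB source: `kNil`, `Replica`, `Fragment`, and `findInList` are hypothetical stand-ins, the sentinel value is an assumption, and an exception takes the place of ndbrequire(false). It shows why a node id that appears in neither replica list is fatal for the node doing the lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for the NDB kernel types; not the real definitions.
constexpr std::uint32_t kNil = 0xffffff00;  // plays the role of RNIL (value assumed)

struct Replica {
  std::uint32_t procNode;     // node id owning this replica
  std::uint32_t nextReplica;  // pool index of the next replica, or kNil
};

struct Fragment {
  std::uint32_t storedReplicas;     // head of the current replica list
  std::uint32_t oldStoredReplicas;  // head of the old replica list
};

// Walk one index-linked list; return the pool index of nodeId's replica, or kNil.
std::uint32_t findInList(const std::vector<Replica>& pool,
                         std::uint32_t head, std::uint32_t nodeId) {
  for (std::uint32_t i = head; i != kNil; i = pool[i].nextReplica) {
    assert(i < pool.size());  // plays the role of ptrCheckGuard
    if (pool[i].procNode == nodeId) return i;
  }
  return kNil;
}

// Same contract as the quoted function: a miss is treated as a fatal error.
std::uint32_t findReplica(const std::vector<Replica>& pool, const Fragment& frag,
                          std::uint32_t nodeId, bool old) {
  const std::uint32_t head = old ? frag.oldStoredReplicas : frag.storedReplicas;
  const std::uint32_t i = findInList(pool, head, nodeId);
  if (i == kNil) {
    // The real code calls ndbrequire(false) here, which stops the data node.
    throw std::logic_error("replica not found");
  }
  return i;
}
```

With a pool holding a single replica owned by node 3, findReplica(pool, frag, 3, false) returns index 0, while a lookup for node 2 throws. That corresponds to the logged case, where the replica for node=2 was found in neither storedReplicas nor oldStoredReplicas.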
[19 Jan 2010 13:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97381

3346 Jonas Oreland	2010-01-19
      ndb - bug#50433
        Upgrade from < 7.0.10 to 7.0.10 doesn't work
          due to typo in ndb_version.h
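
The commit message does not say what the typo was, so the following is only a hypothetical illustration of how a single mistyped digit in a version gate could make a newer node misjudge an older peer during a rolling upgrade. `makeVersion`, `peerSupports`, and both thresholds are invented for this sketch and are not taken from ndb_version.h.

```cpp
#include <cstdint>

// Pack major.minor.build into one integer so versions compare numerically.
constexpr std::uint32_t makeVersion(std::uint32_t major, std::uint32_t minor,
                                    std::uint32_t build) {
  return (major << 16) | (minor << 8) | build;
}

// Hypothetical gate: a peer is assumed to understand some protocol change
// only if its version is at least `since`.
constexpr bool peerSupports(std::uint32_t peerVersion, std::uint32_t since) {
  return peerVersion >= since;
}

// Intended threshold versus a mistyped one (both hypothetical).
constexpr std::uint32_t kSinceIntended = makeVersion(7, 0, 10);
constexpr std::uint32_t kSinceTypo = makeVersion(7, 0, 1);

// Version of a peer that has not been upgraded yet.
constexpr std::uint32_t kOldPeer = makeVersion(7, 0, 9);
```

With the intended threshold, peerSupports(kOldPeer, kSinceIntended) is false and the newer node would fall back to the old behavior. With the mistyped threshold it is true, so the newer node would send the older peer something it cannot handle. Whether the actual bug had this shape is not stated in this report.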
[19 Jan 2010 13:28] Jonas Oreland
pushed to 7.0.11
[20 Jan 2010 9:32] Jon Stephens
Documented issue and bugfix in the NDB-7.0.11 changelog as follows:

        Online upgrades from MySQL Cluster NDB 7.0.9b to MySQL Cluster
        NDB 7.0.10 did not work correctly. Current MySQL Cluster NDB 7.0
        users should upgrade directly to MySQL Cluster NDB 7.0.11 or
        later.

Also updated mysql-cluster-upgrade-downgrade-compatibility section of docs and chart.

Closed.