Bug #58256 Rolling upgrade to 7.1.9 fails when ALL DUMP is used: cluster failure
Submitted: 17 Nov 2010 13:51 Modified: 9 Feb 2011 20:52
Reporter: Geert Vanderkelen
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S1 (Critical)
Version: mysql-telco-7.1.9 OS: Any
Assigned to: Magnus Blåudd CPU Architecture: Any

[17 Nov 2010 13:51] Geert Vanderkelen
Description:
During a rolling upgrade from 7.1.8 to 7.1.9, upgrading ndb_mgmd to 7.1.9 first and then issuing an ALL DUMP command (for example, ALL DUMP 1000) crashes the data nodes that are still running 7.1.8.

How to repeat:
Simple cluster configuration:

[ndbd(NDB)]	2 node(s)
id=2	@127.0.0.1  (mysql-5.1.47 ndb-7.1.8, Nodegroup: 0, Master)
id=3	@127.0.0.1  (mysql-5.1.47 ndb-7.1.8, Nodegroup: 0)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@127.0.0.1  (mysql-5.1.51 ndb-7.1.9)

[mysqld(API)]	12 node(s)
id=20	@127.0.0.1  (mysql-5.1.47 ndb-7.1.8)
id=21 (not connected, accepting connect from any host)
..

ndb_mgm> ALL DUMP 1000;
Sending dump signal with data:
0x000003e8 

Sending dump signal with data:
0x000003e8 

All data nodes go down:
 
ndb_mgm> Node 3: Forced node shutdown completed. Initiated by signal 6. Caused by error 2301: 'Assertion(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 2: Forced node shutdown completed. Initiated by signal 6. Caused by error 2301: 'Assertion(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

All data nodes report the same error:

Status: Temporary error, restart node
Message: Assertion (Internal error, programming error or missing error message, please report a bug)
Error: 2301
Error data: Illegal signal received (GSN 610 not added)
Error object: Illegal signal received (GSN 610 not added)

Suggested fix:
Fix the SYNC_REQ(610) signal.

Not really a workaround: don't use ALL DUMP/ERROR during the upgrade.
[17 Nov 2010 14:07] Geert Vanderkelen
Notes:
* ALL REPORT MEMORY USAGE works OK.
* Upgrading the Data Nodes to 7.1.9 and doing ALL DUMP 1000: works OK.

So it's advisable not to issue any ALL DUMP/ERROR commands during a rolling upgrade, at least until the data nodes have been upgraded.
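
The advice above can be turned into a pre-check before issuing ALL DUMP. The following is only a minimal sketch: it assumes the SHOW output has the same layout as in the "How to repeat" section, and the names data_node_versions and all_data_nodes_upgraded are hypothetical helpers, not part of any MySQL Cluster tool.

```python
import re

# Matches connected-node lines such as:
#   id=2	@127.0.0.1  (mysql-5.1.47 ndb-7.1.8, Nodegroup: 0, Master)
NODE_LINE = re.compile(
    r"id=(\d+)\s+@\S+\s+\(mysql-[\d.]+ ndb-(\d+)\.(\d+)\.(\d+)"
)


def data_node_versions(show_output):
    """Return {node_id: (major, minor, build)} for the [ndbd(NDB)] section."""
    versions = {}
    in_data_section = False
    for line in show_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A new section header; only the data node section is of interest.
            in_data_section = stripped.startswith("[ndbd(NDB)]")
            continue
        if in_data_section:
            match = NODE_LINE.search(stripped)
            if match:
                node_id = int(match.group(1))
                versions[node_id] = tuple(int(g) for g in match.group(2, 3, 4))
    return versions


def all_data_nodes_upgraded(show_output, target):
    """True only if every connected data node runs at least `target`."""
    versions = data_node_versions(show_output)
    return bool(versions) and all(v >= target for v in versions.values())


# Sample taken from the cluster configuration in this report.
SAMPLE = """[ndbd(NDB)]\t2 node(s)
id=2\t@127.0.0.1  (mysql-5.1.47 ndb-7.1.8, Nodegroup: 0, Master)
id=3\t@127.0.0.1  (mysql-5.1.47 ndb-7.1.8, Nodegroup: 0)

[ndb_mgmd(MGM)]\t1 node(s)
id=1\t@127.0.0.1  (mysql-5.1.51 ndb-7.1.9)
"""

if __name__ == "__main__":
    if all_data_nodes_upgraded(SAMPLE, (7, 1, 9)):
        print("Data nodes upgraded; ALL DUMP should be safe.")
    else:
        print("Data nodes not yet upgraded; avoid ALL DUMP/ERROR.")
```

With the sample above, both data nodes are still on 7.1.8, so the check reports that ALL DUMP/ERROR should be avoided.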
[17 Nov 2010 15:55] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/124172
[17 Nov 2010 15:56] Magnus Blåudd
Both 7.0.20 and 7.1.9 are affected.
[18 Nov 2010 11:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/124242

3946 MySQL Build Team	2010-11-18
      Bug#58256 Rolling upgrade to 7.1.9 fails when ALL DUMP is used:
                cluster failure
      
      - A new SYNC_REQ signal was added in 7.0.20 and 7.1.9, which is
        used to make sure all blocks in a node have been scheduled
        and have thus processed the async commands they were sent.
        The false assumption was that this is used for test/debug only
        and thus no version check was needed.
      - To fix this, check the version of connected node and don't send
        SYNC_REQ if the node does not support it.
      
      (Patch done by Magnus Blaudd on 2010-11-17 in the telco-7.0 tree
      and sent via mail.)
      
      
      This is a rebuild of 7.1.9, the resulting version is 7.1.9a
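
The version check described in the commit message can be illustrated roughly as follows. This is a sketch only, not the actual patch: the function names are hypothetical, the signals are represented as plain strings, and the one-byte-per-component version packing is an assumption about how NDB version numbers are encoded.

```python
def make_version(major, minor, build):
    # Assumed packing: one byte each for major, minor and build.
    return (major << 16) | (minor << 8) | build


# From the commit message: SYNC_REQ was added in 7.0.20 and 7.1.9.
SYNC_REQ_MIN_70 = make_version(7, 0, 20)
SYNC_REQ_MIN_71 = make_version(7, 1, 9)


def node_supports_sync_req(version):
    """Return True if a node with this packed version understands SYNC_REQ."""
    major = version >> 16
    minor = (version >> 8) & 0xFF
    if (major, minor) == (7, 0):
        return version >= SYNC_REQ_MIN_70
    # Anything older than 7.0 is below this threshold as well.
    return version >= SYNC_REQ_MIN_71


def send_dump(node_versions, send):
    """Send the dump to every node; follow up with SYNC_REQ only if supported.

    `node_versions` maps node id to packed version; `send` is a hypothetical
    callback taking (node_id, signal_name).
    """
    for node_id, version in node_versions.items():
        send(node_id, "DUMP")
        if node_supports_sync_req(version):
            send(node_id, "SYNC_REQ")


if __name__ == "__main__":
    sent = []
    nodes = {2: make_version(7, 1, 8), 3: make_version(7, 1, 9)}
    send_dump(nodes, lambda node_id, signal: sent.append((node_id, signal)))
    print(sent)
```

In this sketch the 7.1.8 node receives only the dump, while the 7.1.9 node also receives SYNC_REQ, which mirrors the behaviour the fix describes.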
[26 Nov 2010 18:41] Jon Stephens
Documented as follows in the NDB-7.0.20a and 7.1.9a changelogs:

        Issuing an ALL DUMP command during a rolling upgrade to MySQL
        Cluster NDB 7.0.20|7.1.9 caused the cluster to crash.

Added info to upgrades/downgrades section of Cluster chapter as well.

Closed.
[2 Feb 2011 14:44] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130243
[2 Feb 2011 14:53] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.22 (revid:magnus.blaudd@oracle.com-20110202144321-pzw8pl8uj3omedqj) (version source revid:magnus.blaudd@oracle.com-20110202144321-pzw8pl8uj3omedqj) (merge vers: 5.1.51-ndb-7.0.22) (pib:24)
[2 Feb 2011 16:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/130263

4062 MySQL Build Team	2011-02-02
      Bug#58256 Rolling upgrade to 7.1.9 fails when ALL DUMP is used:
                cluster failure
      
      - A new SYNC_REQ signal was added in 7.0.20 and 7.1.9, which is
        used to make sure all blocks in a node have been scheduled
        and have thus processed the async commands they were sent.
        The false assumption was that this is used for test/debug only
        and thus no version check was needed.
      - To fix this, check the version of connected node and don't send
        SYNC_REQ if the node does not support it.
[9 Feb 2011 20:52] Jon Stephens
If I understand correctly, this fix did not make it into the 7.0.21 release, but does appear in 7.0.22 and 7.1.10. Made the necessary changes/additions in the changelog entries to reflect this situation.

Closed.