Bug #24447 mysqld/ndbapi disconnect may cause datanode shutdown (failed ndbrequire)
Submitted: 20 Nov 2006 22:03
Modified: 22 Dec 2006 3:47
Reporter: Craig Forbes
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 4.1, 5.0, 5.1
OS: Linux (Linux RHEL)
Assigned to: Jonas Oreland
Tags: cluster, failed ndbrequire, ndbd

[20 Nov 2006 22:03] Craig Forbes
Description:
This may be 2 bugs.

On a 2 datanode cluster one node dies with the following error:

Time: x 19 November 2006 - 10:23:46
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DbtcMain.cpp
Error object: DBTC (Line: 9602) 0x0000000a
Program: /u01/mysql/libexec/ndbd
Pid: 332
Trace: /u01/mysql/cluster/ndb_22_trace.log.3
Version: Version 5.0.22

In the cluster log I see:
2006-11-19 10:23:46 [MgmSrvr] ALERT    -- Node 22: Forced node shutdown completed. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2006-11-19 10:23:47 [MgmSrvr] INFO     -- Mgmt server state: nodeid 12 reserved for ip 192.168.x.x, m_reserved_nodes 0000000000001c02.

After the data node has shut down, the cluster continues to operate, but local checkpoints stop. Because the checkpoints have stopped, when the failed node is restarted it hangs in phase 5 (after the "phase 4 complete" message), waiting for the running node to complete a checkpoint before it can join the cluster.

At this point the only solution is to shut down the cluster and restart both data nodes.
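The restart hang can be pictured with a small toy model. This is a sketch under stated assumptions, not NDB code: the function `restart_completes` and its `lcp_running`, `max_ticks`, and `lcp_interval` parameters are hypothetical, and the model encodes only the dependency the report describes, namely that the restarting node cannot leave phase 5 until the surviving node completes a local checkpoint (LCP).

```python
# Hypothetical toy model, not NDB source code. It encodes only the dependency
# described in the report: a restarting data node stays in start phase 5
# until the surviving node completes a local checkpoint (LCP).


def restart_completes(lcp_running: bool, max_ticks: int = 100, lcp_interval: int = 10) -> bool:
    """Return True if the restarting node can join within max_ticks.

    lcp_running:  whether the surviving node is still running LCPs
                  (the report says they stopped after the crash).
    lcp_interval: hypothetical number of ticks between LCP completions.
    """
    for tick in range(1, max_ticks + 1):
        # An LCP finishes every lcp_interval ticks, but only while LCPs run.
        if lcp_running and tick % lcp_interval == 0:
            return True  # checkpoint completed: node leaves phase 5
    return False  # no checkpoint ever completes: node keeps waiting


if __name__ == "__main__":
    print(restart_completes(lcp_running=True))   # prints True (healthy cluster)
    print(restart_completes(lcp_running=False))  # prints False (LCPs stopped)
```

In this model, raising `max_ticks` never helps when `lcp_running` is False, which mirrors why restarting the failed node alone was not enough.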

How to repeat:
Unable to duplicate, but the problem has occurred more than once.
[21 Nov 2006 13:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/15625

ChangeSet@1.2557, 2006-11-21 14:04:20+01:00, jonas@perch.ndb.mysql.com +3 -0
  ndb - bug#24447
    api disconnect just after SCAN_TABREQ
[21 Nov 2006 13:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/15631

ChangeSet@1.2558, 2006-11-21 14:06:20+01:00, jonas@perch.ndb.mysql.com +1 -0
  ndb -
    update error code list
    (for bug#24447)
[21 Dec 2006 10:08] Tomas Ulin
Changed title to reflect the actual issue.
[22 Dec 2006 3:47] Jon Stephens
Thank you for your bug report. A fix for this issue has been committed to the source repository for this product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented bugfix for 5.0.32 and 5.1.14.
[4 Jan 2007 11:15] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/17617
[15 Apr 2007 17:02] Bugs System
Pushed into 4.1.23