Bug #24447 mysqld/ndbapi disconnect may cause datanode shutdown (failed ndbrequire)
Submitted: 20 Nov 2006 23:03 Modified: 22 Dec 2006 4:47
Reporter: Craig Forbes
Status: Closed
Category: Server: Cluster   Severity: S2 (Serious)
Version: 4.1, 5.0, 5.1   OS: Linux (Linux RHEL)
Assigned to: Jonas Oreland Target Version:
Tags: failed ndbrequire, ndbd, cluster

[20 Nov 2006 23:03] Craig Forbes
Description:
This may be 2 bugs.

On a 2 datanode cluster one node dies with the following error:

Time: x 19 November 2006 - 10:23:46
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DbtcMain.cpp
Error object: DBTC (Line: 9602) 0x0000000a
Program: /u01/mysql/libexec/ndbd
Pid: 332
Trace: /u01/mysql/cluster/ndb_22_trace.log.3
Version: Version 5.0.22

In the cluster log I see:
2006-11-19 10:23:46 [MgmSrvr] ALERT    -- Node 22: Forced node shutdown completed. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
2006-11-19 10:23:47 [MgmSrvr] INFO     -- Mgmt server state: nodeid 12 reserved for ip 192.168.x.x, m_reserved_nodes 0000000000001c02.

After the datanode has shut down, the cluster continues to operate but the local
checkpoints stop.  Since the checkpoints have stopped, when the failed node is restarted
it hangs in phase 5 (after the phase 4 complete message), waiting for the running node to
checkpoint before joining the cluster.

At this point the only solution is to shut down the cluster and restart both data nodes.
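
Since the failure could not be reproduced on demand, one way to spot recurrences is to scan the cluster log for the forced-shutdown ALERT. The following is a minimal sketch, not part of the original report: it assumes each log entry occupies a single line in the log file and that the line format matches the ALERT entry quoted above (other versions may format it differently). The function name find_forced_shutdowns and the sample data are hypothetical.

```python
import re

# Pattern modeled on the ALERT entry quoted in this report (assumption:
# other MySQL Cluster versions may word or lay out this line differently).
SHUTDOWN_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MgmSrvr\] ALERT\s+-- "
    r"Node (?P<node>\d+): Forced node shutdown completed\."
    r".*?Caused by error (?P<error>\d+)"
)


def find_forced_shutdowns(lines):
    """Return (timestamp, node_id, error_code) for each forced-shutdown ALERT."""
    events = []
    for line in lines:
        match = SHUTDOWN_RE.search(line)
        if match:
            events.append(
                (match.group("ts"), int(match.group("node")), int(match.group("error")))
            )
    return events


if __name__ == "__main__":
    # Sample lines copied from the cluster log excerpt in this report.
    sample = [
        "2006-11-19 10:23:46 [MgmSrvr] ALERT    -- Node 22: Forced node shutdown "
        "completed. Initiated by signal 0. Caused by error 2341: 'Internal program "
        "error (failed ndbrequire)(Internal error, programming error or missing "
        "error message, please report a bug). Temporary error, restart node'.",
        "2006-11-19 10:23:47 [MgmSrvr] INFO     -- Mgmt server state: nodeid 12 "
        "reserved for ip 192.168.x.x, m_reserved_nodes 0000000000001c02.",
    ]
    for ts, node, error in find_forced_shutdowns(sample):
        print(f"{ts}: node {node} forced shutdown, error {error}")
```

To run it against a real log, pass an open file object instead of the sample list; the path to the cluster log depends on the installation and is not given here.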

How to repeat:
Unable to duplicate, but the problem has occurred more than once.
[21 Nov 2006 14:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/15625

ChangeSet@1.2557, 2006-11-21 14:04:20+01:00, jonas@perch.ndb.mysql.com +3 -0
  ndb - bug#24447
    api disconnect just after SCAN_TABREQ
[21 Nov 2006 14:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/15631

ChangeSet@1.2558, 2006-11-21 14:06:20+01:00, jonas@perch.ndb.mysql.com +1 -0
  ndb -
    update error code list
    (for bug#24447)
[21 Dec 2006 11:08] Tomas Ulin
Changed title to reflect the actual issue.
[22 Dec 2006 4:47] Jon Stephens
Thank you for your bug report. A fix for this issue has been committed to our source
repository for that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version,
including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented bugfix for 5.0.32 and 5.1.14.
[4 Jan 2007 12:15] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/17617
[15 Apr 2007 19:02] Bugs System
Pushed into 4.1.23