Bug #14828 ALL Nodes Crash. Forced node shutdown completed. Initiated by signal 0. Caused
Submitted: 10 Nov 2005 14:27
Modified: 15 Dec 2005 11:39
Reporter: Eric duda
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 5.0.15
OS: Linux (fedora core 4)
Assigned to: Jonas Oreland
CPU Architecture: Any

[10 Nov 2005 14:27] Eric duda
Description:

The crash seems to happen shortly after starting up the 3 API nodes.

setup:

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @192.100.100.233  (Version: 5.0.15, Nodegroup: 0, Master)
id=3    @192.100.100.234  (Version: 5.0.15, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.100.100.221  (Version: 5.0.15)

[mysqld(API)]   3 node(s)
id=4 (not connected, accepting connect from 192.100.100.233)
id=5 (not connected, accepting connect from 192.100.100.234)
id=6 (not connected, accepting connect from 192.100.100.221)

ndb_mgm> Node 2: Forced node shutdown completed. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 3: Forced node shutdown completed. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

ndb_mgm> all status
Node 2: not connected
Node 3: not connected

Error log:

Current byte-offset of file-pointer is: 568                       

Time: Thursday 10 November 2005 - 08:18:04
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBTC (Line: 1893) 0x0000000a
Program: ndbd
Pid: 30112
Trace: /var/lib/mysql-cluster/ndb_2_trace.log.1
Version: Version 5.0.15
***EOM***
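
The error log entries have a simple "Key: value" layout ending in `***EOM***`, so they are easy to compare across crashes. The following is a rough Python sketch, assuming every entry starts with a `Time:` line and has the same fields as the one above; `parse_error_entries` and `failing_block` are hypothetical helper names, not part of any MySQL tool.

```python
import re


def parse_error_entries(text):
    """Split ndbd error-log text into one dict per entry ending in ***EOM***."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Time: "):
            current = {}          # assumption: each entry begins with "Time:"
        if current is None:
            continue              # skip header lines such as the byte-offset line
        if line == "***EOM***":
            entries.append(current)
            current = None
        elif ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return entries


def failing_block(entry):
    """Return (block name, source line) from the 'Error object' field, or None."""
    match = re.match(r"(\w+) \(Line: (\d+)\)", entry.get("Error object", ""))
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

For the entry above, the block and line come from the `Error object` field and the source file from `Error data`.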

config.ini:

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=1824M
IndexMemory=400M
[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]
# Management Server
[NDB_MGMD]
HostName=192.100.100.221           # IP address of this server
# Storage Nodes
[NDBD]
HostName=192.100.100.233           # IP address of storage-node-1
DataDir=/var/lib/mysql-cluster
BackupDataDir=/mysqlbackup
NoOfFragmentLogFiles=32
[NDBD]
HostName=192.100.100.234           # IP address of storage-node-2
DataDir=/var/lib/mysql-cluster
BackupDataDir=/mysqlbackup
NoOfFragmentLogFiles=32
# Setup node IDs for mySQL API-servers (clients of the cluster)
[MYSQLD]
HostName=192.100.100.233
[MYSQLD]
HostName=192.100.100.234
[MYSQLD]
HostName=192.100.100.221
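
As a sanity check on the config.ini above, here is a small Python sketch that counts the nodes of each type and works out how many node groups the data nodes form (number of data nodes divided by NoOfReplicas). Python's `configparser` does not keep repeated sections such as the two `[NDBD]` blocks separate, so the sketch uses a hand-written parser; `parse_cluster_config` and `summarize` are hypothetical names, and the parser only handles the subset of the syntax used in this report.

```python
from collections import Counter


def parse_cluster_config(text):
    """Return a list of (section_name, settings), keeping repeated sections."""
    sections = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing "# ..." comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip().upper(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def summarize(sections):
    """Count nodes per type and derive the number of node groups."""
    counts = Counter(name for name, _ in sections
                     if not name.endswith(" DEFAULT"))
    replicas = None
    for name, settings in sections:
        if name == "NDBD DEFAULT" and "NoOfReplicas" in settings:
            replicas = int(settings["NoOfReplicas"])
    data_nodes = counts["NDBD"]
    node_groups = None
    # Node groups are only well defined when the replica count divides
    # the number of data nodes evenly.
    if replicas and data_nodes % replicas == 0:
        node_groups = data_nodes // replicas
    return {"counts": dict(counts), "replicas": replicas,
            "node_groups": node_groups}
```

With the file above this should report 2 NDBD, 1 NDB_MGMD, and 3 MYSQLD sections and a single node group, which agrees with the `show` output (both data nodes in Nodegroup 0).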

RPMs installed from the MySQL web site:

MySQL-server-5.0.15-0.glibc23
MySQL-ndb-storage-5.0.15-0.glibc23
MySQL-client-5.0.15-0.glibc23
MySQL-ndb-management-5.0.15-0.glibc23
MySQL-Max-5.0.15-0.glibc23

How to repeat:
It keeps doing this every time I start up the nodes and MySQL servers. I can't get the database back online at all anymore.
[10 Nov 2005 14:32] Eric duda
trace file from crash

Attachment: trace.zip (application/x-zip-compressed, text), 49.23 KiB.

[10 Nov 2005 14:42] Eric duda
/etc/my.cnf
 
[mysqld]
default-table-type=NDBCLUSTER
ndbcluster
ndb-connectstring='host=192.100.100.221'    # IP address of the management server
[mysql_cluster]
ndb-connectstring='host=192.100.100.221'    # IP address of the management server
[10 Nov 2005 17:06] Eric duda
Now I'm also getting this error:

Time: Thursday 10 November 2005 - 10:52:57
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DblqhMain.cpp
Error object: DBLQH (Line: 16138) 0x0000000a
Program: ndbd
Pid: 3701
Trace: /var/lib/mysql-cluster/ndb_2_trace.log.6
Version: Version 5.0.15
***EOM***

I didn't have any API nodes running at the time of this crash.
[17 Nov 2005 14:03] Tomas Ulin
Did you make any configuration changes? If so, tell us what.
Did you upgrade and keep the filesystem? If so, tell us how.

Please supply all logs.
[17 Nov 2005 14:11] Eric duda
No configuration changes. I installed the new 5.0 RPMs and used the mysql client to load all the tables and data that I had saved via mysqldump. Within minutes of completing the upgrade, ndbd started crashing.

No OS filesystem changes.
[17 Nov 2005 14:13] Eric duda
Also, since this was many days ago, I have since deleted everything and started over, this time without clustering. I am using replication and it's working fine.
[17 Nov 2005 14:29] Tomas Ulin
Can you provide us with a dump that reproduces this?
[17 Nov 2005 16:04] Eric duda
Sorry, I can't provide a dump as it's customer data, but this table might have been causing it.

CREATE TABLE `email` (
  `order_num` varchar(12) NOT NULL default '',
  `email` varchar(50) NOT NULL default '',
  `mail_message` text,
  `order_date` datetime default NULL,
  PRIMARY KEY  (`order_num`),
  KEY `NewIndex` (`email`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1;

It had about 200000 rows in it.
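
Since the real data can't be shared, one option is to fill the same table with synthetic rows and see whether the crash still occurs. Below is a rough Python sketch that generates INSERT statements for the `email` table definition above. Everything about the data is an assumption (sequential order numbers, `example.com` addresses, message lengths of 100 to 2000 characters, fixed time of day), the function names are made up for illustration, and there is no evidence in this report that such data reproduces the problem.

```python
import random
import string


def random_text(rng, length):
    """Random lowercase text with spaces (no characters that need SQL escaping)."""
    alphabet = string.ascii_lowercase + " "
    return "".join(rng.choice(alphabet) for _ in range(length))


def synthetic_email_rows(count, seed=0):
    """Yield (order_num, email, mail_message, order_date) tuples."""
    rng = random.Random(seed)
    for i in range(count):
        order_num = f"{i:012d}"                              # unique, fits varchar(12)
        email = f"user{rng.randrange(count)}@example.com"    # fits varchar(50)
        message = random_text(rng, rng.randint(100, 2000))   # assumed sizes
        order_date = (f"2005-{rng.randint(1, 12):02d}-"
                      f"{rng.randint(1, 28):02d} 12:00:00")
        yield order_num, email, message, order_date


def insert_statements(rows, batch_size=100):
    """Group rows into multi-row INSERT statements for the `email` table."""
    prefix = ("INSERT INTO `email` (`order_num`, `email`, `mail_message`, "
              "`order_date`) VALUES ")
    batch = []
    for row in rows:
        batch.append("(" + ", ".join(f"'{value}'" for value in row) + ")")
        if len(batch) == batch_size:
            yield prefix + ", ".join(batch) + ";"
            batch = []
    if batch:
        yield prefix + ", ".join(batch) + ";"
```

Printing each statement from `insert_statements(synthetic_email_rows(200000))` to a file gives a dump of roughly the same row count, which can then be loaded with the mysql client after creating the table.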
[14 Dec 2005 8:30] Jonas Oreland
This is the same as #15682, which will be fixed shortly.
[14 Dec 2005 13:24] Jonas Oreland
Pushed into 5.0.18
[15 Dec 2005 9:54] Jonas Oreland
Yes, also pushed into 5.0.17.
Pushed into 5.1.4 (I think; someone else merged it).
[15 Dec 2005 11:39] Jon Stephens
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented in 5.0.18 and 5.1.4 changelogs. Closed.
[28 Nov 2006 15:39] Lars Bo Svenningsen
I see the same problem in 5.1.11 when I try to restart the master from ndb_mgm.

-- NDB Cluster -- Management Client --
ndb_mgm> 2 restart
Connected to Management Server at: ds-lvs02:1186
Node 2: Node shutdown initiated
Node 2: Node shutdown completed, restarting, no start.
Node 2 is being restarted

ndb_mgm> Node 2: Start initiated (version 5.1.11)
Node 2: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error

After this, the state of node 2 is 'not connected'.
OS: Debian Etch (2.6.17-2-686)