MySQL Bugs: #14828: ALL Nodes Crash. Forced node shutdown completed. Initiated by signal 0. Caused

Submitted:	10 Nov 2005 14:27	Modified:	15 Dec 2005 11:39
Reporter:	Eric duda
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	5.0.15	OS:	Linux (Fedora Core 4)
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:

It seems to happen shortly after starting up the 3 API nodes.

setup:

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @192.100.100.233  (Version: 5.0.15, Nodegroup: 0, Master)
id=3    @192.100.100.234  (Version: 5.0.15, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.100.100.221  (Version: 5.0.15)

[mysqld(API)]   3 node(s)
id=4 (not connected, accepting connect from 192.100.100.233)
id=5 (not connected, accepting connect from 192.100.100.234)
id=6 (not connected, accepting connect from 192.100.100.221)

ndb_mgm> Node 2: Forced node shutdown completed. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.
Node 3: Forced node shutdown completed. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

ndb_mgm> all status
Node 2: not connected
Node 3: not connected

Error log:

Current byte-offset of file-pointer is: 568                       

Time: Thursday 10 November 2005 - 08:18:04
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: SimulatedBlock.cpp
Error object: DBTC (Line: 1893) 0x0000000a
Program: ndbd
Pid: 30112
Trace: /var/lib/mysql-cluster/ndb_2_trace.log.1
Version: Version 5.0.15
***EOM***

config.ini:

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=1824M
IndexMemory=400M
[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]
# Management Server
[NDB_MGMD]
HostName=192.100.100.221           # IP address of this server
# Storage Nodes
[NDBD]
HostName=192.100.100.233           # IP address of storage-node-1
DataDir=/var/lib/mysql-cluster
BackupDataDir=/mysqlbackup
NoOfFragmentLogFiles=32
[NDBD]
HostName=192.100.100.234           # IP address of storage-node-2
DataDir=/var/lib/mysql-cluster
BackupDataDir=/mysqlbackup
NoOfFragmentLogFiles=32
# Setup node IDs for mySQL API-servers (clients of the cluster)
[MYSQLD]
HostName=192.100.100.233
[MYSQLD]
HostName=192.100.100.234
[MYSQLD]
HostName=192.100.100.221

RPMs installed from the MySQL web site:

MySQL-server-5.0.15-0.glibc23
MySQL-ndb-storage-5.0.15-0.glibc23
MySQL-client-5.0.15-0.glibc23
MySQL-ndb-management-5.0.15-0.glibc23
MySQL-Max-5.0.15-0.glibc23

How to repeat:
It keeps doing this every time I start up the nodes and MySQL servers. I can't get the database back online at all anymore.

Trace file from the crash:

Attachment: trace.zip (application/x-zip-compressed, text), 49.23 KiB.

/etc/my.cnf
 
[mysqld]
default-table-type=NDBCLUSTER
ndbcluster
ndb-connectstring='host=192.100.100.221'    # IP address of the management server
[mysql_cluster]
ndb-connectstring='host=192.100.100.221'    # IP address of the management server

Now I'm also getting this error:

Time: Thursday 10 November 2005 - 10:52:57
Status: Temporary error, restart node
Message: Internal program error (failed ndbrequire) (Internal error, programming error or missing error message, please report a bug)
Error: 2341
Error data: DblqhMain.cpp
Error object: DBLQH (Line: 16138) 0x0000000a
Program: ndbd
Pid: 3701
Trace: /var/lib/mysql-cluster/ndb_2_trace.log.6
Version: Version 5.0.15
***EOM***

I didn't have any API nodes running at the time of this crash.

Did you make any configuration changes? If so, tell us what.
Did you upgrade and keep the filesystem? If so, tell us how.

Please supply all logs.

No configuration changes. I installed the new 5.0 RPMs and used the mysql client to load all my tables and data, which I had saved via mysqldump. Within minutes of the upgrade completing, ndbd started crashing.

No OS filesystem changes.

Also, since this was many days ago, I have since deleted everything and started over, this time without clustering. I am using replication and it's working fine.

Can you provide us with a dump that reproduces this?

Sorry, I can't provide a dump as it's customer data, but this table might have been causing it:

CREATE TABLE `email` (
  `order_num` varchar(12) NOT NULL default '',
  `email` varchar(50) NOT NULL default '',
  `mail_message` text,
  `order_date` datetime default NULL,
  PRIMARY KEY  (`order_num`),
  KEY `NewIndex` (`email`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1;

It had about 200,000 rows in it.

This is the same as bug #15682, which will be fixed shortly.

Pushed into 5.0.18

Yes, also pushed into 5.0.17.
Pushed into 5.1.4 (I think; someone else merged it).

Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html

Additional info:

Documented in 5.0.18 and 5.1.4 changelogs. Closed.

I see the same problem in 5.1.11 when I try to restart the master from ndb_mgm.

-- NDB Cluster -- Management Client --
ndb_mgm> 2 restart
Connected to Management Server at: ds-lvs02:1186
Node 2: Node shutdown initiated
Node 2: Node shutdown completed, restarting, no start.
Node 2 is being restarted

ndb_mgm> Node 2: Start initiated (version 5.1.11)
Node 2: Forced node shutdown completed. Occured during startphase 5. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error

After this, the state of node 2 is 'not connected'.
OS: Debian Etch (2.6.17-2-686)