MySQL Bugs: #17417: Mysql Cluster Node is unable to start after graceful stop

Bug #17417	Mysql Cluster Node is unable to start after graceful stop
Submitted:	15 Feb 2006 9:09	Modified:	28 Apr 2006 8:46
Reporter:	F Huber
Status:	Closed
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	5.1.8	OS:	Linux (Red Hat Enterprise Linux 4 - 64Bit)
Assigned to:	Jonas Oreland	CPU Architecture:	Any

Description:

I have one management node, one SQL node and two storage nodes, with the following config:

[NDBD DEFAULT]
DataDir=/var/lib/mysql-cluster/
NoOfReplicas=2
[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]

# Management Server
[NDB_MGMD]
HostName=192.168.0.155          # the IP of THIS SERVER

# Storage Engines
[NDBD]
HostName=192.168.0.152          # the IP of the FIRST SERVER
[NDBD]
HostName=192.168.0.153          # the IP of the SECOND SERVER
# 2 MySQL Clients
[MYSQLD]
HostName=192.168.0.154
[MYSQLD]
HostName=192.168.0.155

I am running 5.1.6 alpha

after a short while I get this message on the management node:

 Node 2: Forced node shutdown completed. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

The log on the affected node says:

ableId: RNIL
 localKeyLength: 1 maxLoadFactor: 80 minLoadFactor: 78
 kValue: 6 lh3DistrBits: 0 lh3PageBits: 0
 noOfAttributes: 2 noOfNullAttributes: 1 keyLength: 2
 noOfPagesToPreAllocate: 0 schemaVersion: 1 nextLCP: 0
 senderData: 5 senderRef: fa0003 tableId: 4 fragmentId: 1 tableType: 2 primaryTableId: RNIL
 localKeyLength: 1 maxLoadFactor: 80 minLoadFactor: 78
 kValue: 6 lh3DistrBits: 0 lh3PageBits: 0
 noOfAttributes: 2 noOfNullAttributes: 1 keyLength: 2
 noOfPagesToPreAllocate: 0 schemaVersion: 1 nextLCP: 0
2006-02-14 11:26:30 [ndbd] INFO     -- Received signal 15. Performing stop.
2006-02-14 11:26:31 [ndbd] INFO     -- Error handler shutting down system
2006-02-14 11:26:32 [ndbd] INFO     -- Error handler shutdown completed - exiting
2006-02-14 11:26:32 [ndbd] ALERT    -- Node 3: Forced node shutdown completed. Initiated by signal 15.
2006-02-14 11:26:32 [ndbd] WARNING  -- Unable to report shutdown reason to 192.168.0.155:1186: Could not connect to socket : Unable to connect with connect string: nodeid=0,192.168.0.155:1186

If I restart the node it will show up again after a short while

[ndbd(NDB)]     2 node(s)
id=2    @192.168.0.152  (Version: 5.1.6, Nodegroup: 0)
id=3    @192.168.0.153  (Version: 5.1.6, Nodegroup: 0, Master)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.0.155  (Version: 5.1.6)

[mysqld(API)]   2 node(s)
id=4    @192.168.0.154  (Version: 5.1.6)
id=5    @192.168.0.155  (Version: 5.1.6)
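
To check from a script whether a restarted node has rejoined, the status listing above can be parsed. This is a minimal sketch, assuming the listing comes from `ndb_mgm -e show` and that connected data nodes appear as `id=N  @host ...` lines under the `[ndbd(NDB)]` header, as in the output above. The function name `count_connected_ndbd` is hypothetical and not part of MySQL Cluster.

```shell
#!/bin/sh
# Hypothetical helper: count data nodes reported as connected in the
# "show" listing read from stdin. Assumes connected nodes are printed as
# "id=N  @host ..." lines under the "[ndbd(NDB)]" section header.
count_connected_ndbd() {
    awk '
        # Any section header switches the flag on or off.
        /^\[/ { in_ndbd = ($0 ~ /^\[ndbd\(NDB\)\]/) }
        # Inside the ndbd section, a line with "@host" is a connected node.
        in_ndbd && /^id=[0-9]+[ \t]+@/ { n++ }
        END { print n + 0 }
    '
}

# Usage (needs a reachable management server):
#   ndb_mgm -e show | count_connected_ndbd
```

With the listing above as input, both storage nodes carry an `@host` entry, so both would be counted.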

How to repeat:
I am converting MyISAM tables to NDBCLUSTER with:

# Convert every table in database "bla" to the NDBCLUSTER engine
for t in $(mysql -u root --batch --column-names=false -e "show tables" bla); do
    echo "now table $t"
    mysql -u root -e "alter table $t type=NDBCLUSTER" bla
done

Some of the tables cause errors when I try to change the type, since they contain FULLTEXT fields etc.
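
The FULLTEXT errors can be kept out of the repro by skipping those tables up front. This is a rough sketch, not the reporter's method: it assumes a skip list file with one table name per line, and the helper name `gen_alter_statements` and the file name `skip.txt` are hypothetical. It uses `ENGINE=`, the non-deprecated spelling of the `TYPE=` table option.

```shell
#!/bin/sh
# Hypothetical helper: read table names from stdin and print one ALTER
# statement per table, skipping names listed (one per line) in file $1.
gen_alter_statements() {
    skip_file=$1
    while IFS= read -r t; do
        [ -n "$t" ] || continue                 # ignore empty lines
        if grep -qxF -- "$t" "$skip_file"; then
            echo "skipping $t" >&2              # note goes to stderr only
        else
            echo "ALTER TABLE \`$t\` ENGINE=NDBCLUSTER;"
        fi
    done
}

# Usage (assumes the "bla" database from this report; skip.txt is built
# from information_schema, which lists FULLTEXT indexes by INDEX_TYPE):
#   mysql -u root --batch --skip-column-names -e "SELECT DISTINCT TABLE_NAME FROM information_schema.STATISTICS WHERE TABLE_SCHEMA='bla' AND INDEX_TYPE='FULLTEXT'" > skip.txt
#   mysql -u root --batch --skip-column-names -e "SHOW TABLES" bla | gen_alter_statements skip.txt | mysql -u root bla
```

Printing the statements rather than executing them directly makes it possible to review the list before piping it into `mysql`.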

please provide the error log and trace from node 3

you could try ndb_error_reporter to collect all files.

BR,

Tomas

I downloaded the BitKeeper version (5.1.8) and compiled ndbd with the following parameters (I am running the cluster on a 64-bit Xeon with Red Hat AS, 2.6.9-27.ELsmp #1 SMP x86_64 x86_64 x86_64 GNU/Linux):

extra_flags="-mtune=nocona -O3 -fPIC -fno-omit-frame-pointer -felide-constructors -fno-exceptions -fno-rtti -g"

extra_configs="--with-innodb --with-ndbcluster --with-archive-storage-engine --with-big-tables --with-federated-storage-engine --with-csv-storage-engine --with-embedded-server --enable-assembler --with-mysqld-ldflags=-all-static"

Error report is attached (ndb_error_report_20060221143455.tar.bz2)

I will try it with a plain compile-amd64-max and post the results too

Error Report of a 5.0.8 ndbd - which I tried to start on node 11 (part1)

Attachment: ndb_error_report_20060221143455_part1.zip (application/zip, text), 154.75 KiB.

Would it be possible to get your schema/data ?

Hi,

I downloaded your dump, loaded it and did a couple of restarts
without any problem (on 5.1.8).

Could you try uploading another dump?
/Jonas

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Do you still get this?
Are you still up for remote login?

Fixed in 5.1.7, but bug#19333 masked "original bug"

Document as:
ndbd restart could fail due to incorrect memory access

Felix: please reopen if you encounter bugs again.

Thank you for your bug report. This issue has already been fixed
in the latest released version of that product, which you can download at 
http://www.mysql.com/downloads/

Additional info:

Documented bugfix in 5.1.7 changelog. Closed.