Bug #17417 Mysql Cluster Node is unable to start after graceful stop
Submitted: 15 Feb 2006 9:09
Modified: 28 Apr 2006 8:46
Reporter: F Huber
Status: Closed
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S1 (Critical)
Version: 5.1.8
OS: Linux (Red Hat Enterprise Linux 4 - 64Bit)
Assigned to: Jonas Oreland
CPU Architecture: Any

[15 Feb 2006 9:09] F Huber
Description:

I have one management node, one SQL node and two storage nodes, with the following config:

[NDBD DEFAULT]
DataDir=/var/lib/mysql-cluster/
NoOfReplicas=2
[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]

# Management Server
[NDB_MGMD]
HostName=192.168.0.155          # the IP of THIS SERVER

# Storage Engines
[NDBD]
HostName=192.168.0.152          # the IP of the FIRST SERVER
[NDBD]
HostName=192.168.0.153          # the IP of the SECOND SERVER
# 2 MySQL Clients
[MYSQLD]
HostName=192.168.0.154
[MYSQLD]
HostName=192.168.0.155

I am running 5.1.6 alpha

After a short while I get this message on the management node:

 Node 2: Forced node shutdown completed. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'.

The affected node says:

ableId: RNIL
 localKeyLength: 1 maxLoadFactor: 80 minLoadFactor: 78
 kValue: 6 lh3DistrBits: 0 lh3PageBits: 0
 noOfAttributes: 2 noOfNullAttributes: 1 keyLength: 2
 noOfPagesToPreAllocate: 0 schemaVersion: 1 nextLCP: 0
 senderData: 5 senderRef: fa0003 tableId: 4 fragmentId: 1 tableType: 2 primaryTableId: RNIL
 localKeyLength: 1 maxLoadFactor: 80 minLoadFactor: 78
 kValue: 6 lh3DistrBits: 0 lh3PageBits: 0
 noOfAttributes: 2 noOfNullAttributes: 1 keyLength: 2
 noOfPagesToPreAllocate: 0 schemaVersion: 1 nextLCP: 0
2006-02-14 11:26:30 [ndbd] INFO     -- Received signal 15. Performing stop.
2006-02-14 11:26:31 [ndbd] INFO     -- Error handler shutting down system
2006-02-14 11:26:32 [ndbd] INFO     -- Error handler shutdown completed - exiting
2006-02-14 11:26:32 [ndbd] ALERT    -- Node 3: Forced node shutdown completed. Initiated by signal 15.
2006-02-14 11:26:32 [ndbd] WARNING  -- Unable to report shutdown reason to 192.168.0.155:1186: Could not connect to socket : Unable to connect with connect string: nodeid=0,192.168.0.155:1186
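
Shutdown messages like the ones quoted above can be pulled out of a saved log with a small filter. This is only a minimal sketch: it assumes the log has been unwrapped to one message per line (the paste above is wrapped at 80 columns), and the function name summarise_shutdowns and its output format are made up for illustration.

```shell
#!/bin/sh
# Sketch: summarise "Forced node shutdown" messages read from stdin.
# Assumption: one log message per line. Output format is hypothetical:
#   node=<id> signal=<n> error=<code|none>

summarise_shutdowns() {
  # First expression: messages that carry "Caused by error <code>:".
  # Second expression: messages that only name the signal.
  # After the first substitution succeeds, the rewritten line no longer
  # contains "Node <id>: Forced", so the second expression cannot fire again.
  sed -n -E \
    -e 's/.*Node ([0-9]+): Forced node shutdown completed\. Initiated by signal ([0-9]+)\. Caused by error ([0-9]+):.*/node=\1 signal=\2 error=\3/p' \
    -e 's/.*Node ([0-9]+): Forced node shutdown completed\. Initiated by signal ([0-9]+)\..*/node=\1 signal=\2 error=none/p'
}
```

For example, `summarise_shutdowns < saved.log`, where saved.log is a placeholder for wherever the log text was stored.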

If I restart the node, it shows up again after a short while:

[ndbd(NDB)]     2 node(s)
id=2    @192.168.0.152  (Version: 5.1.6, Nodegroup: 0)
id=3    @192.168.0.153  (Version: 5.1.6, Nodegroup: 0, Master)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.0.155  (Version: 5.1.6)

[mysqld(API)]   2 node(s)
id=4    @192.168.0.154  (Version: 5.1.6)
id=5    @192.168.0.155  (Version: 5.1.6)

How to repeat:
I am converting the tables from MyISAM to NDBCLUSTER with:

for t in $(mysql -u root --batch --column-names=false -e "show tables" bla); do
  # print the table name, then convert it (TYPE= is the older synonym for ENGINE=)
  echo "now table $t"
  mysql -u root -e "alter table $t type=NDBCLUSTER" bla
done

Some of the tables cause errors when I try to change the type, since they contain FULLTEXT fields etc.
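
The FULLTEXT failures could be avoided by skipping those tables up front. The following is only a sketch under stated assumptions: the database name bla and the root login are taken from the loop above, while the function names (fulltext_tables, alter_sql, skip_listed, convert_all) are made up for illustration. It handles only the FULLTEXT case, not the other errors mentioned.

```shell
#!/bin/sh
# Sketch: convert tables to NDBCLUSTER, skipping tables with FULLTEXT indexes.
# Assumptions: database "bla" and "-u root" as in the report; all function
# names here are hypothetical helpers.

DB=${DB:-bla}

# Print the names of tables in $DB that have at least one FULLTEXT index.
fulltext_tables() {
  mysql -u root --batch --column-names=false -e \
    "SELECT DISTINCT TABLE_NAME FROM information_schema.STATISTICS
     WHERE TABLE_SCHEMA='$DB' AND INDEX_TYPE='FULLTEXT'"
}

# Build the ALTER statement for one table name.
alter_sql() {
  printf 'ALTER TABLE `%s` ENGINE=NDBCLUSTER' "$1"
}

# Pass through the names on stdin, except those listed (one per line) in file $1.
skip_listed() {
  grep -v -x -F -f "$1"
}

# Convert every table that is not in the FULLTEXT skip list.
convert_all() {
  skip=$(mktemp) || return 1
  fulltext_tables > "$skip"
  mysql -u root --batch --column-names=false -e "SHOW TABLES" "$DB" |
    skip_listed "$skip" |
    while read -r t; do
      echo "now table $t"
      mysql -u root -e "$(alter_sql "$t")" "$DB" || echo "failed: $t" >&2
    done
  rm -f "$skip"
}
```

Nothing runs until convert_all is called, so the file can be sourced and the helpers checked one at a time first.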
[15 Feb 2006 13:00] Tomas Ulin
Please provide the error log and trace from node 3.

You could try ndb_error_reporter to collect all files.

BR,

Tomas
[21 Feb 2006 13:42] F Huber
I downloaded the BitKeeper version (5.1.8) and compiled ndbd with the following parameters (I am running the cluster on a 64-bit Xeon with Red Hat AS - 2.6.9-27.ELsmp #1 SMP x86_64 x86_64 x86_64 GNU/Linux):

extra_flags="-mtune=nocona -O3 -fPIC -fno-omit-frame-pointer -felide-constructors -fno-exceptions -fno-rtti -g"

extra_configs="--with-innodb --with-ndbcluster --with-archive-storage-engine --with-big-tables --with-federated-storage-engine --with-csv-storage-engine --with-embedded-server --enable-assembler --with-mysqld-ldflags=-all-static"

Error report is attached (ndb_error_report_20060221143455.tar.bz2)

I will try it with a plain compile-amd64-max build and post the results too.
[21 Feb 2006 13:47] F Huber
Error Report of a 5.0.8 ndbd - which I tried to start on node 11 (part1)

Attachment: ndb_error_report_20060221143455_part1.zip (application/zip, text), 154.75 KiB.

[21 Feb 2006 14:08] Jonas Oreland
Would it be possible to get your schema/data ?
[14 Mar 2006 13:46] Jonas Oreland
Hi,

I downloaded your dump, loaded it and did a couple of restarts
without problems (on 5.1.8).

Could you try uploading another dump?
/Jonas
[14 Apr 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[15 Apr 2006 5:28] Jonas Oreland
Do you still get this?
Are you still up for remote login?
[26 Apr 2006 15:12] Jonas Oreland
Fixed in 5.1.7, but bug#19333 masked "original bug"

Document as:
ndbd restart could fail due to incorrect memory access

Felix: please reopen if you encounter bugs again.
[28 Apr 2006 8:46] Jon Stephens
Thank you for your bug report. This issue has already been fixed
in the latest released version of that product, which you can download at 
http://www.mysql.com/downloads/

Additional info:

Documented bugfix in 5.1.7 changelog. Closed.