MySQL Bugs: #25280: temporary bug in MySQL 5.1.11 cluster

Bug #25280	temporary bug in MySQL 5.1.11 cluster
Submitted:	26 Dec 2006 11:42	Modified:	24 Apr 2007 3:09
Reporter:	Mitul Savani
Status:	No Feedback
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S3 (Non-critical)
Version:	5.1.14, 5.1.11-0	OS:	Linux (Linux AS release 4 (Nahant))
Assigned to:		CPU Architecture:	Any
Tags:	NDB Cluster

Description:
Hi,

I have a cluster with the following configuration:

1) API node: 512 MB RAM, P4 CPU (IP: 172.18.1.138)
2) MGM node: 512 MB RAM, P4 CPU (IP: 172.18.1.139)
3) Data node: 2 GB RAM, P4 CPU (IP: 172.18.1.140)
4) Data node: 2 GB RAM, P4 CPU (IP: 172.18.1.141)

The configuration files for each server are as follows:

===============================
[MGM Node]# cat config.ini
================================
[NDBD DEFAULT]
NoOfReplicas= 2
RedoBuffer=64M
TimeBetweenLocalCheckpoints=6
NoOfFragmentLogFiles=32

[MYSQLD DEFAULT]

# Management Server
[NDB_MGMD]
Id=1
HostName= 172.18.1.139

[NDBD]
Id=3
HostName=172.18.1.140
DataDir= /var/lib/mysql-cluster
#DataMemory = 1400M
#IndexMemory = 200M
MaxNoOfConcurrentTransactions = 500
MaxNoOfConcurrentOperations = 250000
MaxNoOfOrderedIndexes = 57000
MaxNoOfTables = 9000
MaxNoOfAttributes = 25000

[NDBD]
Id=4
HostName=172.18.1.141
DataDir= /var/lib/mysql-cluster
#DataMemory = 1400M
#IndexMemory = 200M
MaxNoOfConcurrentTransactions = 500
MaxNoOfConcurrentOperations = 250000
MaxNoOfOrderedIndexes = 57000
MaxNoOfTables = 9000
MaxNoOfAttributes = 25000

# TCP/IP Connections
[TCP]
NodeId1=3
NodeId2=4
HostName1=172.18.1.140
HostName2=172.18.1.141

# SQL Node
[MYSQLD]
Id=2
HostName=172.18.1.138

==========================
Data node my.cnf file:
=========================
[mysqld]
ndbcluster
# IP address of the cluster management node
ndb-connectstring=172.18.1.139
[mysql_cluster]
# IP address of the cluster management node
ndb-connectstring=172.18.1.139

=======================
SQL API node config
=======================
[MYSQLD]                        
ndbcluster                      # run NDB engine
ndb-connectstring=172.18.1.139  # location of MGM node

# Options for ndbd process:
[MYSQL_CLUSTER]                 
ndb-connectstring=172.18.1.139  # location of MGM node

===========================

Now, when I start the cluster (management node), I receive the following error:

[root@rhel4mysql2 mysql-cluster]# ndb_mgmd -d
[root@rhel4mysql2 mysql-cluster]# ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=3    @172.18.1.140  (Version: 5.1.11, starting, Nodegroup: 0)
id=4 (not connected, accepting connect from 172.18.1.141)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @172.18.1.139  (Version: 5.1.11)

[mysqld(API)]   1 node(s)
id=2 (not connected, accepting connect from 172.18.1.138)

ndb_mgm> Node 4: Forced node shutdown completed. Occured during startphase 1. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error
Node 3: Forced node shutdown completed. Occured during startphase 1. Initiated by signal 0. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'.

Can anyone suggest the correct parameters, or is this a bug?

How to repeat:
Configure the cluster as described in the configuration above.

Suggested fix:
Need help

Thank you for the problem report. Please try a newer version, 5.1.14, and let us know the results. If the problem recurs, please send the error logs from the failed nodes.

OK, I will test it with the latest version.

Do you see any problem with the configuration above?

Thanks,

With these lines in place:

#DataMemory = 1400M
#IndexMemory = 200M

you surely had an out-of-memory problem.

I do not know how much free memory your machines have before the cluster is started. So please do as I asked in my previous comment, and let us know the results.
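Lines beginning with # in config.ini are comments, so the DataMemory and IndexMemory lines quoted above are not passed to the data nodes. Because the file repeats section names such as [NDBD], Python's standard configparser (which by default raises DuplicateSectionError on repeated sections) is an awkward fit for checking this. Below is a minimal hand-rolled sketch that keeps every section separately; it assumes plain `key = value` lines with no inline comments, as in the file quoted in this report, matches parameter names exactly as written, and does not know NDB's default values. The function names are hypothetical helpers, not part of any MySQL tool.

```python
from typing import Optional


def parse_cluster_config(text: str) -> list[tuple[str, dict[str, str]]]:
    """Parse config.ini text into (section, settings) pairs, keeping repeats.

    Lines whose first non-blank character is '#' are skipped as comments,
    so a commented-out line such as '#DataMemory = 1400M' is not returned.
    """
    sections: list[tuple[str, dict[str, str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            # Start a new section; repeated names like [NDBD] stay separate.
            sections.append((line[1:-1].strip().upper(), {}))
        elif "=" in line and sections:
            key, _, value = line.partition("=")
            sections[-1][1][key.strip()] = value.strip()
    return sections


def setting_per_node(text: str, section: str, key: str) -> list[Optional[str]]:
    """Return the value of `key` in every `section` block (None if unset)."""
    return [
        settings.get(key)
        for name, settings in parse_cluster_config(text)
        if name == section.upper()
    ]


SAMPLE = """\
[NDBD DEFAULT]
NoOfReplicas= 2

[NDBD]
Id=3
#DataMemory = 1400M
MaxNoOfTables = 9000

[NDBD]
Id=4
#DataMemory = 1400M
MaxNoOfTables = 9000
"""

if __name__ == "__main__":
    print(setting_per_node(SAMPLE, "NDBD", "DataMemory"))     # [None, None]
    print(setting_per_node(SAMPLE, "NDBD", "MaxNoOfTables"))  # ['9000', '9000']
```

A `None` for both data nodes confirms that neither [NDBD] section sets DataMemory explicitly in the excerpt.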

Hi,

I have upgraded the cluster version and am receiving the same type of error:

=====================
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=3 (not connected, accepting connect from 172.18.1.140)
id=4    @172.18.1.141  (Version: 5.0.27, starting, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @172.18.1.139  (Version: 5.1.14)

[mysqld(API)]   1 node(s)
id=2 (not connected, accepting connect from 172.18.1.138)

ndb_mgm> Node 4: Forced node shutdown completed. Occured during startphase 1. Initiated by signal 0. Caused by error 2341: 'Internal program error (failed ndbrequire)(Internal error, programming error or missing error message, please report a bug). Temporary error, restart node'. - Unknown error code: Unknown result: Unknown error code
Node 3: Forced node shutdown completed. Occured during startphase 1. Initiated by signal 0. Caused by error 2308: 'Another node failed during system restart, please investigate error(s) on other node(s)(Restart error). Temporary error, restart node'. - Unknown error code: Unknown result: Unknown error code
=====================

Any help would be appreciated!

Thanks,

Mitul Savani
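When comparing `show` outputs like the one above, it can help to check whether all connected nodes report the same version; in this output, the data node with id=4 reports 5.0.27 while the management node reports 5.1.14. A minimal sketch that extracts the versions from `show` text follows. The regular expression is an assumption based only on the lines pasted in this report, not on a documented output format, and the helper names are hypothetical.

```python
import re

# Matches connected-node lines in ndb_mgm "show" output, e.g.
#   id=4    @172.18.1.141  (Version: 5.0.27, starting, Nodegroup: 0)
#   id=1    @172.18.1.139  (Version: 5.1.14)
# Lines for nodes that are "not connected" carry no version and do not match.
NODE_RE = re.compile(r"id=(\d+)\s+@(\S+)\s+\(Version:\s*([\d.]+)")


def node_versions(show_output: str) -> dict[int, str]:
    """Return {node_id: version} for every connected node in the output."""
    return {int(m.group(1)): m.group(3) for m in NODE_RE.finditer(show_output)}


def version_mismatch(show_output: str) -> bool:
    """True if the connected nodes report more than one distinct version."""
    return len(set(node_versions(show_output).values())) > 1


if __name__ == "__main__":
    sample = """\
[ndbd(NDB)]     2 node(s)
id=3 (not connected, accepting connect from 172.18.1.140)
id=4    @172.18.1.141  (Version: 5.0.27, starting, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @172.18.1.139  (Version: 5.1.14)
"""
    print(node_versions(sample))     # {4: '5.0.27', 1: '5.1.14'}
    print(version_mismatch(sample))  # True
```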

Hi,

For your convenience, please find the error log from the failed node below:

[root@rhel4mysql3 mysql-cluster]# cat ndb_3_error.log 
Current byte-offset of file-pointer is: 568                       

Time: Wednesday 27 December 2006 - 13:41:12
Status: Temporary error, restart node
Message: Another node failed during system restart, please investigate error(s) on other node(s) (Restart error)
Error: 2308
Error data: Node 4 disconnected
Error object: QMGR (Line: 2554) 0x0000000e
Program: ndbd
Pid: 2788
Trace: /var/lib/mysql-cluster/ndb_3_trace.log.1
Version: Version 5.1.14 (beta)
***EOM***
                                                    

Thanks,

Mitul Savani
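The error log above consists of `Key: value` lines, with each entry terminated by `***EOM***`. When logs from several failed nodes need to be compared, a minimal sketch that splits such a file into one dictionary per entry may help. The layout is inferred only from the `ndb_3_error.log` excerpt in this report and may not cover every entry format; `parse_error_log` is a hypothetical helper name.

```python
def parse_error_log(text: str) -> list[dict[str, str]]:
    """Split an ndbd error log into entries, one dict of fields per entry.

    Assumes (from the excerpt in this report) that each entry is a run of
    "Key: value" lines ending with a "***EOM***" line.
    """
    entries: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Current byte-offset"):
            continue  # file header, not part of an entry
        if line == "***EOM***":
            if current:
                entries.append(current)
            current = {}
        elif ": " in line:
            # Split on the first ": " only, so values such as
            # "QMGR (Line: 2554) 0x0000000e" stay intact.
            key, _, value = line.partition(": ")
            current[key] = value
    return entries


SAMPLE = """\
Current byte-offset of file-pointer is: 568

Time: Wednesday 27 December 2006 - 13:41:12
Status: Temporary error, restart node
Message: Another node failed during system restart, please investigate error(s) on other node(s) (Restart error)
Error: 2308
Error data: Node 4 disconnected
Error object: QMGR (Line: 2554) 0x0000000e
Program: ndbd
Pid: 2788
Trace: /var/lib/mysql-cluster/ndb_3_trace.log.1
Version: Version 5.1.14 (beta)
***EOM***
"""

if __name__ == "__main__":
    for entry in parse_error_log(SAMPLE):
        print(entry["Error"], entry["Error data"])  # 2308 Node 4 disconnected
```

Since node 3's entry only reports that node 4 disconnected, running the same parser over node 4's error log is what would show the failing `ndbrequire` details.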

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Please try to repeat this with a newer version, 5.1.16, and let us know the results.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".