MySQL Bugs: #27635: Creating Logfile group fails

Bug #27635	Creating Logfile group fails
Submitted:	4 Apr 2007 6:16	Modified:	27 Nov 2008 11:35
Reporter:	Andrew Bishop	Email Updates:
Status:	No Feedback	Impact on me:	None
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S1 (Critical)
Version:	5.1	OS:	Linux (Red Hat 9)
Assigned to:	Assigned Account	CPU Architecture:	Any

Description:
The error occurs when I execute the following command:

CREATE LOGFILE GROUP lg_1 ADD UNDOFILE 'undo_1.dat' INITIAL_SIZE 16M UNDO_BUFFER_SIZE 2M ENGINE NDB;

I get the following error:

ERROR 1516 (HY000): Failed to create UNDOFILE

When I check the cluster directory, though, an empty file has been created. That is not the worst of it: the command also causes my data node to shut down. It is a forced shutdown with error 2301, after which the node tries to restart without success.

Message from error log

Assertion (Internal error, programming error or missing error message, please report a bug)

Also I am running all three nodes on the same machine, as I am testing this to see if it will be useful in our production environment.

I have installed the following on a Red Hat 9 box. Please note that I removed the previous installation of MySQL with clustering (5.0.37).

- MySQL-client-5.1.16-0.glibc23.i386.rpm
- MySQL-ndb-extra-5.1.16-0.glibc23.i386.rpm
- MySQL-ndb-management-5.1.16-0.glibc23.i386.rpm
- MySQL-ndb-storage-5.1.16-0.glibc23.i386.rpm
- MySQL-ndb-tools-5.1.16-0.glibc23.i386.rpm
- MySQL-server-5.1.16-0.glibc23.i386.rpm

After install and setup I use the dump from my previous mysql installation which was also stored on a cluster.  This works ok, just thought I would mention it.

[NDBD DEFAULT]
NoOfReplicas=1
arbitrationtimeout=1000
stoponerror=0
startpartialtimeout=1000
startpartitionedtimeout=1000
maxnooforderedindexes=512
maxnoofattributes=40000
datamemory=300048576
DiskPageBufferMemory=10000000
SharedGlobalMemory=900048576
[COMPUTER]
id=1001
hostname=192.168.0.216
[COMPUTER]
id=1002
hostname=192.168.0.136
[MYSQLD DEFAULT]
arbitrationrank=0
[NDB_MGMD DEFAULT]
arbitrationrank=1
datadir=/cluster/ndbmgm
[TCP DEFAULT]
[NDB_MGMD]
id=1
executeoncomputer=1001
[NDBD]
id=3
executeoncomputer=1001
DataDir=/cluster/1
[MYSQLD]
id=5
executeoncomputer=1001
[MYSQLD]
[MYSQLD]
[MYSQLD]

How to repeat:
Setup as above

CREATE LOGFILE GROUP lg_1 ADD UNDOFILE 'undo_1.dat' INITIAL_SIZE 16M UNDO_BUFFER_SIZE 2M ENGINE NDB;

Can you run SHOW ERRORS when you get this (there may be additional error messages in there) and provide us with the cluster logs, or at least the error and trace logs of the failing node, too?

CREATE LOGFILE GROUP lg1 ADD UNDOFILE 'undo1.dat' INITIAL_SIZE 100M UNDO_BUFFER_SIZE = 10M ENGINE=NDB;

Show errors

+-------+------+-------------------------------------------+
| Level | Code | Message                                   |
+-------+------+-------------------------------------------+
| Error | 1296 | Got error 4009 'Cluster Failure' from NDB |
| Error | 1516 | Failed to create UNDOFILE                 |
+-------+------+-------------------------------------------+
2 rows in set (0.00 sec)

ndb_1_out.log

NDB Cluster Management Server. Version 5.1.16 (beta)
Id: 1, Command port: 1186
setEventReportingLevelImpl: failed 2!

node error log

Attachment: ndb_3_error.log (application/octet-stream, text), 1.53 KiB.

node out log

Attachment: ndb_3_out.log (application/octet-stream, text), 5.06 KiB.

management node out log

Attachment: ndb_1_out.log (application/octet-stream, text), 117 bytes.

management node cluster log

Attachment: ndb_1_cluster.log (application/octet-stream, text), 62.38 KiB.

Andrew,

We see only the restart failures in the logs you supplied.

The bug that you are hitting at system restart is most likely the same as bug 17614.

You've managed to get into a state where you have a log file group and no undo file...

What we'd like to understand is how that could happen. Do you have the logs from the time when the first error occurred, when first creating the log file group and adding the undo file?

BR,

Tomas
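
The state described above (a log file group that exists without an undo file) can be checked from the SQL node. This is a minimal sketch, assuming the data node is up and the group is named lg_1 as in the original statement. The choice of columns is illustrative only; INFORMATION_SCHEMA.FILES is where MySQL 5.1 lists NDB disk data files.

```sql
-- Assumption: run from a mysqld connected to the cluster while the data node is running.
-- Lists whatever the server knows about the log file group 'lg_1'.
-- If the group appears but no row carries the undo file name ('undo_1.dat'),
-- the group exists without a registered undo file.
SELECT LOGFILE_GROUP_NAME,
       FILE_NAME,
       FILE_TYPE,
       INITIAL_SIZE,
       EXTRA
FROM INFORMATION_SCHEMA.FILES
WHERE LOGFILE_GROUP_NAME = 'lg_1';

-- Any messages left by the most recent failing statement in this session.
SHOW ERRORS;
```

The query output, together with the SHOW ERRORS result, would show whether the server side agrees with the empty file seen in the cluster directory.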

Sorry, I don't have the log file from when this happened, but the log files I added were from a clean install of MySQL.

Unfortunately there is not enough information in this report to reproduce the original issue.
The bug is marked against telco-6.3, but appears to be a 5.1 issue.
As mentioned previously, there is a known issue during system restart (SR) when an undo file is missing (it can be reproduced by deleting an undo file, then attempting an SR). See bug 17614.

What is missing from this bug report is the trace file (mentioned in the error log) corresponding to the initial failure, when the undo file could not be created, and the node crashed for the first time.

If possible, could you send the trace file mentioned, or reproduce and send it?

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".