MySQL Bugs: #25918: ndb_restore fails when restoring a backup of a disk-data cluster

Bug #25918	ndb_restore fails when restoring a backup of a disk-data cluster
Submitted:	29 Jan 2007 15:48	Modified:	13 Mar 2009 8:50
Reporter:	Rob Kinyon
Status:	Patch approved
Category:	MySQL Cluster: Disk Data	Severity:	S1 (Critical)
Version:	mysql-5.1	OS:	Linux (Ubuntu 6.06 LTS)
Assigned to:	Assigned Account	CPU Architecture:	Any
Tags:	5.1.14, 5.1.22

Description:
When running ndb_restore as root, the undo*.dat files cannot be overwritten, so ndb_restore fails partway through.

How to repeat:
I took a backup of a disk-data cluster, then destroyed the data in the cluster by taking the cluster down and bringing it back up with "ndbd --initial". I then ran the following command (output included):

root@bed1:/usr/local/mysql/bin# ./ndb_restore -m -b 1 --backup_path /var/lib/mysql-cluster/BACKUP/BACKUP-1/ -n 2
Backup Id = 1
Nodeid = 2
backup path = /var/lib/mysql-cluster/BACKUP/BACKUP-1/
Ndb version in backup files: Version 5.1.14
Connected to ndb!!
Creating logfile group: lg_2...done
Creating tablespace: ts_2...done
Creating undofile "undo_10.dat"...FAILED
Create undofile failed: undo_10.dat: 1509: File system error, check if path,permissions etc
Restore: Failed to restore table: sys/def/8/username$unique ... Exiting

When I manually deleted the undo files and re-ran ndb_restore, it complained that the tablespace already existed. I had to remove the ndb_2_fs directory and restart the node.

Rob,

The permissions of ndb_restore have no impact on this. It is the ndbd processes that create the files.

Can you provide us with information about:
1. the user running "ndbd", and what permissions it has

I'm assuming you are getting this problem with any disk-based table. If not:
2. please provide an appropriate schema to reproduce.

Furthermore, a failing ndb_restore _will_ leave the cluster in a "half applied" state.

Is this the bug that you want to file?

Or is it that there are some unwanted permission requirements for the ndbd's to work?

I.e. I am unsure what actual bug you are filing.

So please provide:
3. observed behavior
4. what you see as the expected behavior

BR,

Tomas
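
For anyone answering question 1, here is a minimal shell sketch of how the information could be collected. The function name check_datadir is made up for illustration, and the DataDir path in the usage comment is only an example taken from the original report; run it as the user that starts ndbd.

```shell
#!/bin/sh
# Hypothetical helper (not part of MySQL Cluster): print the owner and mode
# of an ndbd DataDir and report whether the current user can create files
# in it.
check_datadir() {
    dir=$1
    # Owner, group and permission bits of the directory itself.
    ls -ld "$dir" || return 1
    if [ -w "$dir" ]; then
        echo "writable by $(id -un)"
    else
        echo "NOT writable by $(id -un)"
        return 1
    fi
}

# Example usage (path is an assumption, adjust to your DataDir):
#   check_datadir /var/lib/mysql-cluster
# To see which user owns the running ndbd processes (prints nothing if
# none are running):
#   ps -eo user,pid,args | grep '[n]dbd'
```

The write test only reflects the user who runs the script, so running it as root says little about an ndbd started under another account.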

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

I got hit by this too; here's a reproducible test case.

$ cat config.ini 
===============================================
[NDB_MGMD]
NodeId = 1
HostName = 127.0.0.1
DataDir  = /usr/local/ndb
LogDestination = FILE:filename=my-cluster.log

[NDBD DEFAULT]
HostName = 127.0.0.1
DataDir  = /usr/local/ndb
NoOfReplicas = 2
DataMemory = 20M
IndexMemory = 10M

[NDBD]
NodeId = 5

[NDBD]
NodeId =6

[MYSQLD]
NodeId = 9
[MYSQLD]
NodeId = 10
[MYSQLD]
NodeId = 11
[MYSQLD]
===============================================

$ ndb_mgmd -f config.ini 
$ ndbd --initial
$ ndbd --initial

$ ndb_mgm -e show
Connected to Management Server at: 127.0.0.1:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=5    @127.0.0.1  (Version: 5.1.22, Nodegroup: 0, Master)
id=6    @127.0.0.1  (Version: 5.1.22, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @127.0.0.1  (Version: 5.1.22)

[mysqld(API)]   4 node(s)
id=9    @127.0.0.1  (Version: 5.1.22)
id=10   @127.0.0.1  (Version: 5.1.22)
id=11   @127.0.0.1  (Version: 5.1.22)
id=12 (not connected, accepting connect from any host)

mysql> CREATE DATABASE IF NOT EXISTS Test;
Query OK, 1 row affected (0.00 sec)

mysql> USE Test;
Database changed

mysql> CREATE LOGFILE GROUP lg ADD UNDOFILE 'undo' ENGINE ndb;
Query OK, 0 rows affected (10.26 sec)

mysql> CREATE TABLESPACE ts ADD DATAFILE 'data' USE LOGFILE GROUP lg ENGINE ndb;
Query OK, 0 rows affected (9.06 sec)

mysql> CREATE TABLE t1 ( id INT PRIMARY KEY, a INT, b CHAR(5) ) TABLESPACE ts STORAGE DISK ENGINE=NDB;
Query OK, 0 rows affected (1.28 sec)

mysql> INSERT INTO t1 VALUES (1, 1, 'abc');
Query OK, 1 row affected (0.08 sec)

$ ndb_mgm -e "start backup"
Connected to Management Server at: 127.0.0.1:1186
Waiting for completed, this may take several minutes
Node 5: Backup 1 started from node 1
Node 5: Backup 1 started from node 1 completed
 StartGCP: 192 StopGCP: 195
 #Records: 2056 #LogRecords: 0
 Data: 34612 bytes Log: 0 bytes

$ ndb_mgm -e shutdown
Connected to Management Server at: 127.0.0.1:1186
2 NDB Cluster node(s) have shutdown.
Disconnecting to allow management server to shutdown.

# Modify config.ini to add two more [NDBD] sections (a second node group):

+[NDBD]
+NodeId =7
+[NDBD]
+NodeId =8

$ ndb_mgmd -f config.ini 
$ ndbd --initial
$ ndbd --initial
$ ndbd --initial
$ ndbd --initial

$ ndb_mgm -e show
Connected to Management Server at: 127.0.0.1:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     4 node(s)
id=5    @127.0.0.1  (Version: 5.1.22, Nodegroup: 0, Master)
id=6    @127.0.0.1  (Version: 5.1.22, Nodegroup: 0)
id=7    @127.0.0.1  (Version: 5.1.22, Nodegroup: 1)
id=8    @127.0.0.1  (Version: 5.1.22, Nodegroup: 1)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @127.0.0.1  (Version: 5.1.22)

[mysqld(API)]   4 node(s)
id=9    @127.0.0.1  (Version: 5.1.22)
id=10   @127.0.0.1  (Version: 5.1.22)
id=11   @127.0.0.1  (Version: 5.1.22)
id=12 (not connected, accepting connect from any host)

..BACKUP/BACKUP-1$ ndb_restore -b 1 -r -n 5 -m
Backup Id = 1
Nodeid = 5
backup path = ./
Ndb version in backup files: Version 5.1.22
Connected to ndb!!
Creating logfile group: lg...done
Creating tablespace: ts...done
Creating undofile "undo"...FAILED
Create undofile failed: undo: 1509: File system error, check if path,permissions etc
Restore: Failed to restore table: sys/def/9/PRIMARY ... Exiting 

NDBT_ProgramExit: 1 - Failed

My tests were on Mac OS X 10.5 with 5.1.22.

Thank you for the report.

Verified as Tobias Asplund described.

The reason is that the old disk data was not removed before the cluster was restarted.
The restore works if the undo files and data files in the ndb_N_fs directory are removed first.

Should the restore overwrite the old disk data in any case?
Or should we give a warning about it?
Or document it (remove old disk data before restore) in the manual?
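
Until one of those options is decided, leftover files can at least be spotted before a restore. The sketch below only lists candidates and deletes nothing. It assumes that undo files and data files created with relative names (such as 'undo' and 'data' in the test case) sit at the top level of each ndb_N_fs directory under DataDir; the function name list_stale_disk_data is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of MySQL Cluster): list regular files that
# sit directly inside each ndb_<nodeid>_fs directory under a DataDir.
# Assumption: disk-data undo files and data files with relative names are
# stored at that level, next to the node's own subdirectories.
list_stale_disk_data() {
    datadir=$1
    for fsdir in "$datadir"/ndb_*_fs; do
        # Skip the unexpanded pattern when no such directory exists.
        [ -d "$fsdir" ] || continue
        # -maxdepth is supported by GNU and BSD find.
        find "$fsdir" -maxdepth 1 -type f
    done
}

# Example usage with the DataDir from the test case above:
#   list_stale_disk_data /usr/local/ndb
```

Review the list by hand before removing anything, and only remove files while the data nodes are stopped.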

How about adding a flag "--overwrite-files"
(with the default still to be decided)?

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45491

ChangeSet@1.2574, 2008-04-16 19:57:46+00:00, lzhou@dev3-63.(none) +5 -0
  BUG#25918 add new flag '-o' to over write disk files

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/45737

ChangeSet@1.2574, 2008-04-21 10:23:05+00:00, lzhou@dev3-63.(none) +5 -0
  BUG#25918 add new flag '-o' to over write disk files

The test cases probably need cleanup now...

6.2 is a reasonable target.