Bug #75645 Runtime Error Caused by Misconfigured BackupDataDir
Submitted: 27 Jan 2015 9:26 Modified: 30 Jan 2015 12:47
Reporter: Tianyin Xu Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:ndb-7.3.7, 7.3.8 OS:Linux
Assigned to: CPU Architecture:Any

[27 Jan 2015 9:26] Tianyin Xu
Description:
NDB does not actively check the misconfigured "BackupDataDir"; as a result, an erroneous path would cause runtime error. 

How to repeat:
$ ndbd --version
MySQL distrib mysql-5.6.21 ndb-7.3.7, for linux-glibc2.5 (i686)

Set an erroneous file path of BackupDataDir in conf/config.ini, such as

[ndbd default]
BackupDataDir=i_am_not_valid

Then, start ndb_mgm and ndbd (the erroneous path is not checked and the MySQL cluster can be started successfully). After that, issue the backup command,

$ ndb_mgm -e "START BACKUP WAIT STARTED" 

Connected to Management Server at: XXXXXX:1186
Waiting for started, this may take several minutes
Backup failed
*  3001: Could not start backup
*        Unknown error code: Unknown result: Unknown error code

The following error message is found in the error log

......
Status: Permanent error, external action needed
Message: Illegal file system path (Configuration error)
Error: 2805
Error data: i_am_not_valid
Error object: BackupDataDir
Program: ndbd
Pid: 16958
Version: mysql-5.6.21 ndb-7.3.7
Trace: /XXXXXX/ndb_data/ndb_3_trace.log.4 [t1..t1]
***EOM***

Suggested fix:
I don't have a clear idea where to insert a checker, but we should check the validity of the path of BackupDataDir before accepting the value.
[29 Jan 2015 10:07] MySQL Verification Team
Hello Tianyin Xu,

Thank you for the report.
Imho - mgmd node will start but ndbd nodes will fail to start if the BackupDataDir is invalid. I agree that it should give more useful error message than just displaying as INFO when there is such issues i.e invalid path to BackupDataDir.

Could you please confirm how did you start ndbd nodes in your tests? Was cluster up and running, and you later deleted BackupDataDir or updated BackupDataDir when cluster was already up? Please let us know. 

Thanks,
Umesh
[29 Jan 2015 10:08] MySQL Verification Team
// 7.3.8

[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.8: cat config.ini |grep BackupDataDir
#BackupDataDir=/export/umesh/mysql-cluster-7.3.8/backup
BackupDataDir=i_am_invalid
#BackupDataDir=/export/umesh/mysql-cluster-7.3.8/backup
BackupDataDir=i_am_invalid
[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.8: bin/ndb_mgmd --configdir=/export/umesh/mysql-cluster-7.3.8/ -f config.ini --initial
MySQL Cluster Management Server mysql-5.6.22 ndb-7.3.8
[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.8:
[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.8: bin/ndb_mgm -e "SHOW"
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=11 (not connected, accepting connect from localhost)
id=12 (not connected, accepting connect from localhost)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @127.0.0.1  (mysql-5.6.22 ndb-7.3.8)

[mysqld(API)]   9 node(s)
id=21 (not connected, accepting connect from localhost)
id=31 (not connected, accepting connect from any host)
id=32 (not connected, accepting connect from any host)
id=33 (not connected, accepting connect from any host)
id=34 (not connected, accepting connect from any host)
id=35 (not connected, accepting connect from any host)
id=36 (not connected, accepting connect from any host)
id=37 (not connected, accepting connect from any host)
id=38 (not connected, accepting connect from any host)

// Confirmed that with invalid BackupDataDir, ndbd process don't start

[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.8: bin/ndbd --initial --foreground
2015-01-28 13:01:15 [ndbd] INFO     -- Ndb started in foreground
2015-01-28 13:01:15 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1
2015-01-28 13:01:15 [ndbd] INFO     -- i_am_invalid
2015-01-28 13:01:15 [ndbd] INFO     -- BackupDataDir
2015-01-28 13:01:15 [ndbd] INFO     -- Error handler shutting down system
2015-01-28 13:01:15 [ndbd] INFO     -- Error handler shutdown completed - exiting
[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.8:
[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.8: bin/ndbd --initial --foreground
2015-01-28 13:01:26 [ndbd] INFO     -- Ndb started in foreground
2015-01-28 13:01:26 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1
2015-01-28 13:01:26 [ndbd] INFO     -- i_am_invalid
2015-01-28 13:01:26 [ndbd] INFO     -- BackupDataDir
2015-01-28 13:01:26 [ndbd] INFO     -- Error handler shutting down system
2015-01-28 13:01:26 [ndbd] INFO     -- Error handler shutdown completed - exiting
[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.8:

// Confirmed NDBD node not started at all

[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.8: bin/ndb_mgm -e "SHOW"
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=11 (not connected, accepting connect from localhost)
id=12 (not connected, accepting connect from localhost)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @127.0.0.1  (mysql-5.6.22 ndb-7.3.8)

[mysqld(API)]   9 node(s)
id=21 (not connected, accepting connect from localhost)
id=31 (not connected, accepting connect from any host)
id=32 (not connected, accepting connect from any host)
id=33 (not connected, accepting connect from any host)
id=34 (not connected, accepting connect from any host)
id=35 (not connected, accepting connect from any host)
id=36 (not connected, accepting connect from any host)
id=37 (not connected, accepting connect from any host)
id=38 (not connected, accepting connect from any host)

// ndbd node 12 error log

Current byte-offset of file-pointer is: 568

Time: Wednesday 28 January 2015 - 13:01:26
Status: Permanent error, external action needed
Message: Illegal file system path (Configuration error)
Error: 2805
Error data: i_am_invalid
Error object: BackupDataDir
Program: ndbd
Pid: 1057
Version: mysql-5.6.22 ndb-7.3.8
Trace: /export/umesh/mysql-cluster-7.3.8/log/ndbd/ndb_12_trace.log.1 [t1..t1]
***EOM***
[29 Jan 2015 10:11] MySQL Verification Team
// 7.3.7

[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.7: bin/ndb_mgmd --configdir=/export/umesh/mysql-cluster-7.3.7/ -f config.ini --initial
MySQL Cluster Management Server mysql-5.6.21 ndb-7.3.7
[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.7: bin/ndb_mgm -e "SHOW"
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=11 (not connected, accepting connect from localhost)
id=12 (not connected, accepting connect from localhost)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @127.0.0.1  (mysql-5.6.21 ndb-7.3.7)

[mysqld(API)]   7 node(s)
id=21 (not connected, accepting connect from localhost)
id=22 (not connected, accepting connect from any host)
id=23 (not connected, accepting connect from any host)
id=24 (not connected, accepting connect from any host)
id=25 (not connected, accepting connect from any host)
id=26 (not connected, accepting connect from any host)
id=27 (not connected, accepting connect from any host)

[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.7: bin/ndbd --initial --foreground
2015-01-28 13:55:05 [ndbd] INFO     -- Ndb started in foreground
2015-01-28 13:55:05 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1
2015-01-28 13:55:05 [ndbd] INFO     -- /export/umesh/mysql-cluster-7.3.7/backup
2015-01-28 13:55:05 [ndbd] INFO     -- BackupDataDir
2015-01-28 13:55:05 [ndbd] INFO     -- Error handler shutting down system
2015-01-28 13:55:05 [ndbd] INFO     -- Error handler shutdown completed - exiting
[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.7:
[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.7: bin/ndbd --initial --foreground
2015-01-28 13:55:13 [ndbd] INFO     -- Ndb started in foreground
2015-01-28 13:55:13 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1
2015-01-28 13:55:13 [ndbd] INFO     -- /export/umesh/mysql-cluster-7.3.7/backup
2015-01-28 13:55:13 [ndbd] INFO     -- BackupDataDir
2015-01-28 13:55:13 [ndbd] INFO     -- Error handler shutting down system
2015-01-28 13:55:13 [ndbd] INFO     -- Error handler shutdown completed - exiting
[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.7:
[ushastry@ushastry]/export/umesh/mysql-cluster-7.3.7: bin/ndb_mgm -e "SHOW"
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=11 (not connected, accepting connect from localhost)
id=12 (not connected, accepting connect from localhost)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @127.0.0.1  (mysql-5.6.21 ndb-7.3.7)

[mysqld(API)]   7 node(s)
id=21 (not connected, accepting connect from localhost)
id=22 (not connected, accepting connect from any host)
id=23 (not connected, accepting connect from any host)
id=24 (not connected, accepting connect from any host)
id=25 (not connected, accepting connect from any host)
id=26 (not connected, accepting connect from any host)
id=27 (not connected, accepting connect from any host)

// ndbd node 11 error log

Time: Wednesday 28 January 2015 - 13:55:05
Status: Permanent error, external action needed
Message: Illegal file system path (Configuration error)
Error: 2805
Error data: backup
Error object: BackupDataDir
Program: ndbd
Pid: 4685
Version: mysql-5.6.21 ndb-7.3.7
Trace: /export/umesh/mysql-cluster-7.3.7/log/ndbd/ndb_11_trace.log.2 [t1..t1]
[29 Jan 2015 18:14] Tianyin Xu
Hi Umesh, 

Thanks a lot for reviewing my report. I reproduced the error, and find the root cause. You are right, if BackupDataDir is set to a non-existent path, the ndbd nodes would refuse to start. 

In my case, I mistakenly set BackupDataDir to a file path that already exists on my file system. In this case, the ndbd starts and failed when issuing the backup command at runtime. I reproduced the error.

Here's my way to reproduce it on a single node:

1. Create a regular file:
$ touch /XXX/hello

2. Set the hello to be the value of BackupDataDir in config.ini, as follows:

[ndbd default]
BackupDataDir=/XXX/hello

3. Start the cluster, basically starting both mgmd and ndbd (you even don't need to start your mysqld daemon):

$ ./start_cluster.sh 
$ ndb_mgm -e show

MySQL Cluster Management Server mysql-5.6.21 ndb-7.3.7
2015-01-29 09:41:39 [ndbd] INFO     -- Angel connected to 'localhost:1186'
2015-01-29 09:41:39 [ndbd] INFO     -- Angel allocated nodeid: 3
2015-01-29 09:41:41 [ndbd] INFO     -- Angel connected to 'localhost:1186'
2015-01-29 09:41:41 [ndbd] INFO     -- Angel allocated nodeid: 4
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=3	@127.0.0.1  (mysql-5.6.21 ndb-7.3.7, starting, Nodegroup: 0, *)
id=4	@127.0.0.1  (mysql-5.6.21 ndb-7.3.7, starting, Nodegroup: 0)

[ndb_mgmd(MGM)]	1 node(s)
id=1	@172.19.222.99  (mysql-5.6.21 ndb-7.3.7)

[mysqld(API)]	1 node(s)
id=50 (not connected, accepting connect from any host)

4. Issue the backup requests:

$ ndb_mgm -e "START BACKUP WAIT STARTED" 

Connected to Management Server at: localhost:1186
Waiting for started, this may take several minutes
Backup failed
*  3001: Could not start backup
*        Unknown error code: Unknown result: Unknown error code

5. Go to the error log and you can see something like this:

Time: Thursday 29 January 2015 - 09:50:14
Status: Permanent error, external action needed
Message: Illegal file system path (Configuration error)
Error: 2805
Error data: hello
Error object: BackupDataDir
Program: ndbd
Pid: 24877
Version: mysql-5.6.21 ndb-7.3.7
Trace: XXXXXXXXXX/ndb_data/ndb_3_trace.log.1 [t1..t1]
***EOM***

Thanks!
Tianyin
[30 Jan 2015 12:47] MySQL Verification Team
Thank you for the feedback.

// BackupDataDir is file

[umshastr@hod03]/export/umesh/mysql-cluster-7.3.8: ls -l I_AM_INVALID
-rw-r--r-- 1 umshastr common 0 Jan 30 13:16 I_AM_INVALID

[umshastr@hod03]/export/umesh/mysql-cluster-7.3.8: cat config.ini |grep BackupDataDir
BackupDataDir=/export/umesh/mysql-cluster-7.3.8/I_AM_INVALID
BackupDataDir=/export/umesh/mysql-cluster-7.3.8/I_AM_INVALID

[umshastr@hod03]/export/umesh/mysql-cluster-7.3.8: bin/ndb_mgmd --configdir=/export/umesh/mysql-cluster-7.3.8/ -f config.ini --initial
MySQL Cluster Management Server mysql-5.6.22 ndb-7.3.8
[umshastr@hod03]/export/umesh/mysql-cluster-7.3.8: bin/ndb_mgm -e "show"
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=11 (not connected, accepting connect from localhost)
id=12 (not connected, accepting connect from localhost)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @127.0.0.1  (mysql-5.6.22 ndb-7.3.8)

[mysqld(API)]   9 node(s)
id=21 (not connected, accepting connect from localhost)
..

// start ndbd nodes

[umshastr@hod03]/export/umesh/mysql-cluster-7.3.8: bin/ndbd --initial --foreground
2015-01-30 13:39:30 [ndbd] INFO     -- Ndb started in foreground
2015-01-30 13:39:30 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1
2015-01-30 13:39:30 [ndbd] WARNING  -- Cannot change directory to '/export/umesh/mysql-cluster-7.3.8/log/ndbd', error: 2
NDBMT: non-mt
2015-01-30 13:39:30 [ndbd] INFO     -- NDB Cluster -- DB node 11
2015-01-30 13:39:30 [ndbd] INFO     -- mysql-5.6.22 ndb-7.3.8 --
2015-01-30 13:39:30 [ndbd] INFO     -- numa_set_interleave_mask(numa_all_nodes) : OK
2015-01-30 13:39:30 [ndbd] INFO     -- Ndbd_mem_manager::init(1) min: 16615Mb initial: 17639Mb
Adding 223Mb to ZONE_LO (1,7135)
..

2015-01-30 13:39:52 [ndbd] INFO     -- Node started
2015-01-30 13:39:53 [ndbd] INFO     -- Started arbitrator node 1 [ticket=3bcd00012ca5d59b]

[umshastr@hod03]/export/umesh/mysql-cluster-7.3.8: bin/ndbd --initial --foreground
2015-01-30 13:39:33 [ndbd] INFO     -- Ndb started in foreground
2015-01-30 13:39:33 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1
2015-01-30 13:39:33 [ndbd] WARNING  -- Cannot change directory to '/export/umesh/mysql-cluster-7.3.8/log/ndbd', error: 2
NDBMT: non-mt
2015-01-30 13:39:33 [ndbd] INFO     -- NDB Cluster -- DB node 12
2015-01-30 13:39:33 [ndbd] INFO     -- mysql-5.6.22 ndb-7.3.8 --
2015-01-30 13:39:33 [ndbd] INFO     -- numa_set_interleave_mask(numa_all_nodes) : OK
2015-01-30 13:39:33 [ndbd] INFO     -- Ndbd_mem_manager::init(1) min: 16615Mb initial: 17639Mb
Adding 223Mb to ZONE_LO (1,7135)
Instantiating DBSPJ instanceNo=0
Started thread, index = 0, id = 15318, type = SocketClientThread
Started thread, index = 1, id = 15317, type = WatchDogThread
Started thread, index = 2, id = 15319, type = SocketServerThread
2015-01-30 13:39:34 [ndbd] INFO     -- Start initiated (mysql-5.6.22 ndb-7.3.8)
NDBFS/AsyncFile: Allocating 310392 for In/Deflate buffer
Started thread, index = 4, id = 15320, type = NdbfsThread
.
2015-01-30 13:39:52 [ndbd] INFO     -- Node started
2015-01-30 13:39:52 [ndbd] INFO     -- Prepare arbitrator node 1 [ticket=3bcd00012ca5d59b]

^^ both node started without any issue

## Start backup

ndb_mgm> START BACKUP WAIT STARTED
Waiting for started, this may take several minutes
Backup failed
*  3001: Could not start backup
*        Unknown error code: Unknown result: Unknown error code
Node 11: Backup 1 started from 1 has been aborted. Error: 2812

Imho - ndbd should have checked BackupDataDir for dir or aborted for invalid dir