MySQL Bugs: #95023: NDB Node won't start with tablespace (Windows)

Bug #95023	NDB Node won't start with tablespace (Windows)
Submitted:	15 Apr 2019 12:02	Modified:	6 May 2019 11:31
Reporter:	Vincent Guillemette	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	8.0.14-dmr/8.0.16-dmr	OS:	Windows (Server 2016)
Assigned to:		CPU Architecture:	Other (x64)
Tags:	ndb, Tablespace, windows

Description:
Hello, 

As title suggests, after I created a tablespace and associated it to a database table, my ndbmtd processes no longer start. 

I followed this piece of documentation to set up the disk data table : https://dev.mysql.com/doc/refman/8.0/en/mysql-cluster-disk-data-objects.html. 

When trying to (re)start any ndb process, it exits, and the following informations are logged : 

---------
ndb_mgm : 
---------
2019-04-15 13:29:48 [MgmtSrvr] INFO     -- Node 25: Node 149 Connected
2019-04-15 13:29:48 [MgmtSrvr] INFO     -- Node 25: Buffering maximum epochs 100
2019-04-15 13:29:48 [MgmtSrvr] INFO     -- Node 25: Start phase 0 completed (system restart)
2019-04-15 13:29:48 [MgmtSrvr] INFO     -- Node 25: Waiting 30 sec for nodes 1 to connect, nodes [ all: 1 and 25 connected: 25 no-wait:  ]
2019-04-15 13:29:48 [MgmtSrvr] INFO     -- Node 25: Communication to Node 1 opened
2019-04-15 13:29:50 [MgmtSrvr] INFO     -- Node 25: Node 1 Connected
2019-04-15 13:29:50 [MgmtSrvr] INFO     -- Node 25: CM_REGCONF president = 1, own Node = 25, our dynamic id = 0/36
2019-04-15 13:29:50 [MgmtSrvr] INFO     -- Node 25: Node 1: API mysql-8.0.14 ndb-8.0.14
2019-04-15 13:29:50 [MgmtSrvr] INFO     -- Node 25: Start phase 1 completed (system restart)
2019-04-15 13:29:50 [MgmtSrvr] INFO     -- Node 25: Start phase 2 completed (node restart)
2019-04-15 13:29:50 [MgmtSrvr] INFO     -- Node 25: Start phase 3 completed (node restart)
2019-04-15 13:29:50 [MgmtSrvr] INFO     -- Node 25: Start phase 4 completed (node restart)
2019-04-15 13:29:50 [MgmtSrvr] INFO     -- Node 25: Starting to restore schema
2019-04-15 13:29:51 [MgmtSrvr] ALERT    -- Node 25: Forced node shutdown completed. Occured during startphase 5. Caused by error 2812: 'Invalid parameter for file(Configuration error). Permanent error, external action needed'.

--------
ndbmtd : 
--------
Time: Monday 15 April 2019 - 13:29:50
Status: Permanent error, external action needed
Message: Invalid parameter for file (Configuration error)
Error: 2812
Error data: TSMAN: File system close failed. OS errno: 6
Error object: TSMAN (Line: 3914) 0x00000002
Program: ndbmtd.exe
Pid: 4568 thr: 1
Version: mysql-8.0.14 ndb-8.0.14-dmr
Trace file name: ndb_25_trace.log.2_t1
Trace file path: C:/MySQL/Data/cluster-data\ndb_25_trace.log.2 [t1..t4]
***EOM***

------------
config.ini : 
------------
[ndbd default]
NoOfReplicas=2
DataDir=C:/MySQL/Data/cluster-data
DataMemory=1G
ServerPort=50501

[ndb_mgmd default]
ArbitrationRank=0
DataDir=C:/MySQL/Mgmt/cluster-logs

[mysqld default]
ArbitrationRank=0

[ndbd]
NodeId=1
HostName=datab1-vip.xxx

[ndbd]
NodeId=25
HostName=datap1-vip.xxx

[ndb_mgmd]
ArbitrationRank=2
NodeId=101
HostName=mgmtb1-vip.xxx

[ndb_mgmd]
NodeId=125
HostName=mgmtp1-vip.xxx

[ndb_mgmd]
ArbitrationRank=1
NodeId=149
HostName=mgmto1.xxx

[mysqld]
NodeId=201
HostName=datab1-vip.xxx

[mysqld]
NodeId=225
HostName=datap1-vip.xxx

Then, the only way to get the node started is to start from scratch (delete cluster-data contents), which takes a huge amount of time... 

Let me know should you need any more information. 

Regards, 
Vincent. 

How to repeat:
Once a tablespace has been created and associated to a table, try and restart any ndb node.

ndb_trace.log

Attachment: ndb_25_trace.log.2 (application/octet-stream, text), 634.70 KiB.

ndb_trace.log

Attachment: ndb_25_trace.log.2_t1 (application/octet-stream, text), 58.44 KiB.

ndb_trace.log

Attachment: ndb_25_trace.log.2_t2 (application/octet-stream, text), 952.61 KiB.

ndb_trace.log

Attachment: ndb_25_trace.log.2_t3 (application/octet-stream, text), 9.79 KiB.

Hi,

This looks like some windows glitch. Are you sure there's no antivirus running and messing up with disk data?

all best
Bogdan

Hello, 

No antivirus software has been installed on this server. 

I tried disabling Windows Defender's realtime protection, but the problem persists.

Regards, 
Vincent.

Hi,
Weird, I'm not reproducing this. Have you tried starting a node initial?

all best
Bogdan

Hi, 

As stated in my initial post, the node will start with the "initial" flag. 

However, it is not a viable solution as it needs to fetch the whole data set from the other group's node. 

Regards, 
Vincent.

Hi,

> As stated in my initial post, the node will start with the "initial" flag. 

I apologize, I understood that this:

> Then, the only way to get the node started is to start from scratch (delete cluster-data contents), which takes a huge amount of time... 

meant that you are deleting all cluster data, my bad, apologies.

So with --initial you get the node up. 

Let me get this straight, as I'm having issues reproducing this,

- you get cluster up, set it up with data
- you create log file group
- you create tablespace
- you then create new tables with disk data or you alter existing table?

at this point everything works
next time you restart data node it fails to start so you have to start it --initial in order for it to boot up ?

thanks
Bogdan

Hi, 

I'm sorry, you were right. 
The node starts when clearing cluster-data directory. 

You also got the events order right : the table is a newly created one. 

At this point : 
- When I restart data node without any flag, it fails to start. 
- When I restart data node with --initial flag, it fails to start with the following error : 
  Couldn't start as daemon, error: 'Failed to open logfile 'D:/MySQL/Data/cluster-data\ndb_1_out.log' for write, errno: 13'

Regards, 
Vincent.

Hi, 

Nevermind, I started the daemon with the wrong account on my last attempt. 

Starting with the --initial flag produces the same result as clearing the cluster-data directory. 
The data node starts, but fetch the whole data set from the other data node in the group. 

Regards, 
Vincent.

Hi,

Once you start it with initial, will it restart properly next time (no flags) or you need to do initial every time you restart it?

thanks
Bogdan

Hi, 

I need to start the data node with the --initial flag every time. 

Regards, 
Vincent.

Hi, 

If it's of any help, issue persists in version 8.0.16-dmr. 

Regards, 
Vincent.

8.0.16-dmr trace logs

Attachment: mysql-bug-8-0-16.zip (application/x-zip-compressed, text), 79.91 KiB.

Verified on windows only

thanks for submitting the bug

all best
Bogdan

Posted by developer:
 
Should be same bug as fixed  Bug #30484272 	USING DISK DATA ON WINDOWS FAILS NODE RESTART: TSMAN: FILE SYSTEM CLOSE FAILED.