MySQL Bugs: #68737: Data Node Fail DBLQH: File system open failed. OS errno: 2

Bug #68737	Data Node Fail DBLQH: File system open failed. OS errno: 2
Submitted:	21 Mar 2013 10:29	Modified:	27 Apr 2016 12:23
Reporter:	Ronny Lin
Status:	Not a Bug
Category:	MySQL Cluster: Cluster (NDB) storage engine	Severity:	S2 (Serious)
Version:	7.2.10	OS:	Linux (centOS 6.2)
Assigned to:	MySQL Verification Team	CPU Architecture:	Any
Tags:	ndbmtd

Description:
I used mcm 1.2.2 to create a MySQL Cluster with 8 data nodes in 4 node groups, 2 SQL nodes, and 2 management nodes on 2 servers.

When I execute an SQL statement such as "update TM_ACCOUNT set limit = 10000 where limit < 100 limit 50000", one of the data nodes fails with the error:

DBLQH: File system open failed. OS errno: 2

I have to restart the failed data node manually before I execute another statement; otherwise the whole cluster fails.

How to repeat:
1. Copy mcm-1.2.2 to the server and unpack it with tar xvfz.
2. Create the cluster.
3. Create a table with engine = ndb.
4. Insert 1,000,000 rows into the table.
5. Execute "update table_name set column1 = 10000 where column1 < 100 limit 50000".
6. An error is returned.

Sometimes the statement succeeds, but if you run it two or more times, the error occurs.

ndb_error_reporter output file:

Attachment: ndb_error_report_20130321180524.tar.bz2 (application/octet-stream, text), 1.82 MiB.

Hello Ronny,

This looks more like a system problem than a MySQL defect.

$ bin/perror 2
OS error code   2:  No such file or directory

$ bin/perror --ndb 2815
NDB error code 2815: Error in reading files, please check file system: Temporary error: Temporary Resource error
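The OS-level part of this lookup can also be reproduced without perror. A minimal sketch in Python, assuming a Linux host (the message text comes from the C library, so the exact wording may differ on other systems):

```python
import errno
import os

code = 2  # the "OS errno" value reported in the DBLQH error message

# Symbolic name for the errno value (ENOENT for 2 on Linux)
name = errno.errorcode[code]

# Human-readable description supplied by the C library
message = os.strerror(code)

print(f"OS error code {code}: {name}: {message}")
```

This only decodes the operating-system error number; NDB error codes such as 2815 still need perror --ndb.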

Please check that nothing has changed at filesystem level, in particular that no permissions have changed and that the ndb filesystem has not been accidentally deleted or manually altered.

Check whether you changed anything in config.ini before starting the nodes.
Also, make sure the ndbfs is still there.
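The file system checks above could be scripted roughly as follows. This is only a sketch: it assumes the data node file system lives in a directory named ndb_<nodeid>_fs under the node's DataDir, and the path and node ID in the example call are hypothetical placeholders, not values taken from this report.

```python
import os
from typing import List


def check_ndb_fs(data_dir: str, node_id: int) -> List[str]:
    """Return a list of problems found with a data node's file system directory."""
    # Assumed layout: <DataDir>/ndb_<nodeid>_fs
    fs_dir = os.path.join(data_dir, f"ndb_{node_id}_fs")
    problems = []
    if not os.path.isdir(fs_dir):
        problems.append(f"missing directory: {fs_dir}")
        return problems
    # The user running the data node needs read, write and search access.
    if not os.access(fs_dir, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"insufficient permissions on: {fs_dir}")
    return problems


if __name__ == "__main__":
    # Hypothetical DataDir and node ID; substitute the values from your own configuration.
    for problem in check_ndb_fs("/var/lib/mysql-cluster", 3):
        print(problem)
```

Run it as the same OS user that starts the data node, because os.access reports permissions for the calling user.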

Running the fsck utility on the filesystem level may help in some situations.

If the error persists, restarting the data node with the --initial option will solve the problem. Note that this assumes at least one other node in the same node group is up, so that the crashed node can resync from it.

I decreased the limit to 2000 rows per execution.

I executed the statement about 150 times and it works fine.
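For reference, the smaller batches could be driven by a loop like the one below. This is a sketch only: it assumes a Python DB-API connection (for example one from a MySQL driver), and table_name and column1 are the placeholder names from the steps above, not the real schema. The function name and its parameters are made up for illustration.

```python
def update_in_batches(conn, batch_size=2000, max_batches=1000):
    """Repeat a small UPDATE ... LIMIT until no rows are left to change.

    `conn` is assumed to be a DB-API connection. Returns the total number
    of rows reported as changed.
    """
    sql = (
        "UPDATE table_name SET column1 = 10000 "
        f"WHERE column1 < 100 LIMIT {int(batch_size)}"
    )
    total = 0
    cursor = conn.cursor()
    try:
        for _ in range(max_batches):
            cursor.execute(sql)
            changed = cursor.rowcount
            conn.commit()  # commit each batch so every transaction stays small
            if changed <= 0:  # nothing left to update, or row count unavailable
                break
            total += changed
    finally:
        cursor.close()
    return total
```

Updated rows no longer match the WHERE clause, so the loop ends once a batch changes nothing; max_batches is a safety cap in case the driver keeps reporting changed rows.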

Thanks for your reply.

I changed my server from RAID 0 to RAID 1 last night.

I still got this error today.

Not a bug. OS problem. 
Check ulimit. 
Check /etc/security/limits.d and /etc/security/limits.conf
Check syslog
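As a rough illustration of the ulimit check, the sketch below prints the soft and hard limits of the process that runs it, using Python's standard resource module. It assumes a Linux host and must be run as the same user (and from the same kind of shell or service) that starts the data nodes, otherwise the values may not match what ndbmtd sees. For an already running process, /proc/<pid>/limits shows the limits actually in effect.

```python
import resource


def describe_limit(label, limit_id):
    """Format the soft and hard value of one resource limit."""
    soft, hard = resource.getrlimit(limit_id)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{label}: soft={fmt(soft)} hard={fmt(hard)}"


# Limits that commonly matter for a process that opens many files and threads.
LIMITS = [
    ("open files", resource.RLIMIT_NOFILE),
    ("max user processes", resource.RLIMIT_NPROC),
    ("file size", resource.RLIMIT_FSIZE),
]

for label, limit_id in LIMITS:
    print(describe_limit(label, limit_id))
```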

kind regards
arhi