Bug #68737 Data Node Fail DBLQH: File system open failed. OS errno: 2
Submitted: 21 Mar 2013 10:29
Modified: 27 Apr 2016 12:23
Reporter: Ronny Lin
Status: Not a Bug
Category: MySQL Cluster: Cluster (NDB) storage engine
Severity: S2 (Serious)
Version: 7.2.10
OS: Linux (CentOS 6.2)
Assigned to: Bogdan Kecman
CPU Architecture: Any
Tags: ndbmtd

[21 Mar 2013 10:29] Ronny Lin
Description:
I used MCM 1.2.2 to create a MySQL Cluster with 8 data nodes in 4 node groups, 2 SQL nodes, and 2 management nodes on 2 servers.

When I execute SQL like "update TM_ACCOUNT set limit = 10000 where limit < 100 limit 50000", one of the data nodes fails with the error

DBLQH: File system open failed. OS errno: 2

I need to restart the failed data node manually before I execute new SQL; otherwise the whole cluster will fail.

How to repeat:
1. Copy mcm-1.2.2 and extract it with tar xvfz
2. Create the cluster
3. Create a table with engine = ndb
4. Insert 1,000,000 rows into the table
5. Execute "update table_name set column1 = 10000 where column1 < 100 limit 50000"
6. The error is returned

Sometimes the execution succeeds, but if you run it two or more times, the error occurs.
[21 Mar 2013 10:45] Ronny Lin
ndb_error_reporter output file

Attachment: ndb_error_report_20130321180524.tar.bz2 (application/octet-stream, text), 1.82 MiB.

[21 Mar 2013 12:27] Umesh Shastry
Hello Ronny,

This looks more like a system problem than a MySQL defect.

$ bin/perror 2
OS error code   2:  No such file or directory

$ bin/perror --ndb 2815
NDB error code 2815: Error in reading files, please check file system: Temporary error: Temporary Resource error
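
The OS half of the lookup above can also be done without perror. The following is a minimal Python sketch, assuming a Linux host; the helper name describe_os_errno is hypothetical and not part of any MySQL tool. It only decodes OS errno values, not NDB error codes such as 2815.

```python
import errno
import os


def describe_os_errno(code):
    """Return (symbolic name, message) for an OS errno value.

    Hypothetical helper for illustration; it relies only on the
    Python standard library and knows nothing about NDB error codes.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# errno 2 is the value reported in "OS errno: 2"
name, message = describe_os_errno(2)
print(f"OS error code 2: {name}: {message}")
```

On Linux, errno 2 maps to ENOENT, which means the data node tried to open a file that did not exist at that path.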

Please check that nothing has changed at filesystem level, in particular that no permissions have changed and that the ndb filesystem has not been accidentally deleted or manually altered.

Check whether you changed anything in config.ini before starting the nodes.
Also, make sure the ndbfs is still there.
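
The filesystem checks above could be scripted roughly as follows. This is a sketch under stated assumptions: it assumes the data node keeps its files in a directory named ndb_<node id>_fs directly under the configured DataDir, and the function name check_ndb_filesystem and the example path and node id are hypothetical. It has to be run as the same OS user that runs the data node for the permission check to be meaningful.

```python
import os


def check_ndb_filesystem(data_dir, node_id):
    """Return a list of problems found; an empty list means none.

    Assumption: the node's file system lives in
    <data_dir>/ndb_<node_id>_fs. Adjust if FileSystemPath is set.
    """
    problems = []
    fs_dir = os.path.join(data_dir, f"ndb_{node_id}_fs")

    if not os.path.isdir(fs_dir):
        problems.append(f"missing directory: {fs_dir}")
        return problems

    # The data node needs to read, write, and traverse this directory.
    if not os.access(fs_dir, os.R_OK | os.W_OK | os.X_OK):
        problems.append(f"insufficient permissions on: {fs_dir}")
        return problems

    if not os.listdir(fs_dir):
        problems.append(f"directory is empty: {fs_dir}")

    return problems
```

For example, check_ndb_filesystem("/var/lib/mysql-cluster", 3) would inspect /var/lib/mysql-cluster/ndb_3_fs; both values are placeholders, not taken from this report.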

Running the fsck utility on the filesystem level may help in some situations.

If the error persists, restarting the data node with the --initial option will solve the problem (note that this assumes at least one other node in the same node group is up, from which the crashed node can re-sync).
[21 Mar 2013 12:38] Ronny Lin
I decreased the limit to 2000 rows for each execution.

I executed it about 150 times, and it works fine.
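
The smaller-batch workaround described above can be sketched as a loop. This is a minimal sketch, not the reporter's actual script: update_in_batches is a hypothetical name, and execute stands for any caller-supplied function that runs one SQL statement and returns the number of affected rows (for example, a thin wrapper around a DB-API cursor). The table and column names are the placeholders from the "How to repeat" steps.

```python
def update_in_batches(execute, batch_size=2000, max_batches=None):
    """Repeat a LIMITed UPDATE until it affects no more rows.

    `execute` is an assumption: a callable taking an SQL string and
    returning the affected row count. Returns (total rows, batches run).
    """
    sql = (
        "UPDATE table_name SET column1 = 10000 "
        f"WHERE column1 < 100 LIMIT {int(batch_size)}"
    )
    total = 0
    batches = 0
    while max_batches is None or batches < max_batches:
        affected = execute(sql)
        if affected == 0:
            break  # nothing left that matches the WHERE clause
        total += affected
        batches += 1
    return total, batches
```

The loop terminates because each updated row gets column1 = 10000 and so no longer matches column1 < 100. Whether small batches avoid the node failure in general is not established by this report; it is only what the reporter observed.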
[21 Mar 2013 12:43] Ronny Lin
Thanks for your reply.

I changed my server from raid0 to raid1 last night.

I still got this error today.
[27 Apr 2016 12:23] Bogdan Kecman
Not a bug. OS problem. 
Check ulimit. 
Check /etc/security/limits.d and /etc/security/limits.conf
Check syslog

kind regards
arhi
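
The ulimit check suggested above can be approximated from Python. This is a sketch assuming a Linux host; open_file_limits and format_limit are hypothetical helper names. It reports the open-file limits of the process that runs it, so it is only indicative when run as the same user, and from the same kind of session, that starts the data nodes.

```python
import resource


def open_file_limits():
    """Return (soft, hard) limits on open file descriptors
    for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def format_limit(value):
    """Render a limit value the way `ulimit` does for infinity."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


soft, hard = open_file_limits()
print(f"open files: soft={format_limit(soft)} hard={format_limit(hard)}")
```

For a data node that is already running, the limits actually in effect can be read on Linux from /proc/<pid>/limits, which avoids guessing which limits.conf entry was applied.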