Bug #77018 Cluster data nodes should include file name in "could not open file" error msgs
Submitted: 12 May 2015 15:02 Modified: 14 May 2015 14:18
Reporter: Hartmut Holzgraefe
Status: Verified
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:ndb-7.2.4; ndb-7.4.4 OS:Linux
Assigned to: CPU Architecture:Any

[12 May 2015 15:02] Hartmut Holzgraefe
Description:
I just wasted more than a day tracking down a problem on a customer system that led to node failures with either

  Message: File not found (Ndbd file system inconsistency error, please report a bug)
  Error: 2815
  Error data: DBDICT: File system open failed during FsConnectRecord state 1. OS errno: 2
  Error object: DBDICT (Line: 1058) 0x00000002
  Program: ndbd

or 

  Message: File not found (Ndbd file system inconsistency error, please report a bug)
  Error: 2815
  Error data: DBLQH: File system open failed. OS errno: 2
  Error object: DBLQH (Line: 3643) 0x00000002
  Program: ndbd

which could only be solved by an --initial node restart.

Only after tracing a failed node restart with strace were we able to identify the actual file that couldn't be opened (it turned out that someone had gzipped some large FragLog files in the cluster data directory for an as yet unknown reason).

If the error log entries had shown the problematic file name or path right away, tracking down the root cause would probably have taken less than an hour :(

How to repeat:
Shut down a data node, rename or remove a file in the ndb_#_fs directory that the node will try to open at startup, then start the node again. The ndbd process logs a "File system open failed ... OS errno: 2" error but does not say WHICH file it failed to open.
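The failure mode can be reproduced outside of NDB in plain C. This is a minimal sketch, assuming a POSIX system with a writable /tmp; the file name S0.FragLog is only illustrative, and gzip is modeled as a plain rename to "<name>.gz" (gzip really writes a compressed copy and deletes the original, which leaves the original name missing in the same way). It shows that the only thing the OS reports back is errno 2 (ENOENT), which carries no information about the path.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: creates a file in a scratch directory, renames it
 * to "<name>.gz" (simulating what gzip leaves behind), then opens it again
 * under its original name. Returns the errno of that failed open, or -1 if
 * the setup failed or the open unexpectedly succeeded. */
int errno_after_file_renamed_away(void)
{
    char dir[] = "/tmp/ndb_fs_demoXXXXXX";
    char path[256], gz[256];
    int fd, result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/S0.FragLog", dir); /* illustrative name */
    snprintf(gz, sizeof gz, "%s.gz", path);

    fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        if (rename(path, gz) == 0) {
            fd = open(path, O_RDWR);
            if (fd < 0)
                result = errno; /* ENOENT, which is 2 on Linux */
            else
                close(fd);
            unlink(gz);
        } else {
            unlink(path);
        }
    }
    rmdir(dir);
    return result;
}
```

The caller of open() is the only place that still knows the path; once only the errno is passed up to the error reporting code, the file name is lost.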

Suggested fix:
Include the file name / path in the error message, or, if this leads to problems with the fixed-size error entry format, at least log an appropriate message that includes the file path in the node's .out log file.
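A minimal sketch of the suggested message format, assuming a hypothetical helper called format_open_error (not an existing NDB function) and an illustrative path layout. To address the fixed-size entry concern, it keeps only the tail of an over-long path (the file name is usually the most useful part) and relies on snprintf, which never writes past the buffer and always NUL-terminates a non-empty buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not an NDB API: formats an open-failure message
 * that names the file. If the path is longer than max_path characters,
 * only its last (max_path - 3) characters are kept, prefixed with "...".
 * Returns what snprintf returns (the untruncated message length). */
int format_open_error(char *buf, size_t buflen, const char *block,
                      const char *path, int os_errno, size_t max_path)
{
    const char *shown = path;
    const char *prefix = "";
    size_t plen = strlen(path);

    if (max_path > 3 && plen > max_path) {
        shown = path + (plen - (max_path - 3)); /* keep the path's tail */
        prefix = "...";
    }
    /* Output is truncated to buflen - 1 characters if the entry is too small. */
    return snprintf(buf, buflen,
                    "%s: File system open failed for %s%s. OS errno: %d",
                    block, prefix, shown, os_errno);
}
```

For example, with the illustrative path /data/ndb_2_fs/D10/DBLQH/S0.FragLog and a 16-character path budget, the message would read "DBLQH: File system open failed for ...QH/S0.FragLog. OS errno: 2", which still identifies the missing file.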
[12 May 2015 15:34] Hartmut Holzgraefe
Also reproduced on Cluster 7.4.4 now
[14 May 2015 14:18] Bogdan Kecman
Hi Hartmut,

verified as reported. I guess this is more of an FR (feature request) than a bug :) but...

all best
Bogdan Kecman