Bug #29612 Cluster ndbd can't write to file, linux kernel 2.4
Submitted: 7 Jul 2007 11:45 Modified: 18 Sep 2007 10:01
Reporter: Kent Boortz
Status: Won't fix
Category: MySQL Cluster: Cluster (NDB) storage engine Severity: S2 (Serious)
Version: 5.1.20, 5.2.4 OS: Linux (kernel 2.4)
Assigned to: Stewart Smith CPU Architecture: Any

[7 Jul 2007 11:45] Kent Boortz
Description:
A write by "ndbd" fails with EINVAL because the file is opened with
O_DIRECT but the write buffer is not correctly aligned. The ndb code
assumes that 512-byte alignment is enough. That holds for kernel 2.6,
but not for 2.4, where the alignment has to match the file system
block size.

I am not sure, but there are indications that the O_DIRECT flag
has other problems in the kernel 2.4 series. From reading the
different "/usr/include/*/fcntl.h" files on a Red Hat RHAS3 host,
it is not clear to me whether the flag is really "supported",
although it is obviously implemented.

How to repeat:
Run the attached code snippet on a host running kernel 2.4.

Suggested fix:
Always align on the block size, and enable the option by default
only in builds on kernel 2.6 hosts.
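
As a rough illustration of "always align on block size", the sketch below allocates a write buffer whose address and length are both multiples of a given block size. The helper names round_up_to_block and alloc_block_aligned are hypothetical and are not taken from the ndb source. The block size is passed in by the caller, because finding the right value for every file system on a 2.4 kernel is a separate problem that this sketch does not attempt to solve.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Round len up to the next multiple of block_size.
 * block_size must be a non-zero power of two. */
static size_t round_up_to_block(size_t len, size_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    return (len + block_size - 1) & ~(block_size - 1);
}

/* Hypothetical helper: return a buffer whose address is a multiple of
 * block_size and whose size is len rounded up to a whole number of
 * blocks. Returns NULL on failure. Release the buffer with free(). */
static void *alloc_block_aligned(size_t len, size_t block_size,
                                 size_t *padded_len)
{
    void *buf = NULL;
    size_t padded = round_up_to_block(len, block_size);

    /* posix_memalign() needs a power-of-two alignment that is also a
     * multiple of sizeof(void *). */
    if (block_size < sizeof(void *) ||
        posix_memalign(&buf, block_size, padded) != 0)
        return NULL;

    if (padded_len != NULL)
        *padded_len = padded;
    return buf;
}
```

A caller would request, say, alloc_block_aligned(32768, 4096, &n) and pass the returned pointer and n to write(). The file offset still has to be a multiple of the same block size.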
[7 Jul 2007 11:46] Kent Boortz
Snippet that shows the alignment problem

Attachment: o_direct.c (text/plain), 1.48 KiB.

[7 Jul 2007 11:57] Kent Boortz
In strace you can see this

 7017  open("/home/mysqldev/x/mysql-test/var/ndbcluster-9310/ndb_2_fs/D11/DBLQH/S0.FragLog", O_RDWR|O_CREAT|O_TRUNC|O_DIRECT|O_LARGEFILE, 0666) = 19
 .
 .
 7017  _llseek(19, 12550144, [12550144], SEEK_SET) = 0
 7017  write(19, "%\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\0\0\0\0\0"..., 32768) = -1 EINVAL (Invalid argument)
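
The offset (12550144) and the length (32768) in the failing write are multiples of both 512 and 4096, which is consistent with the buffer address being the part that is misaligned. A small check along these lines can confirm that for a given call. This is only a sketch: direct_io_aligned is a hypothetical helper, and the 4096-byte block size used in the note after the code is an assumption, since the actual block size of the file system is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Return 1 if x is a multiple of align, else 0.
 * align must be a non-zero power of two. */
static int is_multiple(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x & (align - 1)) == 0;
}

/* Hypothetical helper: O_DIRECT I/O needs the buffer address, the file
 * offset and the transfer length to all be multiples of the required
 * alignment (the file system block size on 2.4, 512 bytes on 2.6). */
static int direct_io_aligned(const void *buf, off_t offset, size_t len,
                             size_t align)
{
    return is_multiple((uint64_t)(uintptr_t)buf, align)
        && is_multiple((uint64_t)offset, align)
        && is_multiple((uint64_t)len, align);
}
```

With an assumed block size of 4096, a buffer at an address such as 0x1200 passes the check for align = 512 but fails it for align = 4096, even with the offset and length from the trace above.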

In "ndb_2_trace.log.1"

 --------------- Signal ----------------
 r.bn: 247 "DBLQH", r.proc: 2, r.sigId: 65606 gsn: 271 "FSWRITEREF" prio: 1
 s.bn: 253 "NDBFS", s.proc: 2, s.sigId: 65604 length: 4 trace: 0 #sec: 0 fragInf: 0
  UserPointer: 3
  ErrorCode: 2812, Invalid parameter for file
  OS ErrorCode: 22

In "ndb_2.log" (similar in "ndb_1.log")

 2007-07-07 13:52:06 [ndbd] INFO     -- DBLQH: File system write failed during LogFileOperationRecord state 1. OS errno: 22
 2007-07-07 13:52:06 [ndbd] INFO     -- DBLQH (Line: 12668) 0x00000006
 2007-07-07 13:52:06 [ndbd] INFO     -- Error handler startup shutting down system
 2007-07-07 13:52:06 [ndbd] INFO     -- Angel received ndbd startup failure count 1.
 2007-07-07 13:52:06 [ndbd] INFO     -- Error handler shutdown completed - aborting
 2007-07-07 13:52:06 [ndbd] ALERT    -- Node 2: Forced node shutdown completed. Occured during startphase 4. Initiated by signal 6. Caused by error 2812: 'Invalid parameter for file(Configuration error). Permanent error, external action needed'.

In "ndb_2_error.log" (similar in "ndb_1_error.log")

 Time: Saturday 7 July 2007 - 13:52:06
 Status: Permanent error, external action needed
 Message: Invalid parameter for file (Configuration error)
 Error: 2812
 Error data: DBLQH: File system write failed during LogFileOperationRecord state 1. OS errno: 22
 Error object: DBLQH (Line: 12668) 0x00000006
 Program: /home/mysqldev/x/bin/ndbd
[12 Jul 2007 15:12] Jon Stephens
Per discussion with Trudy and Tomas: Documented that ODirect parameter should be enabled only on Linux kernels 2.6 and newer.
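
For anyone who wants to enforce that restriction in a startup script or wrapper rather than rely on the documentation, a version check could look roughly like the following. This is a hypothetical guard, not something ndbd does. The function names are made up for the example, and an unparsable release string is treated as "not 2.6 or newer".

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Return 1 if a kernel release string such as "2.6.18-8.el5" denotes
 * version 2.6 or newer, 0 if it is older or cannot be parsed. */
static int release_is_2_6_or_newer(const char *release)
{
    int major = 0, minor = 0;

    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return 0;
    return major > 2 || (major == 2 && minor >= 6);
}

/* Hypothetical guard: ask the running kernel for its release string
 * and decide whether enabling ODirect is advisable. */
static int odirect_advisable_on_this_kernel(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;
    return release_is_2_6_or_newer(u.release);
}
```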
[12 Jul 2007 15:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/30793

ChangeSet@1.2536, 2007-07-12 17:27:53+02:00, tomas@whalegate.ndb.mysql.com +1 -0
  Bug #29612 Cluster ndbd can't write to file, linux kernel 2.4
  - remove usage for now
[22 Jul 2007 9:43] Bugs System
Pushed into 5.1.21-beta
[18 Sep 2007 10:01] Stewart Smith
The code to make O_DIRECT work on 2.4 on all the filesystems scares young children.

(Seriously, it's a page or two of code to get the right parameters for everything.)

I vote we just don't support ODirect on 2.4. It's just too complex.