Bug #102514 innodb_flush_method support O_SYNC
Submitted: 7 Feb 2021 3:18 Modified: 12 Feb 2021 12:43
Reporter: haochen he Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S4 (Feature request)
Version:8.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: configuration, performance, sync

[7 Feb 2021 3:18] haochen he
Description:
innodb_flush_method has 6 values:
  fsync
  O_DSYNC
  littlesync (testing only)
  nosync (testing only)
  O_DIRECT
  O_DIRECT_NO_FSYNC

by default, MySQL use fsync (i.e., uses the fsync(2) system call to flush both the data and log files).

At the same time, I noticed that innodb_use_native_aio is on by default, using linux aio:
  io_setup(2)
  io_destroy(2)
  io_submit(2)
  io_cancel(2)
  io_getevents(2)

However, it seems that this default behavior of libaio+fsync is not clear in linux:
  [1] https://github.com/axboe/fio/issues/1179
  [2] https://github.com/axboe/fio/blob/fio-3.25/engines/libaio.c#L256-L268

And I noticed that if one wants to safely persist write to the storage medium, (s)he can use RWF_SYNC flag in io_submit(2): https://man7.org/linux/man-pages/man2/io_submit.2.html This is like O_SYNC flag in synchronous I/O (i.e., open(2), write(2)).

Therefore, I am wondering why MySQL does not support O_SYNC? I suppose O_SYNC + wirte(2) is equal to wirte(2) + fsync(2).

How to repeat:
Actually, the reason why I propose this request is that I found the default behavoir of libaio + fsync(2) is very slow in my NVMe SSD (i.e., Samsung 980 pro 500 GB, which is /dev/nvme0n1), even slower than SATA SSD (i.e., Samsung 860 evo, which is /dev/sda).

test command:

 $ fio --filename={/dev/nvme0n1, /dev/sda} --runtime=15 --size=400G --direct=1 --buffered=0 --rw=randwrite --bs=4k --numjobs=16 --iodepth=32 --group_reporting --name=write --ioengine=libaio --fsync={1, 0}

result:
  https://ibb.co/f1VsCC1
  Look at the IOPS comparison between the two devices.
  (ping me if the image is broken)

This may because Samsung handles fsync(2) poorly. However, the default behavior of MySQL do call fsync(2) frequently by setting innodb_doublewrite=1 and innodb_flush_method=fsync as far as I know. As a consequence, if I want to make the best use of my NVMe SSD (Samsung 980 pro), I have to set these two important paramter to unsafe values (0 and O_DIRECT_NO_FSYNC) as another TPC-C benchmark shown: https://ibb.co/b689Sjq (ignore the WesternDigital SN-850 series).

But as the result (https://ibb.co/f1VsCC1) shown, using O_SYNC flag for aio does not have such big negative impact!

Suggested fix:
Support O_SYNC for innodb_flush_method.
[12 Feb 2021 12:43] MySQL Verification Team
Hi Mr. he,

Thank you for your feature request.

We think that it can be considered as a feature request for some future version.

Verified as reported.