Description:
innodb_flush_method has 6 values:
fsync
O_DSYNC
littlesync (testing only)
nosync (testing only)
O_DIRECT
O_DIRECT_NO_FSYNC
by default, MySQL use fsync (i.e., uses the fsync(2) system call to flush both the data and log files).
At the same time, I noticed that innodb_use_native_aio is on by default, using linux aio:
io_setup(2)
io_destroy(2)
io_submit(2)
io_cancel(2)
io_getevents(2)
However, it seems that this default behavior of libaio+fsync is not clear in linux:
[1] https://github.com/axboe/fio/issues/1179
[2] https://github.com/axboe/fio/blob/fio-3.25/engines/libaio.c#L256-L268
And I noticed that if one wants to safely persist write to the storage medium, (s)he can use RWF_SYNC flag in io_submit(2): https://man7.org/linux/man-pages/man2/io_submit.2.html This is like O_SYNC flag in synchronous I/O (i.e., open(2), write(2)).
Therefore, I am wondering why MySQL does not support O_SYNC? I suppose O_SYNC + wirte(2) is equal to wirte(2) + fsync(2).
How to repeat:
Actually, the reason why I propose this request is that I found the default behavoir of libaio + fsync(2) is very slow in my NVMe SSD (i.e., Samsung 980 pro 500 GB, which is /dev/nvme0n1), even slower than SATA SSD (i.e., Samsung 860 evo, which is /dev/sda).
test command:
$ fio --filename={/dev/nvme0n1, /dev/sda} --runtime=15 --size=400G --direct=1 --buffered=0 --rw=randwrite --bs=4k --numjobs=16 --iodepth=32 --group_reporting --name=write --ioengine=libaio --fsync={1, 0}
result:
https://ibb.co/f1VsCC1
Look at the IOPS comparison between the two devices.
(ping me if the image is broken)
This may because Samsung handles fsync(2) poorly. However, the default behavior of MySQL do call fsync(2) frequently by setting innodb_doublewrite=1 and innodb_flush_method=fsync as far as I know. As a consequence, if I want to make the best use of my NVMe SSD (Samsung 980 pro), I have to set these two important paramter to unsafe values (0 and O_DIRECT_NO_FSYNC) as another TPC-C benchmark shown: https://ibb.co/b689Sjq (ignore the WesternDigital SN-850 series).
But as the result (https://ibb.co/f1VsCC1) shown, using O_SYNC flag for aio does not have such big negative impact!
Suggested fix:
Support O_SYNC for innodb_flush_method.