Bug #74167 call to posix_fallocate from fil_extend_space_to_desired_size fails
Submitted: 1 Oct 2014 1:06
Modified: 22 Oct 2014 13:52
Reporter: Mark Callaghan
Status: Closed
Category: MySQL Server: InnoDB storage engine
Severity: S3 (Non-critical)
Version: 5.7.5
OS: Linux
Assigned to:
CPU Architecture: Any
Tags: innodb

[1 Oct 2014 1:06] Mark Callaghan
Description:
The posix_fallocate call in fil_extend_space_to_desired_size in MySQL 5.7.5 is not present in the 5.6 code. It fails on my host, which uses ext3 on FusionIO with a patched version of Linux 3.10. If I switch from ext3 to XFS, there is no error.

I don't know whether the problem is my patched kernel or InnoDB, but the InnoDB call to posix_fallocate is new in 5.7.

I will ask the local kernel team about this.

For EOPNOTSUPP, I found - http://man7.org/linux/man-pages/man2/fallocate.2.html

strace output is:
2088  open("./test1/foo.ibd", O_RDWR)   = 57
2088  fcntl(57, F_SETFL, O_RDONLY|O_DIRECT) = 0
2088  fcntl(57, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
2088  lseek(57, 0, SEEK_CUR)            = 0
2088  lseek(57, 0, SEEK_END)            = 65536
2088  lseek(57, 0, SEEK_SET)            = 0
2088  fallocate(57, 0, 65536, 32768)    = -1 EOPNOTSUPP (Operation not supported)
2088  fstat(57, {st_mode=S_IFREG|0660, st_size=65536, ...}) = 0
2088  fstatfs(57, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=25739594, f_bfree=8990920, f_bavail=7683400, f_files=6545408, f_ffree=5556394, f_fsid={1952629866, -381804078}, f_namelen=255, f_frsize=4096}) = 0
2088  pwrite(57, "\0", 1, 69631)        = -1 EINVAL (Invalid argument)

This is the code:

#if !defined(NO_FALLOCATE) && defined(UNIV_LINUX)
                /* This is required by FusionIO HW/Firmware */
                int     ret = posix_fallocate(node->handle, node_start, len);

                if (ret != 0) {
                        ib::error() <<
                                "posix_fallocate(): Failed to preallocate"
                                " data for file "
                                << node->name << ", desired size "
                                << len << " bytes."
                                " Operating system error number "
                                << ret << ". Check"
                                " that the disk is not full or a disk quota"
                                " exceeded. Some operating system error"
                                " numbers are described at " REFMAN ""
                                " operating-system-error-codes.html";
                        success = false;
                }
#endif /* NO_FALLOCATE || !UNIV_LINUX */

The error message ("table is full") is confusing...

2014-10-01T00:40:57.171913Z 17 [ERROR] InnoDB: posix_fallocate(): Failed to preallocate data for file ./test1/foo.ibd, desired size 32768 bytes. Operating system error number 22. Check that the disk is not full or a disk quota exceeded. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/ operating-system-error-codes.html
2014-10-01T00:40:57.172615Z 17 [ERROR] /data/orig575/sbin/mysqld: The table 'foo' is full
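
Error number 22 in the log is EINVAL on Linux. That matches the failed pwrite() in the strace output, not the EOPNOTSUPP returned by fallocate(). posix_fallocate() returns the error number directly instead of setting errno, so the logged value can be decoded with strerror(). A minimal sketch, where describe_os_error is a hypothetical helper name and not part of InnoDB:

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper (not InnoDB code): map the number printed in the
   error log to its message. posix_fallocate() returns this number as
   its result rather than storing it in errno. */
const char *describe_os_error(int err)
{
    return strerror(err);
}
```

Calling describe_os_error(22) on a Linux host should yield the same "Invalid argument" text that strace prints next to EINVAL.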

How to repeat:
create a table

Suggested fix:
remove the posix_fallocate call or expect errors
[1 Oct 2014 1:12] Mark Callaghan
Per local kernel guru -- 
"ext3 doesn't support posix_fallocate. glibc used to emulate it by writing a bunch of zeros, but it doesn't support it"
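
The pwrite() offset in the strace output is consistent with an emulation that writes one byte at the end of each filesystem block. The sketch below only reproduces that arithmetic. It assumes the emulation steps by f_bsize (4096 in the fstatfs output) and is not taken from the glibc source. The function names are hypothetical.

```c
#include <sys/types.h>

/* Assumption: the emulation touches the last byte of each block,
   stepping by the filesystem block size. This returns the offset of
   the first one-byte write for a request of len bytes at offset. */
off_t first_probe_offset(off_t offset, off_t len, off_t block_size)
{
    return offset + (len - 1) % block_size;
}

/* Number of one-byte writes needed to cover len bytes. */
off_t probe_count(off_t len, off_t block_size)
{
    return (len + block_size - 1) / block_size;
}
```

With offset 65536, length 32768 and block size 4096, the first write lands at 65536 + 4095 = 69631, the offset seen in the failing pwrite(). A one-byte write like that is not aligned, which would explain the EINVAL on a handle opened with O_DIRECT.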
[1 Oct 2014 9:28] zhai weixiang
Found a changelog entry in 5.7.6 that may be related.
http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-6.html

InnoDB: A CREATE TABLE operation would fail with a table is full error when running a MySQL server with innodb_flush_method=O_DIRECT on a Linux system with an ext3 file system. The error is due to an internal posix_fallocate() failure that occurs when O_DIRECT is specified. To allow the file operation to proceed, the internal posix_fallocate() failure now prints an error message to the error log. (Bug #18903979)
[4 Oct 2014 14:35] Mark Callaghan
Thanks for the search result. Sounds like this has been fixed in 5.7.6. The code listed above has this comment that makes me wary. Why do we need FusionIO specific code?
                /* This is required by FusionIO HW/Firmware */
[6 Oct 2014 17:27] Umesh Shastry
Hello Mark,

Thanks for the report.

Indeed, this is a duplicate of internal bug "Bug 18903979 - THE TABLE IS FULL ERROR WHEN RUNNING ON EXT3 WITH O_DIRECT", which is fixed as of the upcoming 5.7.6 release.

I'll check internally about the comment and keep you posted further on this.

Thanks,
Umesh
[7 Oct 2014 3:56] Sunny Bains
The FusionIO comment can be ignored. The plan is to use posix_fallocate() instead of writing zeroes for SSDs in general. It just happens to be required for FusionIO HW.

We need to handle the errors properly and fall back to writing zeroes as cunning plan B if the call fails.
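
A minimal sketch of that plan B, assuming plain buffered I/O. It is not the InnoDB fix. The names extend_file and write_zeroes are hypothetical, and a handle opened with O_DIRECT would also need aligned buffers, offsets and lengths, which this sketch does not handle.

```c
#define _XOPEN_SOURCE 700 /* for pwrite() and posix_fallocate() */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill [offset, offset + len) with zeroes.
   Returns 0 on success or an error number on failure. */
int write_zeroes(int fd, off_t offset, off_t len)
{
    char buf[4096];
    memset(buf, 0, sizeof buf);

    while (len > 0) {
        size_t chunk = len < (off_t) sizeof buf ? (size_t) len : sizeof buf;
        ssize_t n = pwrite(fd, buf, chunk, offset);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted, retry the same chunk */
            }
            return errno;
        }
        if (n == 0) {
            return EIO; /* no progress, give up instead of looping */
        }
        offset += n;
        len -= n;
    }
    return 0;
}

/* Hypothetical helper: try posix_fallocate() first and fall back to
   writing zeroes if it reports any error. */
int extend_file(int fd, off_t offset, off_t len)
{
    int ret = posix_fallocate(fd, offset, len);
    if (ret == 0) {
        return 0;
    }
    return write_zeroes(fd, offset, len);
}
```

A caller would treat a non-zero result from extend_file as the real failure and report that error number, so an unsupported fallocate() alone no longer surfaces as "table is full".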
[22 Oct 2014 13:52] Erlend Dahl
Fixed in 5.7.6 under the heading of Bug#18903979, as mentioned above.