MySQL Bugs: #45111: free disk space error using myisamchk

Bug #45111	free disk space error using myisamchk
Submitted:	26 May 2009 23:41	Modified:	20 Aug 2009 9:25
Reporter:	Carsten Ralle
Status:	No Feedback
Category:	MySQL Server: MyISAM storage engine	Severity:	S2 (Serious)
Version:	5.0.81	OS:	Linux (2.6.16.60-0.21-smp x86_64)
Assigned to:		CPU Architecture:	Any

Description:
We created a database on machine 1, filled it with data, compressed it, and copied the data files over to machine 2. On machine 2 we needed to make changes to the table, so we uncompressed it, made the changes, and tried to re-compress it. Here we ran into the following problem:

Although there is plenty of free space, the command
myisamchk -rq --sort_buffer_size=256M --key_buffer_size=256M  --read_buffer_size=32M --write_buffer_size=32M --sort-index --analyze --tmpdir=/usr/database/tmp/ /usr/database/T1.MYI

runs all the repairs and then throws the following error:
myisamchk: Disk is full writing '/usr/database/T1.TMM' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space) 

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              21G  5.5G   15G  28% /
/dev/sda4              13T  2.6T   11T  20% /usr/database

If I copy the same database files to the root partition (same machine, same file system, but a partition of less than 1 TB in size), the same command runs through without errors!

How to repeat:
run 
myisamchk -rq /usr/database/T1.MYI 

on a partition with several TB of free space 

Suggested fix:
check free space calculation

Thank you for the report.

myisamchk returns the error "Disk is full writing" when either no space is left or the disk quota has been reached.

Please check if you have disk quota specified. For example, by running `repquota /usr/database`

We have no quota control installed; the system is an out-of-the-box SLES 10.2 with MySQL compiled to allow shared libs (for external functions). I found none of the quota control programs, and neither of the partitions has quotas enabled.

# mount
/dev/sda2 on /             type reiserfs (rw,acl,user_xattr)
/dev/sda4 on /usr/database type reiserfs (rw,acl,user_xattr)

Anyone? I would consider this a serious bug which should at least be assigned to someone ...

Thank you for the feedback.

Please check your operating system error logs for filesystem errors. Also, please provide the output of `perror 28` in your environment (or `PATH_TO_MYSQL_INSTALL_DIR/bin/perror 28`).

# perror 28
System error:  28 = No space left on device
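
For readers who do not have `perror` at hand, the same lookup can be sketched in Python. This is only an illustration: `describe_errno` is a hypothetical helper name, and the message text comes from the local C library, so it may differ between systems. A quota limit is normally reported under a separate code (`errno.EDQUOT` where the platform defines it), which is why the quota question was asked earlier in the thread.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno value."""
    # errno.errorcode maps numbers to names such as "ENOSPC".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror asks the C library for the human-readable message.
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_errno(28)
    print(f"{28} = {name}: {message}")
```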

In the meantime we had to find a workaround. We artificially filled the volume with 2 TB files until only 1.5 TB of free space remained, and then everything worked perfectly. The bug still remains ...

Sorry, I forgot: no filesystem errors whatsoever in the past months (it's a 16 TB array and shouldn't give any errors anyway).

Thank you for the feedback.

Please also provide output of:

df -i
stat -f /usr/database
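
The two commands above report block and inode counters for the filesystem. A minimal Python sketch that reads the same counters through `os.statvfs` is given here for comparison; `filesystem_usage` and `inode_counts_reported` are hypothetical helper names, and the sketch assumes a POSIX system. A filesystem that reports zero total inodes is treated as "not reported" rather than "exhausted".

```python
import os


def filesystem_usage(path):
    """Collect block and inode counters for the filesystem holding `path`."""
    st = os.statvfs(path)
    return {
        "block_size": st.f_frsize,
        "total_bytes": st.f_blocks * st.f_frsize,
        # f_bavail is the space available to unprivileged users.
        "avail_bytes": st.f_bavail * st.f_frsize,
        "inodes_total": st.f_files,
        "inodes_free": st.f_ffree,
    }


def inode_counts_reported(usage):
    """False when the filesystem does not expose inode counters at all."""
    return usage["inodes_total"] > 0


if __name__ == "__main__":
    usage = filesystem_usage("/")
    print(usage)
    if not inode_counts_reported(usage):
        print("inode counters not reported by this filesystem")
```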

As we need the partition for data recovery, the following results are for the current state (with the four 2 TB files filling the partition), not the original state that caused the error.

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              21G  4.1G   16G  21% /
udev                  5.9G  184K  5.9G   1% /dev
/dev/sda4              13T   13T  690G  95% /usr/database

# df -i
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda2                  0       0       0    -  /
udev                 1536566     473 1536093    1% /dev
/dev/sda4                  0       0       0    -  /usr/database

# stat -f /usr/database
  File: "/usr/database"
    ID: 0        Namelen: 255     Type: reiserfs
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 3408989583 Free: 180795568  Available: 180795568
Inodes: Total: 0          Free: 0
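
As a cross-check, the block counts in the `stat -f` output can be converted to bytes and compared with the `df -h` figures. The sketch below only redoes that arithmetic with the numbers quoted above; the variable names are illustrative.

```python
# Figures copied from the `stat -f /usr/database` output above.
BLOCK_SIZE = 4096
TOTAL_BLOCKS = 3408989583
AVAIL_BLOCKS = 180795568


def blocks_to_bytes(blocks, block_size=BLOCK_SIZE):
    """Convert a block count to bytes."""
    return blocks * block_size


total_bytes = blocks_to_bytes(TOTAL_BLOCKS)
avail_bytes = blocks_to_bytes(AVAIL_BLOCKS)

# df -h uses binary units: 1G = 2**30 bytes, 1T = 2**40 bytes.
avail_gib = avail_bytes / 2**30   # about 689.7, shown by df as 690G
total_tib = total_bytes / 2**40   # about 12.7, shown by df as 13T

if __name__ == "__main__":
    print(f"available: {avail_bytes} bytes = {avail_gib:.1f} GiB")
    print(f"total:     {total_bytes} bytes = {total_tib:.1f} TiB")
```

Both values agree with the `df -h` line for /dev/sda4, so the counters the kernel reports are at least self-consistent in this (artificially filled) state.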

Thank you for the feedback.

It would be interesting to see the output of these commands while the problem is repeatable: I know of a case where a user got the same error because the filesystem was out of inodes.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

I think another detail I've seen in a similar situation may shed some light.
We have a database that's 1.2TB in size, with a corrupted table 706GB in size. Running a recovery died with Error 28 while writing files to a temp space located on a 264GB filesystem. Looking further, I discovered two oddities.

1. df output for the filesystem used for temp space showed it 100% full (264GB), but du output showed that the directory was only 28k in size. The path was completely empty... Strange.

2. lsof output for the root of the temp filesystem (in our case /u02) showed 3 open files used by myisamchk, all marked (deleted). Sure enough, the files it was writing to, and was waiting to write to, had been deleted.

No one had logged onto the system so it appears that the files were removed while being written by myisamchk itself. This would certainly explain a similar effect even on larger filesystems, though I would expect it to take longer to get to that state.

See below:

myisamchk: Disk is full writing '/u02/RECOVER_TEMP/STEHxQbZ' (Errcode: 28). Waiting for someone to free space... Retry in 60 secs 

myisamchk 22705 root 7u REG 8,17 85173215232 65306627 /u02/RECOVER_TEMP/STEHxQbZ (deleted) 
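
The combination of a full `df` and an empty `du` is what an unlinked but still open file looks like on a POSIX filesystem: the directory entry is gone, yet the blocks stay allocated until the last descriptor is closed. The following Python sketch reproduces that effect in miniature. It says nothing about why myisamchk's files were unlinked; `unlinked_file_still_uses_space` is a hypothetical name, and the behaviour assumes a Linux-like system.

```python
import os
import tempfile


def unlinked_file_still_uses_space(payload=b"x" * 4096):
    """Write to a file after unlinking it and report what the kernel sees."""
    fd, path = tempfile.mkstemp()
    try:
        os.unlink(path)            # remove the name; the descriptor stays open
        os.write(fd, payload)      # writes still succeed and consume space
        info = os.fstat(fd)
        # Path is gone (du sees nothing), link count is 0, data is still there.
        return os.path.exists(path), info.st_nlink, info.st_size
    finally:
        os.close(fd)               # only now is the space released


if __name__ == "__main__":
    exists, links, size = unlinked_file_still_uses_space()
    print(f"path exists: {exists}, link count: {links}, size: {size} bytes")
```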

This would seem to be a temp file management issue. Anyone else have thoughts?

Jeremy,

thank you for the feedback.

Please check inodes usage with command `df -i`

The problem was definitely lack of inodes, but it was due to failed deletes of files that were being written by myisamchk (see the error message and lsof snippet provided). The question is: why did myisamchk remove files it was actively writing?

Jeremy,

thank you for the feedback.

Please run myisamchk with option -v and provide its full output.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".