Bug #56347 temp space filling up
Submitted: 28 Aug 2010 21:07  Modified: 17 Dec 2015 23:44
Reporter: Jamie Koc
Status: Closed
Category: MySQL Server: Documentation  Severity: S1 (Critical)
Version: 5.0.27-standard, 5.0.91  OS: Linux
Assigned to: Paul DuBois  CPU Architecture: Any
Tags: /var/lib/mysql/tmp, tmp space full

[28 Aug 2010 21:07] Jamie Koc
Description:
DBD::mysql::st execute failed: Error writing file '/var/lib/mysql/tmp/MYfQf0bK' (Errcode: 28):

I have an INSERT ... SELECT query that performs a SELECT with an ORDER BY from one table. The table uses the ARCHIVE storage engine, and the query causes /var/lib/mysql/tmp to fill up completely. The archive table is 5.7G in size, and there is about 42G of free space before the query starts; all 42G ends up being consumed.

Is this a known bug and is there a more recent 5.0 version that may address this?

Thanks,
Jamie

How to repeat:
I've tried setting sort_buffer_size to 4G, but that did not help. Roughly what I tried is sketched below.
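(A sketch only; the exact ceiling for sort_buffer_size varies by platform and MySQL version, and only the variable name and the 4G value come from the report above:)

-- raise the per-session sort buffer before running the INSERT ... SELECT
SET SESSION sort_buffer_size = 4294967295;   -- roughly 4G; the server may cap this lower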
[29 Aug 2010 10:43] Valeriy Kravchuk
Please check whether the same problem happens with a recent 5.0.x version, such as 5.0.91.
[5 Sep 2010 18:17] Jamie Koc
The same error occurs with version 5.0.91.

I also noticed the same behavior with version 5.1.42-community-log using SELECT ... INTO OUTFILE. On the server running 5.1.42, I am dumping 250 million records into an outfile (the table is 30G) with a SELECT * FROM ... ORDER BY. The server starts with 330G free, and as I watch the process run, free space drops to 130G. Why does MySQL need so much disk space (200G) to perform a sort on a 30G table?
[16 Sep 2010 13:18] Susanne Ebrecht
How many indexes do you have?
[22 Sep 2010 5:40] Jamie Koc
On the table that has 300 million records, there is one index.
It takes up about 1.8G of space.
[23 Sep 2010 9:04] Susanne Ebrecht
It needs that much space to generate the indexes and the table itself.

Keep in mind that while creating or updating an index, MySQL sometimes needs to copy everything.

You said you have just one index. I guess it is the primary key?

Keep in mind that every key is also an index.

You can easily test whether the problem is related to the index:

Insert the data into a table that has no indexes or keys at all, not even a primary key.

After that, ALTER TABLE and add e.g. a primary key, another index, or a foreign key (see the sketch below).

And watch how much space is needed.
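A rough outline of that test (the table, index, and column names here are only placeholders, not taken from the report):

CREATE TABLE t_noidx LIKE your_table;
ALTER TABLE t_noidx DROP INDEX your_index;      -- make sure the copy starts with no indexes
INSERT INTO t_noidx SELECT * FROM your_table;   -- load the data index-free

-- now add the index in a separate step and watch disk usage under /var/lib/mysql/tmp
ALTER TABLE t_noidx ADD INDEX idx_test (your_column);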
[24 Sep 2010 8:00] Jamie Koc
So there are two separate issues here that seem to share the same symptoms.

1. SELECT ... INTO OUTFILE with an ORDER BY clause uses approximately 8 times the amount of disk space as the table itself. I have a table with 300 million records (approximately 35G of disk storage) and one index on that table (not a primary key). When I select the data into an outfile, MySQL uses almost 300G of disk space during the process. Watching the processlist, the select shows state=sorting data (which uses the 300G), then space is freed, and then the process shows state=send data, using up disk space again.

2. In my original issue, I have a problem with an INSERT ... SELECT * FROM table with an ORDER BY. I believe the ORDER BY is causing the disk space issue here as well. There are no indexes on the table the data is being inserted into.

Why does MySQL use so much disk space? Is there a config setting that can help here?
[27 Sep 2010 13:11] Susanne Ebrecht
Is your charset utf8?
[27 Sep 2010 17:31] Jamie Koc
DEFAULT CHARSET=latin1
[28 Sep 2010 8:47] Susanne Ebrecht
Please show output of:

SHOW CREATE TABLE <your_table>;

Also, my second question is: why are you using ORDER BY with INSERT?

ORDER BY only makes sense when you want to display the data, not when you just want to do an INSERT INTO ... SELECT, unless you want to pre-fill an AUTO_INCREMENT column or something similar. For example (see below):
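(The table names here are just placeholders; the point is that dropping the ORDER BY removes the filesort pass entirely:)

INSERT INTO target_table
SELECT * FROM source_table;        -- no ORDER BY, no filesort

INSERT INTO target_table
SELECT * FROM source_table
ORDER BY some_column;              -- forces a filesort over the whole result set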
[28 Sep 2010 19:45] Jamie Koc
Here is the create table statement:

mysql> show create table event_2010_09_28 \G;
*************************** 1. row ***************************
       Table: event_2010_09_28
Create Table: CREATE TABLE `event_2010_09_28` (
  `file_id` int(11) NOT NULL DEFAULT '0',
  `id` bigint(20) NOT NULL DEFAULT '0',
  `cre_date` datetime DEFAULT NULL,
  `record_type_id` tinyint(4) NOT NULL DEFAULT '0',
  `customer_id` varchar(255) DEFAULT NULL,
  `member_id` varchar(10) DEFAULT NULL,
  `cid` varchar(240) DEFAULT NULL,
  `department_id` varchar(240) DEFAULT NULL,
  `member_department_id` int(11) DEFAULT NULL,
  `tid` varchar(245) DEFAULT NULL,
  `sale` decimal(9,2) DEFAULT NULL,
  `total` decimal(10,2) DEFAULT NULL,
  `zip` varchar(12) DEFAULT NULL,
  `free` tinyint(1) DEFAULT '0',
  KEY `idx_event_mid` (`member_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

For #2, the order by is just to have the data pre-sorted in the table.
[29 Sep 2010 15:43] Jamie Koc
Here is the query that uses up 300G of space to execute

SELECT customer_id,record_type_id,member_id,cre_date 
INTO OUTFILE 'test'
FROM event_2010_09_28
WHERE customer_id IS NOT NULL 
AND customer_id NOT like '%-%' 
ORDER BY customer_id, member_id;
[8 Oct 2010 13:18] Valeriy Kravchuk
Please, send the results of:

explain SELECT customer_id,record_type_id,member_id,cre_date 
INTO OUTFILE 'test'
FROM event_2010_09_28
WHERE customer_id IS NOT NULL 
AND customer_id NOT like '%-%' 
ORDER BY customer_id, member_id;

show table status like 'event_2010_09_28'\G
[9 Oct 2010 5:56] Jamie Koc
Explain:
           id: 1
  select_type: SIMPLE
        table: event_2010_09_28
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 319451656
        Extra: Using where; Using filesort

Results of show table status:
           Name: event_2010_09_28
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 319451656
 Avg_row_length: 120
    Data_length: 38555182164
Max_data_length: 281474976710655
   Index_length: 1815251968
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2010-09-28 00:36:16
    Update_time: 2010-10-01 23:38:40
     Check_time: 2010-09-28 00:36:16
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
[9 Oct 2010 7:23] Valeriy Kravchuk
So there is a filesort step, and among other columns you select one defined as VARCHAR(255). The full declared length is used in the temporary files, so this gives us:

mysql> select 255*319451656;
+---------------+
| 255*319451656 |
+---------------+
|   81460172280 |
+---------------+
1 row in set (0.00 sec)

That is at least 81G of temporary files, even before taking the other columns into account.

Please, send the results of:

show variables like 'char%';

from your environment.
[10 Oct 2010 6:58] Jamie Koc
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
[11 Oct 2010 15:59] Jamie Koc
Does MySQL create the temporary file using 255 characters for each record even though the column is a VARCHAR? The longest value actually stored in our customer_id varchar(255) field is 64 characters.
Would the total be at most 319451656*64 = 20444905984 bytes (about 20G), or is it always going to be 255*319451656 (81G)?
[15 Oct 2010 9:19] Hartmut Holzgraefe
Exactly: filesort files, and temporary in-memory tables, allocate fixed space even for fields declared as VARCHAR or VARBINARY. For MEMORY tables this is documented at

  http://dev.mysql.com/doc/refman/5.1/en/memory-storage-engine.html

  "MEMORY tables use a fixed-length row-storage format. Variable-length 
   types such as VARCHAR are stored using a fixed length." 

For filesort files I can't find a similar reference right now, but their behavior is similar.

So if your customer_id VARCHAR actually only holds values of up to 64 characters, you should consider reducing the VARCHAR size from 255 to the 64 characters actually needed.

Even more important, though: try to avoid execution plans that require a filesort on a large data set, not only for disk space reasons but for general performance reasons as well.

For the query given here, an index on (customer_id, member_id) would avoid the filesort step altogether, as rows could be fetched in the requested sort order right away. This not only saves temporary disk space and I/O but also CPU time, so it should speed up this query with its large ordered result set quite a bit.

Be warned, though, that adding an index to a table this large can take some time and will usually need, temporarily, more than twice as much disk space as the original table. The improvement from the additional index usually makes up for that one-time investment pretty quickly. A rough sketch of both changes is below.
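The statements would look roughly like this (the index name is only illustrative, and as noted above the ALTER TABLE will rebuild this 300-million-row MyISAM table, so allow for the time and temporary space it needs):

-- shrink the column to the width that is actually used
ALTER TABLE event_2010_09_28
  MODIFY customer_id VARCHAR(64) DEFAULT NULL;

-- add a composite index matching the ORDER BY so the filesort can be skipped
ALTER TABLE event_2010_09_28
  ADD INDEX idx_event_cust_mid (customer_id, member_id);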
[30 Apr 2011 0:27] Mark Callaghan
If this is not a bug, then the behavior for on-disk temp tables and filesort files should be documented. Can you provide a reference to that?
[30 Apr 2011 8:50] Sveta Smirnova
Mark, thank you for the feedback.

You are right: there is no information about how data is stored in temporary files, so I am moving this bug to "Documentation".
[17 Dec 2015 23:44] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly.

Information about the storage format of temporary tables can be found here:

http://dev.mysql.com/doc/refman/5.7/en/internal-temporary-tables.html

(Note that MySQL is now more efficient about storage space for temporary tables: as of 5.6, a dynamic-length row storage format is used rather than a fixed-length row format.)

Information about how MySQL writes data to the files used for the filesort algorithm can be found here:

http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html

Again, more recent versions of MySQL (as of 5.7) use more space-efficient storage in many cases.

The increased disk-space efficiency may alleviate the problem of running out of disk space in some or many cases.