Bug #66683 InnoDB doesn't report DB_OUT_OF_FILE_SPACE during recovery
Submitted: 4 Sep 2012 13:57 Modified: 4 Sep 2012 17:47
Reporter: Baron Schwartz (Basic Quality Contributor) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.5.27 OS:Any
Assigned to: CPU Architecture:Any

[4 Sep 2012 13:57] Baron Schwartz
Description:
When the main tablespace is full, InnoDB can crash unrecoverably and does not indicate why.

In the scenario I just experienced, the main tablespace was configured to a specific size without autoextend. The tablespace filled up, and the server began reporting "table is full" for various operations. One such was a very long-running INSERT..SELECT into an InnoDB temporary table. This eventually crashed the server (which is probably a separate bug).

On recovery, InnoDB reported the following abbreviated information:

InnoDB: Transaction 1EDC90E0 was in the XA prepared state.
InnoDB: Since innodb_force_recovery > 0, we will rollback it anyway.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 1 row operations to undo
InnoDB: Trx id counter is 1EDC9200
120904  8:31:24  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 158745180, file name mysql-bin.000382
InnoDB: and relay log file
InnoDB: position 0 6871, file name /var/lib/mysql/relay/mysql-relay-bin.000528
InnoDB: Last MySQL binlog file position 0 335181031, file name /var/lib/mysql/relay/mysql-bin.000375
120904  8:31:28  InnoDB: Assertion failure in thread 139739781744384 in file row0mysql.c line 3519
InnoDB: We intentionally generate a memory trap.
[snip]
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = (nil) thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x39)[0x8069b9]
/usr/sbin/mysqld(handle_segfault+0x464)[0x512fd4]
/lib64/libpthread.so.0(+0xfa60)[0x7f17b31f3a60]
/lib64/libc.so.6(gsignal+0x35)[0x7f17b2e9ac65]
/lib64/libc.so.6(abort+0x16f)[0x7f17b2e9be3f]
/usr/sbin/mysqld[0x8b0cc7]
/usr/sbin/mysqld[0x8b29d0]
/usr/sbin/mysqld[0x983bc2]
/usr/sbin/mysqld[0x8cee50]
/usr/sbin/mysqld[0x88e0a2]
/usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x8b)[0x6b75db]
/usr/sbin/mysqld[0x5adc9c]
/usr/sbin/mysqld(_Z11plugin_initPiPPci+0xac3)[0x5b1f13]
/usr/sbin/mysqld[0x518cc2]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x751)[0x51cc01]
/lib64/libc.so.6(__libc_start_main+0xec)[0x7f17b2e88280]
/usr/sbin/mysqld[0x5114b1]

Running in GDB, I found the following:

(gdb) bt
#0  0x00007ffff6db2c65 in raise () from /lib64/libc.so.6
#1  0x00007ffff6db3e3f in abort () from /lib64/libc.so.6
#2  0x0000000000818514 in row_drop_table_for_mysql (name=0x19e90478 "tmp/#sql381c_12e43_1", trx=0x19f25a28,
    drop_db=<optimized out>) at /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/storage/innobase/row/row0mysql.c:3491
#3  0x00000000008196d0 in row_mysql_drop_temp_tables ()
    at /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/storage/innobase/row/row0mysql.c:3577
#4  0x00000000008d47f2 in recv_recovery_rollback_active ()
    at /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/storage/innobase/log/log0recv.c:3325
#5  0x000000000083347b in innobase_start_or_create_for_mysql ()
    at /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/storage/innobase/srv/srv0start.c:1768
#6  0x0000000000805e6d in innobase_init (p=<optimized out>)
    at /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/storage/innobase/handler/ha_innodb.cc:2576
#7  0x0000000000670fa8 in ha_initialize_handlerton (plugin=0xf86968)
    at /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/sql/handler.cc:543
#8  0x00000000005813fa in plugin_initialize (plugin=0xf86968)
    at /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/sql/sql_plugin.cc:1100
#9  0x00000000005858dd in plugin_init (argc=0xf08a08, argv=0xf37468, flags=<optimized out>)
    at /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/sql/sql_plugin.cc:1386
#10 0x0000000000506cbb in init_server_components () at /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/sql/mysqld.cc:3887
#11 0x000000000050788a in mysqld_main (argc=69, argv=0xf37468)
    at /pb2/build/sb_0-6421281-1342807431.83/mysql-5.5.27/sql/mysqld.cc:4468
#12 0x00007ffff6da0280 in __libc_start_main () from /lib64/libc.so.6
#13 0x00000000004fd6ca in _start ()

Typing "up" twice, and then "print err", printed 14 which is  DB_OUT_OF_FILE_SPACE. I solved the problem by changing the configuration to let the main tablespace autoextend, but it would be helpful if InnoDB caught this error, or at least reported it before dying, so that it wasn't necessary to run the server in the debugger.

How to repeat:
Limit the main tablespace's size. Run queries.

Suggested fix:
Print a better error message before asserting.
[4 Sep 2012 14:40] MySQL Verification Team
Thank you for the bug report. I got the correct error message before the first "table is full". Please check your log error file:

120904 11:31:00 InnoDB: 1.1.8 started; log sequence number 0
120904 11:31:00 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
120904 11:31:00 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
120904 11:31:00 [Note] Server socket created on IP: '0.0.0.0'.
120904 11:31:01 [Note] Event Scheduler: Loaded 0 events
120904 11:31:01 [Note] 5.5\bin\mysqld: ready for connections.
Version: '5.5.29'  socket: ''  port: 3306  Source distribution
InnoDB: Error: Data file(s) ran out of space.
Please add another data file or use 'autoextend' for the last data file.
120904 11:34:27 [ERROR] 5.5\bin\mysqld: The table 't1' is full
[4 Sep 2012 14:58] Baron Schwartz
Yes, I get that too. The problem is that if you crash MySQL while the tablespace is full and a temp table is open, at least in some cases, InnoDB can't recover. I probably should have been more specific about how to repeat:

1. Fill up the tablespace.
2. Cause MySQL to crash while it tries to insert into an InnoDB temp table.
3. Try to restart MySQL and run recovery.

I am not sure exactly what caused the crash in step #2, but I think there are a lot of unhandled crashing problems when the disk or tablespace is full. Sometimes inserting into a temp table when the tablespace is full will cause an error, sometimes a hard crash. The point of this bug report is that when it's a crash, recovery can fail in a very confusing way.
[4 Sep 2012 15:23] MySQL Verification Team
What size you have used for the tablespace? Thanks.
[4 Sep 2012 15:34] Baron Schwartz
It was set to 1G. I set it to autoextend to 10G to recover from the problem.
[4 Sep 2012 16:59] MySQL Verification Team
Let me provide you with some first findings. First of all, this is not a wild crash , but an abort. Here is where it happens. I have removed lot's of code for brevity and clarity:

        err = que_eval_sql(info, "PROCEDURE DROP_TABLE_PROC () IS\n"
        /* skip  */
                           , FALSE, trx);

        switch (err) {
                ibool           is_temp;
                const char*     name_or_path;
                mem_heap_t*     heap;

        case DB_SUCCESS:
        /*  skip       */
                 default:
                /* No other possible error returns */
                ut_error;  // <<<<<<<<============ THIS IS WHERE ABORT OCCURS .....
        }

So, that means that function que_eval_sql() returned an error that is not handled, although it should have returned  DB_OUT_OF_FILE_SPACE, which is handled properly in this switch statement.
[4 Sep 2012 17:16] MySQL Verification Team
I have tried to repeat a crash with small, fixed-size tablespace and open InnoDB temporary table, but did not succeed.

I think I have found a possible cause of this assert. A cause could be the one described in this bug:

http://bugs.mysql.com/bug.php?id=44781

Hence, bug #44781 should be fixed and it's fix would probably fix this assert as well.
[4 Sep 2012 17:44] Baron Schwartz
Right. Execution should have proceeded to

   case DB_OUT_OF_FILE_SPACE:
      err = DB_MUST_GET_MORE_FILE_SPACE;

      row_mysql_handle_errors(&err, trx, NULL, NULL);

And then into row_mysql_handle_errors(), where we have

   case DB_MUST_GET_MORE_FILE_SPACE:
      fputs("InnoDB: The database cannot continue"
            " operation because of\n"
            "InnoDB: lack of space. You must add"
            " a new data file to\n"
            "InnoDB: my.cnf and restart the database.\n", stderr);

      exit(1);

But that isn't happening for some reason, and InnoDB is asserting. I don't pretend to understand why! Just to be clear, I understand that this isn't a crash, but an abort. The condition that's causing the abort, however, was initially caused by a crash when the tablespace was full:

120904  7:00:04 [ERROR] /usr/sbin/mysqld: The table 'processes' is full
120904  7:00:05  InnoDB: Assertion failure in thread 140102353962752 in file row0mysql.c line 3519
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.

Memory status:
Non-mmapped space allocated from system: 428040192
Number of free chunks:                   686
Number of fastbin blocks:                0
Number of mmapped regions:               18
Space in mmapped regions:                600731648
Maximum total allocated space:           0
Space available in freed fastbin blocks: 0
Total allocated space:                   381393120
Total free space:                        46647072
Top-most, releasable space:              32624112
Estimated memory (with thread stack):    1033490432

Events status:
LLA = Last Locked At  LUA = Last Unlocked At
WOC = Waiting On Condition  DL = Data Locked

Event scheduler status:
State      : INITIALIZED
Thread id  : 0
LLA        : n/a:0
LUA        : n/a:0
WOC        : NO
Workers    : 0
Executed   : 0
Data locked: NO

Event queue status:
Element count   : 0
Data locked     : NO
Attempting lock : NO
LLA             : init_queue:139
LUA             : init_queue:149
WOC             : NO
Next activation : never
120904  7:00:05 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=205520896
read_buffer_size=1048576
max_used_connections=80
max_threads=120
thread_count=30
connection_count=30
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 570778 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f6c04214500
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f6c1f0a3e88 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x39)[0x8069b9]
/usr/sbin/mysqld(handle_segfault+0x464)[0x512fd4]
/lib64/libpthread.so.0(+0xfa60)[0x7f6fe2594a60]
/lib64/libc.so.6(gsignal+0x35)[0x7f6fe223bc65]
/lib64/libc.so.6(abort+0x16f)[0x7f6fe223ce3f]
/usr/sbin/mysqld[0x8b0cc7]
/usr/sbin/mysqld[0x898ed8]
/usr/sbin/mysqld(_Z15ha_delete_tableP3THDP10handlertonPKcS4_S4_b+0xf9)[0x6bb3d9]
/usr/sbin/mysqld(_Z23mysql_rm_table_no_locksP3THDP10TABLE_LISTbbbb+0x557)[0x60cc67]
/usr/sbin/mysqld(_Z14mysql_rm_tableP3THDP10TABLE_LISTcc+0x196)[0x60d7c6]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1f59)[0x5a3729]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x3b2)[0x5a6902]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1535)[0x5a7f05]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x1ae)[0x5a83ee]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x14f)[0x652cbf]
/usr/sbin/mysqld(handle_one_connection+0x51)[0x652d81]
/lib64/libpthread.so.0(+0x79dc)[0x7f6fe258c9dc]
/lib64/libc.so.6(clone+0x6d)[0x7f6fe22dbdcd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f6c0ca886d0): is an invalid pointer
Connection ID (thread ID): 78627
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Fatal signal 11 while backtracing

The stacktrace is for a different version of the server, not 5.5.27 (and it is not an official Oracle binary but a Gentoo build), but the bug I'm reporting here is about what happens AFTERWARD, when the server tries to recover from the crash and doesn't have any free space in the tablespace to do so. I am afraid that I may be confusing this bug report by discussing what caused the condition InnoDB couldn't recover from, but it seems that you're interested in that information as well.
[4 Sep 2012 17:47] MySQL Verification Team
I am setting this bug to "Verified". But, what is verified is not that InnoDB asserts in case when system tablespace is full and is not auto-extendable (or disk is out of space). Asserting of insufficient space for system tablespace is expected behavior.

I am verifying a bug of not handling the error message from que_eval_sql() function, as described in the bug #44781.

So, this bug can be fixed either by fixing bug #44781 or by adding another case in the switch described above.
[18 Sep 2012 7:25] Pradeep Agrawal
Hi Baron Schwartz,

Please check disk space and swap space of your linux machine.