Bug #45449 SQL thread crashes on disk full
Submitted: 11 Jun 2009 12:39 Modified: 10 Jan 2013 9:09
Reporter: Sven Sandberg Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.1, 6.0 OS:Linux
Assigned to: Assigned Account CPU Architecture:Any
Tags: coredump, crash, disk full, sql thread
Triage: Triaged: D1 (Critical)

[11 Jun 2009 12:39] Sven Sandberg
Description:
Scenario:
1. Master replicates to Slave.
2. Disk becomes full while Master writes the binlog.
3. Master gives the message "Disk is full writing './master-bin.000001' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)".
4. User frees up disk space.

Then the following happens:
5. Master keeps waiting up to 60 seconds.
6. Slave crashes inside the SQL thread with the following stack trace (it was Thread 1 that crashed):

#0  0xb7faf410 in __kernel_vsyscall ()
#0  0xb7faf410 in __kernel_vsyscall ()
#1  0xb7f8aae7 in pthread_kill () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x08726405 in my_write_core (sig=6) at stacktrace.c:310
#3  0x082c3653 in handle_segfault (sig=6) at mysqld.cc:2536
#4  <signal handler called>
#5  0xb7faf410 in __kernel_vsyscall ()
#6  0xb7de8085 in raise () from /lib/tls/i686/cmov/libc.so.6
#7  0xb7de9a01 in abort () from /lib/tls/i686/cmov/libc.so.6
#8  0xb7de110e in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
#9  0x082aab39 in Diagnostics_area::set_ok_status (this=0xb720126c, 
    thd=0xb72004b8, affected_rows_arg=0, last_insert_id_arg=0, message_arg=0x0)
    at sql_class.cc:436
#10 0x081e223b in my_ok (thd=0xb72004b8, affected_rows=0, id=0, message=0x0)
    at sql_class.h:2261
#11 0x082dbfad in mysql_execute_command (thd=0xb72004b8) at sql_parse.cc:4038
#12 0x082df42e in mysql_parse (thd=0xb72004b8, inBuf=0xb7202141 "COMMIT", 
    length=6, found_semicolon=0xb740710c) at sql_parse.cc:5931
#13 0x083c916a in Query_log_event::do_apply_event (this=0xb7202338, 
    rli=0x8ad11e0, query_arg=0xb7202141 "COMMIT", q_len_arg=6)
    at log_event.cc:3114
#14 0x083c968c in Query_log_event::do_apply_event (this=0xb7202338, 
    rli=0x8ad11e0) at log_event.cc:2915
#15 0x08471633 in Log_event::apply_event (this=0xb7202338, rli=0x8ad11e0)
    at log_event.h:1058
#16 0x0846787a in apply_event_and_update_pos (ev=0xb7202338, thd=0xb72004b8, 
    rli=0x8ad11e0, skip=true) at slave.cc:2024
#17 0x0846a514 in exec_relay_log_event (thd=0xb72004b8, rli=0x8ad11e0)
    at slave.cc:2167
#18 0x0846b45b in handle_slave_sql (arg=0x8acff58) at slave.cc:2891
#19 0xb7f854fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#20 0xb7e93e5e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 6 (process 20380):
#0  0xb7faf410 in __kernel_vsyscall ()
#1  0xb7e8c881 in select () from /lib/tls/i686/cmov/libc.so.6
#2  0x082c3fd3 in handle_connections_sockets (arg=0x0) at mysqld.cc:4971
#3  0x082c7bf0 in main (argc=7, argv=0xbf913894) at mysqld.cc:4470

Thread 5 (process 20383):
#0  0xb7faf410 in __kernel_vsyscall ()
#1  0xb7f8db1a in do_sigwait () from /lib/tls/i686/cmov/libpthread.so.0
#2  0xb7f8dbbf in sigwait () from /lib/tls/i686/cmov/libpthread.so.0
#3  0x082c2785 in signal_hand (arg=0x0) at mysqld.cc:2738
#4  0xb7f854fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#5  0xb7e93e5e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 4 (process 20445):
#0  0xb7faf410 in __kernel_vsyscall ()
#1  0xb7f8c99b in read () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x086f807e in vio_read (vio=0x8aaabc8, buf=0x8ae2cf0 "\001", size=4)
    at viosocket.c:44
#3  0x082b3514 in my_real_read (net=0x8a88864, complen=0xb749a33c)
    at net_serv.cc:815
#4  0x082b3af6 in my_net_read (net=0x8a88864) at net_serv.cc:996
#5  0x082e1474 in do_command (thd=0x8a887e8) at sql_parse.cc:796
#6  0x082ccf9b in handle_one_connection (arg=0x8a887e8) at sql_connect.cc:1115
#7  0xb7f854fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#8  0xb7e93e5e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 3 (process 20446):
#0  0xb7faf410 in __kernel_vsyscall ()
#1  0xb7f8c99b in read () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x086f807e in vio_read (vio=0x8aea1f8, buf=0x8afeb38 "\a", size=4)
    at viosocket.c:44
#3  0x082b3514 in my_real_read (net=0x8afd1e4, complen=0xb746933c)
    at net_serv.cc:815
#4  0x082b3af6 in my_net_read (net=0x8afd1e4) at net_serv.cc:996
#5  0x082e1474 in do_command (thd=0x8afd168) at sql_parse.cc:796
#6  0x082ccf9b in handle_one_connection (arg=0x8afd168) at sql_connect.cc:1115
#7  0xb7f854fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#8  0xb7e93e5e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 2 (process 20447):
#0  0xb7faf410 in __kernel_vsyscall ()
#1  0xb7f8c99b in read () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x086f807e in vio_read (vio=0x8b18e00, buf=0x8b35c90 "E", size=16384)
    at viosocket.c:44
#3  0x086f820d in vio_read_buff (vio=0x8b18e00, buf=0x8b39cb0 "", size=4)
    at viosocket.c:83
#4  0x082b3514 in my_real_read (net=0x8af7c30, complen=0xb743824c)
    at net_serv.cc:815
#5  0x082b3af6 in my_net_read (net=0x8af7c30) at net_serv.cc:996
#6  0x084cee0c in cli_safe_read (mysql=0x8af7c30) at client.c:670
#7  0x08467e4d in read_event (mysql=0x8af7c30, mi=0x8acff58, 
    suppress_warnings=0xb7438389) at slave.cc:1834
#8  0x0846f948 in handle_slave_io (arg=0x8acff58) at slave.cc:2524
#9  0xb7f854fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#10 0xb7e93e5e in clone () from /lib/tls/i686/cmov/libc.so.6

Thread 1 (process 20448):
#0  0xb7faf410 in __kernel_vsyscall ()
#1  0xb7f8aae7 in pthread_kill () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x08726405 in my_write_core (sig=6) at stacktrace.c:310
#3  0x082c3653 in handle_segfault (sig=6) at mysqld.cc:2536
#4  <signal handler called>
#5  0xb7faf410 in __kernel_vsyscall ()
#6  0xb7de8085 in raise () from /lib/tls/i686/cmov/libc.so.6
#7  0xb7de9a01 in abort () from /lib/tls/i686/cmov/libc.so.6
#8  0xb7de110e in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
#9  0x082aab39 in Diagnostics_area::set_ok_status (this=0xb720126c, 
    thd=0xb72004b8, affected_rows_arg=0, last_insert_id_arg=0, message_arg=0x0)
    at sql_class.cc:436
#10 0x081e223b in my_ok (thd=0xb72004b8, affected_rows=0, id=0, message=0x0)
    at sql_class.h:2261
#11 0x082dbfad in mysql_execute_command (thd=0xb72004b8) at sql_parse.cc:4038
#12 0x082df42e in mysql_parse (thd=0xb72004b8, inBuf=0xb7202141 "COMMIT", 
    length=6, found_semicolon=0xb740710c) at sql_parse.cc:5931
#13 0x083c916a in Query_log_event::do_apply_event (this=0xb7202338, 
    rli=0x8ad11e0, query_arg=0xb7202141 "COMMIT", q_len_arg=6)
    at log_event.cc:3114
#14 0x083c968c in Query_log_event::do_apply_event (this=0xb7202338, 
    rli=0x8ad11e0) at log_event.cc:2915
#15 0x08471633 in Log_event::apply_event (this=0xb7202338, rli=0x8ad11e0)
    at log_event.h:1058
#16 0x0846787a in apply_event_and_update_pos (ev=0xb7202338, thd=0xb72004b8, 
    rli=0x8ad11e0, skip=true) at slave.cc:2024
#17 0x0846a514 in exec_relay_log_event (thd=0xb72004b8, rli=0x8ad11e0)
    at slave.cc:2167
#18 0x0846b45b in handle_slave_sql (arg=0x8acff58) at slave.cc:2891
#19 0xb7f854fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#20 0xb7e93e5e in clone () from /lib/tls/i686/cmov/libc.so.6

How to repeat:
shell$ cd mysql-test
shell$ cat ./error
# IMAGE_SIZE   size of partition
# BIGFILE_SIZE size of a big file created initially in the partition
# CHUNK_SIZE   size of a string in the table
# ITERATIONS   number of times that the string is updated in the table
export IMAGE_SIZE=20900000 BIGFILE_SIZE=16100000 CHUNK_SIZE=1000 ITERATIONS=100

echo ==== Prepare var dir ====

# unmount and remove previous var dir
sudo umount var
rm -rf var

# create image
head -c $IMAGE_SIZE /dev/zero > disk-image-bug32228
mkfs.ext2 -F disk-image-bug32228

# create var dir and mount disk image on it
mkdir var
sudo mount -o loop disk-image-bug32228 var

echo ==== Run test case ====

date
./mysql-test-run.pl rpl_bug32228
date
df
#EOF error

shell$ cat ./free-space
date
df var
rm var/mysqld.1/data/bigfile
df var
#EOF free-space

shell$ cat ./suite/rpl/t/rpl_bug32228.test
source include/master-slave.inc;

let $MYSQLD_DATADIR= `SELECT @@datadir`;
connection master;
use test;
CREATE TABLE t1 (a VARCHAR(10000));

# BIGFILE_SIZE is an environment variable
--exec head -c $BIGFILE_SIZE /dev/zero > $MYSQLD_DATADIR/bigfile

# Set @a and @b to two different 10000 character strings
eval SET @a= REPEAT('a', $CHUNK_SIZE);
eval SET @bb= REPEAT('b', $CHUNK_SIZE);

SET @@GLOBAL.GENERAL_LOG = 0;
eval INSERT INTO t1 VALUES (@bb);
let $x= $ITERATIONS;
while ($x) {
  eval UPDATE t1 SET a = @a;
  eval UPDATE t1 SET a = @bb;
  dec $x;
}

sync_slave_with_master;
SELECT COUNT(*) FROM t1;
#EOF rpl_bug32228.test

shell$ ./error

 Then wait until it hangs.
 Go to another shell and execute:

shell2$ ./free-space

 Wait until it crashes.
 For me, it doesn't crash every time, but at least 50% of the times.
[11 Jun 2009 12:47] Sven Sandberg
Note: to reproduce using the scripts above, you must be root and your os must be Linux. If it doesn't work, try experimenting with the numbers near the top of the "error" file.
[12 Jun 2009 8:30] Susanne Ebrecht
Verified as described
[10 Jan 2013 9:09] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html
[10 Jan 2013 9:10] Jon Stephens
Fixed in 5.1+, documented as follows in the 5.1.68, 5.5.30, 5.6.11, and 5.7.1
changelogs:

        If the disk becomes full while writing to the binary log, the
        server hangs until space is freed up manually. It was possible
        after was done for the MySQL server to fail, due to an internal
        status value being set when not needed. Now in such cases,
        rather than trying to set this status, a warning is written in
        the error log instead.

Closed.