Description:
Scenario:
1. Master replicates to Slave.
2. The disk becomes full while Master is writing the binary log.
3. Master prints the message "Disk is full writing './master-bin.000001' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)".
4. The user frees up disk space.
Then the following happens:
5. Master keeps waiting for up to 60 seconds before it continues.
6. Slave crashes in the SQL thread on an assertion failure in Diagnostics_area::set_ok_status() while applying a COMMIT event. Stack trace of the crashing thread (Thread 1):
#0 0xb7faf410 in __kernel_vsyscall ()
#1 0xb7f8aae7 in pthread_kill () from /lib/tls/i686/cmov/libpthread.so.0
#2 0x08726405 in my_write_core (sig=6) at stacktrace.c:310
#3 0x082c3653 in handle_segfault (sig=6) at mysqld.cc:2536
#4 <signal handler called>
#5 0xb7faf410 in __kernel_vsyscall ()
#6 0xb7de8085 in raise () from /lib/tls/i686/cmov/libc.so.6
#7 0xb7de9a01 in abort () from /lib/tls/i686/cmov/libc.so.6
#8 0xb7de110e in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
#9 0x082aab39 in Diagnostics_area::set_ok_status (this=0xb720126c,
thd=0xb72004b8, affected_rows_arg=0, last_insert_id_arg=0, message_arg=0x0)
at sql_class.cc:436
#10 0x081e223b in my_ok (thd=0xb72004b8, affected_rows=0, id=0, message=0x0)
at sql_class.h:2261
#11 0x082dbfad in mysql_execute_command (thd=0xb72004b8) at sql_parse.cc:4038
#12 0x082df42e in mysql_parse (thd=0xb72004b8, inBuf=0xb7202141 "COMMIT",
length=6, found_semicolon=0xb740710c) at sql_parse.cc:5931
#13 0x083c916a in Query_log_event::do_apply_event (this=0xb7202338,
rli=0x8ad11e0, query_arg=0xb7202141 "COMMIT", q_len_arg=6)
at log_event.cc:3114
#14 0x083c968c in Query_log_event::do_apply_event (this=0xb7202338,
rli=0x8ad11e0) at log_event.cc:2915
#15 0x08471633 in Log_event::apply_event (this=0xb7202338, rli=0x8ad11e0)
at log_event.h:1058
#16 0x0846787a in apply_event_and_update_pos (ev=0xb7202338, thd=0xb72004b8,
rli=0x8ad11e0, skip=true) at slave.cc:2024
#17 0x0846a514 in exec_relay_log_event (thd=0xb72004b8, rli=0x8ad11e0)
at slave.cc:2167
#18 0x0846b45b in handle_slave_sql (arg=0x8acff58) at slave.cc:2891
#19 0xb7f854fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#20 0xb7e93e5e in clone () from /lib/tls/i686/cmov/libc.so.6
Backtraces of all threads:
Thread 6 (process 20380):
#0 0xb7faf410 in __kernel_vsyscall ()
#1 0xb7e8c881 in select () from /lib/tls/i686/cmov/libc.so.6
#2 0x082c3fd3 in handle_connections_sockets (arg=0x0) at mysqld.cc:4971
#3 0x082c7bf0 in main (argc=7, argv=0xbf913894) at mysqld.cc:4470
Thread 5 (process 20383):
#0 0xb7faf410 in __kernel_vsyscall ()
#1 0xb7f8db1a in do_sigwait () from /lib/tls/i686/cmov/libpthread.so.0
#2 0xb7f8dbbf in sigwait () from /lib/tls/i686/cmov/libpthread.so.0
#3 0x082c2785 in signal_hand (arg=0x0) at mysqld.cc:2738
#4 0xb7f854fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#5 0xb7e93e5e in clone () from /lib/tls/i686/cmov/libc.so.6
Thread 4 (process 20445):
#0 0xb7faf410 in __kernel_vsyscall ()
#1 0xb7f8c99b in read () from /lib/tls/i686/cmov/libpthread.so.0
#2 0x086f807e in vio_read (vio=0x8aaabc8, buf=0x8ae2cf0 "\001", size=4)
at viosocket.c:44
#3 0x082b3514 in my_real_read (net=0x8a88864, complen=0xb749a33c)
at net_serv.cc:815
#4 0x082b3af6 in my_net_read (net=0x8a88864) at net_serv.cc:996
#5 0x082e1474 in do_command (thd=0x8a887e8) at sql_parse.cc:796
#6 0x082ccf9b in handle_one_connection (arg=0x8a887e8) at sql_connect.cc:1115
#7 0xb7f854fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#8 0xb7e93e5e in clone () from /lib/tls/i686/cmov/libc.so.6
Thread 3 (process 20446):
#0 0xb7faf410 in __kernel_vsyscall ()
#1 0xb7f8c99b in read () from /lib/tls/i686/cmov/libpthread.so.0
#2 0x086f807e in vio_read (vio=0x8aea1f8, buf=0x8afeb38 "\a", size=4)
at viosocket.c:44
#3 0x082b3514 in my_real_read (net=0x8afd1e4, complen=0xb746933c)
at net_serv.cc:815
#4 0x082b3af6 in my_net_read (net=0x8afd1e4) at net_serv.cc:996
#5 0x082e1474 in do_command (thd=0x8afd168) at sql_parse.cc:796
#6 0x082ccf9b in handle_one_connection (arg=0x8afd168) at sql_connect.cc:1115
#7 0xb7f854fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#8 0xb7e93e5e in clone () from /lib/tls/i686/cmov/libc.so.6
Thread 2 (process 20447):
#0 0xb7faf410 in __kernel_vsyscall ()
#1 0xb7f8c99b in read () from /lib/tls/i686/cmov/libpthread.so.0
#2 0x086f807e in vio_read (vio=0x8b18e00, buf=0x8b35c90 "E", size=16384)
at viosocket.c:44
#3 0x086f820d in vio_read_buff (vio=0x8b18e00, buf=0x8b39cb0 "", size=4)
at viosocket.c:83
#4 0x082b3514 in my_real_read (net=0x8af7c30, complen=0xb743824c)
at net_serv.cc:815
#5 0x082b3af6 in my_net_read (net=0x8af7c30) at net_serv.cc:996
#6 0x084cee0c in cli_safe_read (mysql=0x8af7c30) at client.c:670
#7 0x08467e4d in read_event (mysql=0x8af7c30, mi=0x8acff58,
suppress_warnings=0xb7438389) at slave.cc:1834
#8 0x0846f948 in handle_slave_io (arg=0x8acff58) at slave.cc:2524
#9 0xb7f854fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#10 0xb7e93e5e in clone () from /lib/tls/i686/cmov/libc.so.6
Thread 1 (process 20448):
(identical to the backtrace of the crashing thread listed under step 6)
How to repeat:
shell$ cd mysql-test
shell$ cat ./error
# IMAGE_SIZE size in bytes of the disk image (the partition mounted on var)
# BIGFILE_SIZE size in bytes of a big file created initially in the partition
# CHUNK_SIZE length in characters of the string stored in the table
# ITERATIONS number of loop iterations; each iteration updates the string twice
export IMAGE_SIZE=20900000 BIGFILE_SIZE=16100000 CHUNK_SIZE=1000 ITERATIONS=100
echo ==== Prepare var dir ====
# unmount and remove previous var dir
sudo umount var
rm -rf var
# create image
head -c $IMAGE_SIZE /dev/zero > disk-image-bug32228
mkfs.ext2 -F disk-image-bug32228
# create var dir and mount disk image on it
mkdir var
sudo mount -o loop disk-image-bug32228 var
echo ==== Run test case ====
date
./mysql-test-run.pl rpl_bug32228
date
df
#EOF error
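The sizes in the error script are presumably chosen so that the small loop-mounted partition runs out of space partway through the test. As a back-of-the-envelope sketch (not part of the original report), the raw headroom left once bigfile exists, and the amount of string data the UPDATE loop assigns, work out as follows:

```shell
#!/bin/sh
# Values copied from the ./error script.
IMAGE_SIZE=20900000
BIGFILE_SIZE=16100000
CHUNK_SIZE=1000
ITERATIONS=100

# Raw bytes left on the image after bigfile is created
# (ignores ext2 metadata and everything mysql-test-run.pl writes under var).
headroom=$((IMAGE_SIZE - BIGFILE_SIZE))

# String data assigned by the loop: two UPDATEs per iteration,
# each storing a CHUNK_SIZE-character value (log and row overhead not counted).
payload=$((2 * ITERATIONS * CHUNK_SIZE))

echo "headroom=$headroom payload=$payload"
```

The headroom also has to hold the filesystem metadata, both servers' data directories, and their error, binary, and relay logs, so this arithmetic does not predict exactly when the disk fills; it only shows the order of magnitude involved.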
shell$ cat ./free-space
date
df var
rm var/mysqld.1/data/bigfile
df var
#EOF free-space
shell$ cat ./suite/rpl/t/rpl_bug32228.test
source include/master-slave.inc;
let $MYSQLD_DATADIR= `SELECT @@datadir`;
connection master;
use test;
CREATE TABLE t1 (a VARCHAR(10000));
# BIGFILE_SIZE is an environment variable
--exec head -c $BIGFILE_SIZE /dev/zero > $MYSQLD_DATADIR/bigfile
# Set @a and @bb to two different strings of $CHUNK_SIZE characters each
eval SET @a= REPEAT('a', $CHUNK_SIZE);
eval SET @bb= REPEAT('b', $CHUNK_SIZE);
SET @@GLOBAL.GENERAL_LOG = 0;
eval INSERT INTO t1 VALUES (@bb);
let $x= $ITERATIONS;
while ($x) {
eval UPDATE t1 SET a = @a;
eval UPDATE t1 SET a = @bb;
dec $x;
}
sync_slave_with_master;
SELECT COUNT(*) FROM t1;
#EOF rpl_bug32228.test
shell$ ./error
Then wait until the test hangs (Master is waiting for disk space to be freed).
Go to another shell and execute:
shell2$ ./free-space
Wait until Slave crashes.
For me, it does not crash every time, but it does in at least 50% of the runs.
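Because the crash is intermittent, it can help to automate the manual "wait until it hangs, then free space" step so the test can be rerun several times. The sketch below is a hypothetical helper, not part of the original report: it polls a log file for the "Disk is full" message and returns once it appears. It assumes the message ends up in the master's error log; the path var/log/mysqld.1.err used in the example comment is an assumption about the mysql-test-run.pl directory layout and may need adjusting.

```shell
#!/bin/sh
# Hypothetical helper: block until LOGFILE contains the "Disk is full"
# message, or give up after TIMEOUT seconds (default 600).
# Returns 0 if the message was seen, 1 on timeout.
wait_for_disk_full() {
    logfile=$1
    timeout=${2:-600}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ -f "$logfile" ] && grep -q "Disk is full" "$logfile"; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Example, run from mysql-test in the second shell
# (assumed error-log path for the master server):
# wait_for_disk_full var/log/mysqld.1.err 600 && ./free-space
```

Polling once per second keeps the helper portable to plain POSIX sh; a tool such as inotifywait would react faster but is not available everywhere.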