| Bug #57936 | Assertion `! is_set()' failed after out of disk space writing binlog | ||
|---|---|---|---|
| Submitted: | 2 Nov 2010 20:01 | Modified: | 14 Jan 2013 21:14 |
| Reporter: | Sven Sandberg | Email Updates: | |
| Status: | Can't repeat | Impact on me: | |
| Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
| Version: | 5.1+ | OS: | Any |
| Assigned to: | Matthew Lord | CPU Architecture: | Any |
| Tags: | binlog, Diagnostics_area::set_ok_status, disk full | ||
[2 Nov 2010 20:12]
Sven Sandberg
After you have run the test script above, don't forget to unmount that file system.
[14 Jan 2013 21:07]
Matthew Lord
I couldn't repeat the crash with 5.5.29. I created the loop device of 100K on /tmp/binlogs and just altered my existing test MySQL 5.5 setup to use log-bin=/tmp/binlogs. Once I started mysqld, then I filled up that device with: dd if=/dev/zero of=/tmp/binlogs/waste.bin bs=1024 count=999999 I then ran several inserts into a test.loadtest table. After a few of them, then I got the expected SQL layer hang and the disk full messages in the MySQL error log. I then removed the /tmp/binlogs file and then the SQL query returned and the binary log was updated. I got no assert. I will now test this with 5.1.69.
[14 Jan 2013 21:14]
Matthew Lord
I was not able to repeat the crash using 5.1.65-advanced, either. I used the same 100K loop device on /tmp/binlogs. I then started my existing 5.1 test instance, modifying: log-bin=/tmp/binlogs/foo I then filled up the device with dd: dd if=/dev/zero of=/tmp/binlogs/waste.bin bs=1024 count=999999 I then created some dummy records in a test.test2 table. After several cases then I got the expected messages in the error log and query hang. I then deleted the waste.bin file and things picked up fine again. I was not able to get the failed ASSERT and crash in recent 5.5 or 5.1 Enterprise releases.
[15 Jan 2013 9:31]
Sven Sandberg
I confirm that this seems to have been fixed already. It seems that wait_for_free_space now logs a warning instead of raises an error, so that should avoid the problem.

Description: When the disk becomes full while writing the binlog, it waits for someone to free space: - In MYSQL_LOG::open(), it calls init_io_cache with the MY_WAIT_IF_FULL flag. - When an event is written that is bigger than the remaining disk space, it produces the following call stack: THD::binlog_query MYSQL_BIN_LOG::write Query_log_event::write my_b_safe_write _my_b_write my_write wait_for_free_space my_error my_message_sql Diagnostics_area::set_error_status Thus, it sets the error status in the Diagnostics_area. - Then, it returns to wait_for_free_space and sleeps for 60 seconds. - Then, it returns to my_write and tries to write again, etc. - If someone frees up disk space, the write will succeed and my_write will return success. - The statement continues to execute without problem. - At the end of the statement, when it calls set_ok_status(), an assert is triggered because the Diagnostics_area status is already set. How to repeat: ==== BEGIN rpl_disk_full_assertion.test ==== --source include/master-slave.inc --source include/have_binlog_format_statement.inc --let $master_dir= `SELECT @@datadir` --let $master_dir= $master_dir/binlogs --let $master_file= $master_dir/bigfile --let $long_string= `SELECT REPEAT('a', 500000)` --eval SELECT REPEAT('a', 750000) INTO DUMPFILE '$master_file' CREATE TABLE t1 (a TEXT(1000000)); --disable_query_log --send --eval INSERT INTO t1 VALUES ('$long_string') --enable_query_log --sleep 5 --remove_file $master_file --reap ==== END ==== ==== BEGIN rpl_disk_full_assertion-master.opt ==== --log-bin=$MYSQLTEST_VARDIR/mysqld.1/data/binlogs/master-bin ==== END ==== ==== BEGIN rpl_disk_full_assertion-master.sh ==== dir=$MYSQLTEST_VARDIR/mysqld.1/data/binlogs img=$dir.fs echo dir=$dir img=$img mkdir $dir head -c 1000000 /dev/zero > $img mkfs.ext2 -F $img sudo mount -o loop $img $dir ==== END ==== 1. copy these scripts to mysql-test/suite/rpl/t 2. run ./mtr rpl_disk_full_assertion 3. wait for > 60 seconds, watch it crash 4. run gdb on the core. It produces this: #0 0x00007fc7d46efcfc in pthread_kill () from /lib/libpthread.so.0 #1 0x0000000000b4ac21 in my_write_core (sig=6) at stacktrace.c:331 #2 0x000000000069f7fb in handle_segfault (sig=6) at mysqld.cc:2609 #3 <signal handler called> #4 0x00007fc7d3ab9ba5 in raise () from /lib/libc.so.6 #5 0x00007fc7d3abd6b0 in abort () from /lib/libc.so.6 #6 0x00007fc7d3ab2a71 in __assert_fail () from /lib/libc.so.6 #7 0x0000000000680985 in Diagnostics_area::set_ok_status (this=0x3094040, thd=0x3092988, affected_rows_arg=1, last_insert_id_arg=0, message_arg=0x0) at sql_class.cc:512 #8 0x00000000005a95ea in my_ok (thd=0x3092988, affected_rows=1, id=0, message=0x0) at sql_class.h:2405 #9 0x0000000000753cdc in mysql_insert (thd=0x3092988, table_list=0x307bed0, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false) at sql_insert.cc:967 #10 0x00000000006b4fe8 in mysql_execute_command (thd=0x3092988) at sql_parse.cc:3250 #11 0x00000000006bda99 in mysql_parse (thd=0x3092988, rawbuf=0x3148428 "INSERT INTO t1 VALUES ('", 'a' <repeats 176 times>..., length=500026, found_semicolon=0x7fc7d4976960) at sql_parse.cc:6068 #12 0x00000000006afbfa in dispatch_command (command=COM_QUERY, thd=0x3092988, packet=0x30cd3d9 "INSERT INTO t1 VALUES ('", 'a' <repeats 176 times>..., packet_length=500026) at sql_parse.cc:1261 #13 0x00000000006aeb90 in do_command (thd=0x3092988) at sql_parse.cc:889 #14 0x00000000006acdd0 in handle_one_connection (arg=0x3092988) at sql_connect.cc:1136 #15 0x00007fc7d46ea971 in start_thread () from /lib/libpthread.so.0 #16 0x00007fc7d3b6c91d in clone () from /lib/libc.so.6 #17 0x0000000000000000 in ?? () Add DBUG_ASSERT(0) to the end of Diagnostics_area::set_error_status to get a stack trace when it sets the error status for the first time. Suggested fix: I don't know. Perhaps one of: - Make wait_for_free_space clear the diagnostics area again. - Make my_write clear the diagnostics area if it first calls wait_for_free_space and later succeeds to write.