MySQL Bugs: #72457: Replication with no tmpdir space can break replication

Bug #72457	Replication with no tmpdir space can break replication
Submitted:	25 Apr 2014 15:31	Modified:	18 Dec 2014 20:35
Reporter:	Jeremy Tinley	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.5.36, 5.6.23	OS:	Linux
Assigned to:		CPU Architecture:	Any
Tags:	myisam, replication, tmpdir

Description:
When using ROW based replication (or MIXED in cases where it selects ROW as as the method), and have exhausted available space on your tmpdir, DML against MyISAM tables can cause incomplete data to be written to the binary log resulting in replication breakage: 

Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master 

How to repeat:
1) Establish a master-slave relationship between 2 instances of MySQL
2) Set ROW based replication
3) Set tmpdir pointed to a full or low space available volume (I used a tmpfs mount set to 1M)
4) Generate a DML statement against a MyISAM that will exceed the available space in tmpfs (in my case, this was INSERT INTO TABLE t2 SELECT * FROM t1; where t1 had ~300k rows)

The engine will return a 28 (out of space error) but the binlog is partially written to and forces the error 1236 on the replica.

Suggested fix:
Running out of available space in tmpdir should never cause a downstream replica to break. It should return an appropriate error to the client and not write to replication because the master failed to complete the action.

Statement based replication appears to be a viable workaround. I was unable to break replication using SBR.

Amending with new testing. SBR breaks too when it uses tmpdir to write temporary contents before they get appended to the binlog.

This still exists in 5.5.40.

Some more information:

in sql/log.cc THD::binlog_setup_trx_data() a cache file can be created, set to mysql_tmpdir, which caches transaction data during what appears to be a group commit. Then one thread re-reads this data for write to the binary log. If mysql_tmpdir is full ( in our case, we use a 8GB ramdisk to prevent spurious temporary tables from eating up all available space ), this can be written improperly, or not completely. When the data is reread and applied to the binary log, a corrupt binary log entry is written, breaking replication and requiring a rebuild of the slave.

I'll attach a patch we've created to avert the problem. I've basically added a binlog-tmpdir variable and use that to override the cache file created in THD::binlog_setup_trx_data()

Suggest setting it to datadir, or better yet, the binlog dir.

binlog_tmpdir patch

Attachment: binlog_tmpdir.patch (application/octet-stream, text), 2.76 KiB.

Dan,

thank you for the contribution. Please sign Oracle Contributor Agreement (OCA) as described in "Contribution" tab and re-send patch via that tab. Otherwise we would not be able to accept it.

Thank you for the feedback.

Verified as described. Debug version of the server crashes with:

mysqld: /home/sveta/src/mysql-git/sql/binlog.cc:5799: int MYSQL_BIN_LOG::do_write_cache(IO_CACHE*): Assertion `!do_checksum || remains == 0' failed.
20:32:47 UTC - mysqld got signal 6 ; 
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to  
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 61111 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x2a54a90
Attempting backtrace. You can use the following information to find out 
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fd171b12de0 thread_stack 0x40000
Thread pointer: 0x2a54a90
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fd171b12de0 thread_stack 0x40000
/home/sveta/build/mysql-5.6/bin/mysqld(my_print_stacktrace+0x35)[0xaa9be0]
/home/sveta/build/mysql-5.6/bin/mysqld(handle_fatal_signal+0x3f3)[0x734aef]
/lib64/libpthread.so.0[0x38f3e0f710]
/lib64/libc.so.6(gsignal+0x35)[0x38f3232925]
/lib64/libc.so.6(abort+0x175)[0x38f3234105]
/lib64/libc.so.6[0x38f322ba4e]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x38f322bb10]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN13MYSQL_BIN_LOG14do_write_cacheEP11st_io_cache+0xdc6)[0xa3a9d2]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN13MYSQL_BIN_LOG11write_cacheEP3THDP17binlog_cache_data+0x2b9)[0xa3aff9]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN17binlog_cache_data5flushEP3THDPyPb+0x168)[0xa2ed58]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN17binlog_cache_mngr5flushEP3THDPyPb+0x89)[0xa44003]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN13MYSQL_BIN_LOG19flush_thread_cachesEP3THD+0x45)[0xa3c83d]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN13MYSQL_BIN_LOG25process_flush_stage_queueEPyPbPP3THD+0x136)[0xa3ca60]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN13MYSQL_BIN_LOG14ordered_commitEP3THDbb+0x274)[0xa3d728]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN13MYSQL_BIN_LOG8rollbackEP3THDb+0x514)[0xa2fdae]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z17ha_rollback_transP3THDb+0x122)[0x646147]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z19trans_rollback_stmtP3THD+0x160)[0x8af2da]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z21mysql_execute_commandP3THD+0x773e)[0x7e891b]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x421)[0x7ebb80]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xc1e)[0x7deed5]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z10do_commandP3THD+0x33e)[0x7de064]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z24do_handle_one_connectionP3THD+0x1b6)[0x7a6069]
/home/sveta/build/mysql-5.6/bin/mysqld(handle_one_connection+0x33)[0x7a5b7f]
/home/sveta/build/mysql-5.6/bin/mysqld(pfs_spawn_thread+0x159)[0xda69e8]
/lib64/libpthread.so.0[0x38f3e079d1]
/lib64/libc.so.6(clone+0x6d)[0x38f32e8b6d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fd140005050): insert into t1 select * from t1

To repeat:

1. Create small fs:

dd if=/dev/zero of=`pwd`/bug64849 bs=1k count=50000
mkdir ./bug64849_data
sudo /sbin/mkfs.ext2 -F `pwd`/bug64849
sudo mount -o loop `pwd`/bug64849 `pwd`/bug64849_data
sudo chown sveta:sveta bug64849_data/

2. Start MTR:

./mtr --start --suite=rpl --mysqld=--tmpdir=/home/sveta/src/bugs/bug72457/bug64849_data rpl_alter &

3. Connect to master and run following queries:

mysql> set binlog_format='row';
Query OK, 0 rows affected (0.00 sec)

mysql> create table t1(f1 text) engine=myisam;
Query OK, 0 rows affected (0.11 sec)

mysql> insert into t1 values(md5(1));
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 select * from t1;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0
...
mysql> insert into t1 select * from t1;
Query OK, 524288 rows affected (6.10 sec)
Records: 524288  Duplicates: 0  Warnings: 0

mysql> insert into t1 select * from t1;
ERROR 2013 (HY000): Lost connection to MySQL server during query

Version 5.7.6 fails with proper error:

mysql> insert into t1 select * from t1;
Query OK, 524288 rows affected (7.00 sec)
Records: 524288  Duplicates: 0  Warnings: 0

mysql> insert into t1 select * from t1;
ERROR 3 (HY000): Error writing file '/home/sveta/src/bugs/bug72457/bug64849_data/MLAM1Io7' (Errcode: 28 - No space left on device)

Binary log is not corrupted.

In version 5.6:

../../mysql-5.6/bin/mysqlbinlog ../../mysql-5.6/mysql-test/var/mysqld.1/data/master-bin.000001 >bug72457_5.6.sql
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 8190, event_type: 30
WARNING: The range of printed events ends with a row event or a table map event that does not have the STMT_END_F flag set. This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement. The event(s) from the partial statement have not been written to output.

Creating system variable for binlog_tmpdir (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: binlog_tmpdir.patch (application/octet-stream, text), 2.76 KiB.

I was able to reproduce this on both 5.6.35 and 5.7.18.

I also believe I've found the problem.  The binlog cache file code has support for handling errors and not writing corrupt entries to the binlog, but it has an oversight that causes this code to not get executed correctly.  Inside sql/binlog.cc you can find places where it calls set_write_error() and then checks check_write_error() as well as some other conditions and if it passes it calls set_incident() to flag the file as containing corrupt data.  The set_write_error() sets the error to one of ER_TRANS_CACHE_FULL, ER_STMT_CACHE_FULL, or ER_ERROR_ON_WRITE and the check_write_error() checks for those three as well as ER_BINLOG_LOGGING_IMPOSSIBLE.

The problem is that check_write_error() only checks the first error.  In the case where we run out of disk space, there is a low level error that is being set first.  Then, when set_write_error() is called, there is already an existing error so the new error does not overwrite the existing one and then check_write_error() does not see the expected value.

My fix involves modifying check_write_error() to check all conditions to see if any of them match - rather than just relying on the first error to match.

--- a/sql/binlog.cc
+++ b/sql/binlog.cc
@@ -3956,15 +3956,22 @@ bool MYSQL_BIN_LOG::check_write_error(THD *thd)
   if (!thd->is_error())
     DBUG_RETURN(checked);

-  switch (thd->get_stmt_da()->sql_errno())
-  {
-    case ER_TRANS_CACHE_FULL:
-    case ER_STMT_CACHE_FULL:
-    case ER_ERROR_ON_WRITE:
-    case ER_BINLOG_LOGGING_IMPOSSIBLE:
-      checked= TRUE;
-    break;
+  // Check all conditions for one that matches the expected error
+  const Sql_condition *err;
+  Diagnostics_area::Sql_condition_iterator it= thd->get_stmt_da()->sql_conditions();
+  while ((err= it++) != nullptr && !checked)
+  {
+    switch (err->get_sql_errno())
+    {
+      case ER_TRANS_CACHE_FULL:
+      case ER_STMT_CACHE_FULL:
+      case ER_ERROR_ON_WRITE:
+      case ER_BINLOG_LOGGING_IMPOSSIBLE:
+        checked= TRUE;
+        break;
+    }
   }
+
   DBUG_PRINT("return", ("checked: %s", YESNO(checked)));
   DBUG_RETURN(checked);
 }

https://bugs.mysql.com/bug.php?id=86991 marked as duplicate of this one.

I would like to add that the issue happens on other storage engines also, not only MyISAM. Check tests case from duplicate bug for more details.

Jay,

please make sure you signed Oracle Contributor Agreement (OCA), otherwise Oracle MySQL developers would not be even allowed to your fix proposal.

Thanks in advance!

With version 5.7 bug is only repeatable if internal_tmp_disk_storage_engine and default_tmp_storage_engine set to InnoDB. If set to MyISAM server crashes or not depending on binlog_error_action option.

The patch for 5.5 branch along with rpl_bug72457 MTR test case (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: patch_mysql_bug72457_5_5.diff (application/octet-stream, text), 5.54 KiB.

The patch for 5.6 branch along with rpl_bug72457 MTR test case (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: patch_mysql_bug72457_5_6.diff (application/octet-stream, text), 5.22 KiB.

The patch for 5.7 branch along with rpl_bug72457_myisam and rpl_bug72457_innodb MTR test cases (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: patch_mysql_bug72457_5_7.diff (application/octet-stream, text), 13.01 KiB.

The patch for 8.0 branch along with rpl_bug72457_myisam and rpl_bug72457_innodb MTR test cases (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: patch_mysql_bug72457_8_0.diff (application/octet-stream, text), 12.97 KiB.

The patches for 5.7 and 8.0 also include the fix for simulating "out of disk space" with 'simulate_tmpdir_partition_full' DBUG directive.

Fixed problem with wrapping 'reinit_io_cache()' inside
'MYSQL_BIN_LOG::do_write_cache()' with 'simulate_file_write_error' DBUG
directive. 'DBUG_SET("-d,simulate_file_write_error")' should be called
not only when 'reinit_io_cache()' fails but when it succeeds as well.
Without this patch, any subsequent call to 'my_write()' ends up with
an infinite loop.