Bug #72457 Replication with no tmpdir space can break replication
Submitted: 25 Apr 2014 15:31 Modified: 18 Dec 2014 20:35
Reporter: Jeremy Tinley Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.5.36, 5.6.23 OS:Linux
Assigned to: CPU Architecture:Any
Tags: myisam, replication, tmpdir
Triage: Needs Triage: D2 (Serious)

[25 Apr 2014 15:31] Jeremy Tinley
Description:
When using ROW based replication (or MIXED in cases where it selects ROW as as the method), and have exhausted available space on your tmpdir, DML against MyISAM tables can cause incomplete data to be written to the binary log resulting in replication breakage: 

Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master 

How to repeat:
1) Establish a master-slave relationship between 2 instances of MySQL
2) Set ROW based replication
3) Set tmpdir pointed to a full or low space available volume (I used a tmpfs mount set to 1M)
4) Generate a DML statement against a MyISAM that will exceed the available space in tmpfs (in my case, this was INSERT INTO TABLE t2 SELECT * FROM t1; where t1 had ~300k rows)

The engine will return a 28 (out of space error) but the binlog is partially written to and forces the error 1236 on the replica.

Suggested fix:
Running out of available space in tmpdir should never cause a downstream replica to break. It should return an appropriate error to the client and not write to replication because the master failed to complete the action.
[25 Apr 2014 15:33] Jeremy Tinley
Statement based replication appears to be a viable workaround. I was unable to break replication using SBR.
[30 Apr 2014 19:29] Jeremy Tinley
Amending with new testing. SBR breaks too when it uses tmpdir to write temporary contents before they get appended to the binlog.
[11 Dec 2014 21:25] Dan Smythe
This still exists in 5.5.40.

Some more information:

in sql/log.cc THD::binlog_setup_trx_data() a cache file can be created, set to mysql_tmpdir, which caches transaction data during what appears to be a group commit. Then one thread re-reads this data for write to the binary log. If mysql_tmpdir is full ( in our case, we use a 8GB ramdisk to prevent spurious temporary tables from eating up all available space ), this can be written improperly, or not completely. When the data is reread and applied to the binary log, a corrupt binary log entry is written, breaking replication and requiring a rebuild of the slave.

I'll attach a patch we've created to avert the problem. I've basically added a binlog-tmpdir variable and use that to override the cache file created in THD::binlog_setup_trx_data()

Suggest setting it to datadir, or better yet, the binlog dir.
[11 Dec 2014 21:26] Dan Smythe
binlog_tmpdir patch

Attachment: binlog_tmpdir.patch (application/octet-stream, text), 2.76 KiB.

[18 Dec 2014 17:51] Sveta Smirnova
Dan,

thank you for the contribution. Please sign Oracle Contributor Agreement (OCA) as described in "Contribution" tab and re-send patch via that tab. Otherwise we would not be able to accept it.
[18 Dec 2014 20:35] Sveta Smirnova
Thank you for the feedback.

Verified as described. Debug version of the server crashes with:

mysqld: /home/sveta/src/mysql-git/sql/binlog.cc:5799: int MYSQL_BIN_LOG::do_write_cache(IO_CACHE*): Assertion `!do_checksum || remains == 0' failed.
20:32:47 UTC - mysqld got signal 6 ; 
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to  
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 61111 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x2a54a90
Attempting backtrace. You can use the following information to find out 
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fd171b12de0 thread_stack 0x40000
Thread pointer: 0x2a54a90
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fd171b12de0 thread_stack 0x40000
/home/sveta/build/mysql-5.6/bin/mysqld(my_print_stacktrace+0x35)[0xaa9be0]
/home/sveta/build/mysql-5.6/bin/mysqld(handle_fatal_signal+0x3f3)[0x734aef]
/lib64/libpthread.so.0[0x38f3e0f710]
/lib64/libc.so.6(gsignal+0x35)[0x38f3232925]
/lib64/libc.so.6(abort+0x175)[0x38f3234105]
/lib64/libc.so.6[0x38f322ba4e]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x38f322bb10]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN13MYSQL_BIN_LOG14do_write_cacheEP11st_io_cache+0xdc6)[0xa3a9d2]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN13MYSQL_BIN_LOG11write_cacheEP3THDP17binlog_cache_data+0x2b9)[0xa3aff9]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN17binlog_cache_data5flushEP3THDPyPb+0x168)[0xa2ed58]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN17binlog_cache_mngr5flushEP3THDPyPb+0x89)[0xa44003]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN13MYSQL_BIN_LOG19flush_thread_cachesEP3THD+0x45)[0xa3c83d]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN13MYSQL_BIN_LOG25process_flush_stage_queueEPyPbPP3THD+0x136)[0xa3ca60]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN13MYSQL_BIN_LOG14ordered_commitEP3THDbb+0x274)[0xa3d728]
/home/sveta/build/mysql-5.6/bin/mysqld(_ZN13MYSQL_BIN_LOG8rollbackEP3THDb+0x514)[0xa2fdae]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z17ha_rollback_transP3THDb+0x122)[0x646147]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z19trans_rollback_stmtP3THD+0x160)[0x8af2da]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z21mysql_execute_commandP3THD+0x773e)[0x7e891b]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x421)[0x7ebb80]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xc1e)[0x7deed5]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z10do_commandP3THD+0x33e)[0x7de064]
/home/sveta/build/mysql-5.6/bin/mysqld(_Z24do_handle_one_connectionP3THD+0x1b6)[0x7a6069]
/home/sveta/build/mysql-5.6/bin/mysqld(handle_one_connection+0x33)[0x7a5b7f]
/home/sveta/build/mysql-5.6/bin/mysqld(pfs_spawn_thread+0x159)[0xda69e8]
/lib64/libpthread.so.0[0x38f3e079d1]
/lib64/libc.so.6(clone+0x6d)[0x38f32e8b6d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fd140005050): insert into t1 select * from t1
[18 Dec 2014 20:37] Sveta Smirnova
To repeat:

1. Create small fs:

dd if=/dev/zero of=`pwd`/bug64849 bs=1k count=50000
mkdir ./bug64849_data
sudo /sbin/mkfs.ext2 -F `pwd`/bug64849
sudo mount -o loop `pwd`/bug64849 `pwd`/bug64849_data
sudo chown sveta:sveta bug64849_data/

2. Start MTR:

./mtr --start --suite=rpl --mysqld=--tmpdir=/home/sveta/src/bugs/bug72457/bug64849_data rpl_alter &

3. Connect to master and run following queries:

mysql> set binlog_format='row';
Query OK, 0 rows affected (0.00 sec)

mysql> create table t1(f1 text) engine=myisam;
Query OK, 0 rows affected (0.11 sec)

mysql> insert into t1 values(md5(1));
Query OK, 1 row affected (0.00 sec)

mysql> insert into t1 select * from t1;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0
...
mysql> insert into t1 select * from t1;
Query OK, 524288 rows affected (6.10 sec)
Records: 524288  Duplicates: 0  Warnings: 0

mysql> insert into t1 select * from t1;
ERROR 2013 (HY000): Lost connection to MySQL server during query
[18 Dec 2014 20:44] Sveta Smirnova
Version 5.7.6 fails with proper error:

mysql> insert into t1 select * from t1;
Query OK, 524288 rows affected (7.00 sec)
Records: 524288  Duplicates: 0  Warnings: 0

mysql> insert into t1 select * from t1;
ERROR 3 (HY000): Error writing file '/home/sveta/src/bugs/bug72457/bug64849_data/MLAM1Io7' (Errcode: 28 - No space left on device)

Binary log is not corrupted.

In version 5.6:

../../mysql-5.6/bin/mysqlbinlog ../../mysql-5.6/mysql-test/var/mysqld.1/data/master-bin.000001 >bug72457_5.6.sql
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 8190, event_type: 30
WARNING: The range of printed events ends with a row event or a table map event that does not have the STMT_END_F flag set. This might be because the last statement was not fully written to the log, or because you are using a --stop-position or --stop-datetime that refers to an event in the middle of a statement. The event(s) from the partial statement have not been written to output.
[14 Jan 2015 17:27] Dan Smythe
Creating system variable for binlog_tmpdir

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: binlog_tmpdir.patch (application/octet-stream, text), 2.76 KiB.

[8 May 2017 20:13] Jay Edgar
I was able to reproduce this on both 5.6.35 and 5.7.18.

I also believe I've found the problem.  The binlog cache file code has support for handling errors and not writing corrupt entries to the binlog, but it has an oversight that causes this code to not get executed correctly.  Inside sql/binlog.cc you can find places where it calls set_write_error() and then checks check_write_error() as well as some other conditions and if it passes it calls set_incident() to flag the file as containing corrupt data.  The set_write_error() sets the error to one of ER_TRANS_CACHE_FULL, ER_STMT_CACHE_FULL, or ER_ERROR_ON_WRITE and the check_write_error() checks for those three as well as ER_BINLOG_LOGGING_IMPOSSIBLE.

The problem is that check_write_error() only checks the first error.  In the case where we run out of disk space, there is a low level error that is being set first.  Then, when set_write_error() is called, there is already an existing error so the new error does not overwrite the existing one and then check_write_error() does not see the expected value.

My fix involves modifying check_write_error() to check all conditions to see if any of them match - rather than just relying on the first error to match.

--- a/sql/binlog.cc
+++ b/sql/binlog.cc
@@ -3956,15 +3956,22 @@ bool MYSQL_BIN_LOG::check_write_error(THD *thd)
   if (!thd->is_error())
     DBUG_RETURN(checked);

-  switch (thd->get_stmt_da()->sql_errno())
-  {
-    case ER_TRANS_CACHE_FULL:
-    case ER_STMT_CACHE_FULL:
-    case ER_ERROR_ON_WRITE:
-    case ER_BINLOG_LOGGING_IMPOSSIBLE:
-      checked= TRUE;
-    break;
+  // Check all conditions for one that matches the expected error
+  const Sql_condition *err;
+  Diagnostics_area::Sql_condition_iterator it= thd->get_stmt_da()->sql_conditions();
+  while ((err= it++) != nullptr && !checked)
+  {
+    switch (err->get_sql_errno())
+    {
+      case ER_TRANS_CACHE_FULL:
+      case ER_STMT_CACHE_FULL:
+      case ER_ERROR_ON_WRITE:
+      case ER_BINLOG_LOGGING_IMPOSSIBLE:
+        checked= TRUE;
+        break;
+    }
   }
+
   DBUG_PRINT("return", ("checked: %s", YESNO(checked)));
   DBUG_RETURN(checked);
 }
[9 Jul 2017 3:04] Miguel Solorzano
https://bugs.mysql.com/bug.php?id=86991 marked as duplicate of this one.
[10 Jul 2017 16:18] Marcelo Altmann
I would like to add that the issue happens on other storage engines also, not only MyISAM. Check tests case from duplicate bug for more details.
[17 Jul 2017 18:26] Sveta Smirnova
Jay,

please make sure you signed Oracle Contributor Agreement (OCA), otherwise Oracle MySQL developers would not be even allowed to your fix proposal.

Thanks in advance!
[25 Oct 2017 11:04] Sveta Smirnova
With version 5.7 bug is only repeatable if internal_tmp_disk_storage_engine and default_tmp_storage_engine set to InnoDB. If set to MyISAM server crashes or not depending on binlog_error_action option.
[27 Sep 2018 14:34] Yura Sorokin
The patch for 5.5 branch along with rpl_bug72457 MTR test case

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: patch_mysql_bug72457_5_5.diff (application/octet-stream, text), 5.54 KiB.

[27 Sep 2018 14:35] Yura Sorokin
The patch for 5.6 branch along with rpl_bug72457 MTR test case

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: patch_mysql_bug72457_5_6.diff (application/octet-stream, text), 5.22 KiB.

[27 Sep 2018 14:36] Yura Sorokin
The patch for 5.7 branch along with rpl_bug72457_myisam and rpl_bug72457_innodb MTR test cases

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: patch_mysql_bug72457_5_7.diff (application/octet-stream, text), 13.01 KiB.

[27 Sep 2018 14:38] Yura Sorokin
The patch for 8.0 branch along with rpl_bug72457_myisam and rpl_bug72457_innodb MTR test cases

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: patch_mysql_bug72457_8_0.diff (application/octet-stream, text), 12.97 KiB.

[27 Sep 2018 14:41] Yura Sorokin
The patches for 5.7 and 8.0 also include the fix for simulating "out of disk space" with 'simulate_tmpdir_partition_full' DBUG directive.

Fixed problem with wrapping 'reinit_io_cache()' inside
'MYSQL_BIN_LOG::do_write_cache()' with 'simulate_file_write_error' DBUG
directive. 'DBUG_SET("-d,simulate_file_write_error")' should be called
not only when 'reinit_io_cache()' fails but when it succeeds as well.
Without this patch, any subsequent call to 'my_write()' ends up with
an infinite loop.