MySQL Bugs: #89272: Binlog and Engine become inconsistent when binlog cache file gets out of space

Bug #89272	Binlog and Engine become inconsistent when binlog cache file gets out of space
Submitted:	16 Jan 2018 22:52	Modified:	17 May 2018 8:39
Reporter:	Yoshinori Matsunobu (OCA)	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Replication	Severity:	S2 (Serious)
Version:	5.6.39	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
In the following conditions, binary logs and storage engines become inconsistent. I could reproduce in 5.6.39.

- GTID is enabled
- Running many inserts in a single transaction. The whole transaction should be bigger than binlog cache size (so that temporary files are created under tmpdir)
- tmpdir gets out of space
- Then, some inserts fail because of errors like "ERROR 3 (HY000): Error writing file '/tmp/ML1pMt11' (Errcode: 28 - No space left on device)". The failed insert itself is rolled back. But other inserts not hitting the error continue in the transaction.
- Applications forcing commit. Then inserts that don't hit out of space get committed to the engine. But nothing is persisted in binlog. So on a replication environment, master has many data but slaves don't have anything.

See how to repeat to reproduce.

Credits to my colleague Jeff Jiang who originally found this problem.

How to repeat:
1. Run MySQL instance with at least the following settings.

default_storage_engine = InnoDB
tmpdir = /tmp
# datadir and log_bin should be in different partitions from tmpdir
datadir = /data/mysql/3306/
log_bin = /binlogs/binary-logs-3306
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_cache_size = 256K

2. Prepare a script that inserts many rows in a single transaction. Here is an example, to insert 10k rows in one transaction.

#!/usr/bin/perl

print "begin;\n";
for(my $i=1; $i<=10000; $i++) {
  print "insert into t1 values ($i, $i, $i);\n";
}
print "commit;\n";

3. Fill up tmpdir partition (e.g. dd command)

4. Run the following queries
create table t1 (id int primary key, value int, value2 int) engine=innodb;
begin;
source /data/insert.sql

# the source command will fail because of out of space. Then force commit like below.

commit;

5. Check binlog and engines.
select count(*) from t1;
-> many rows, like 9996 rows are there.

show master status;
-> nothing in the transaction is persisted -> mismatch

Suggested fix:
- Force to roll back entire transaction, not just a failed statement

When trying to understand what happened underneath, I noticed the following code

https://github.com/mysql/mysql-server/blob/5.6/mysys/mf_iocache.c#L1791-L1806 

There was one INSERT hit ENOSPC, on that INSERT statement, there were two flush attempts: the first flush attempt during failed to flush to disk and moved the pointers in IO Cache so that it seems as if the write happened. The second flush during the rollback of the INSERT statement then simply ignored the previous buffer.

Then finally if we do a COMMIT, it also tries to flush the IO cache for several times, the first one hits the ENOSPC and the follow flushes just finishes without error. So COMMIT also passes, but we effectively discard all binlog generated.

Can we make sure that IO Cache doesn't discard content in this way?

Hello Yoshinori,

Thank you for the report and repeatable steps.

Thanks,
Umesh

Posted by developer:
 
Changelog entry added for MySQL 5.6.41, 5.7.23, and 8.0.12:
When a transaction larger than the binary log transaction cache size (binlog_cache_size) was flushed to a temporary file during processing, and the flush failed due to a lack of space in the temporary directory, the flush error was not handled correctly. No message was written to the error log, and the binary log cache was not cleared after the transaction was rolled back. Now, in this situation, the server takes an appropriate action based on the binlog_error_action setting (shut down the server or halt logging), and writes a message to the error log. When the transaction is rolled back, the server checks for flush errors and clears the binary log cache if any occurred.

The change log description says "When the transaction is rolled back, the server checks for flush errors and clears the binary log cache if any occurred." I wonder why does only ENOSPC end up with flush errors and clears the binary log cached. What about any other disk error? Can this fix be extended for all disk errors?