Bug #89272 | Binlog and Engine become inconsistent when binlog cache file gets out of space | ||
---|---|---|---|
Submitted: | 16 Jan 2018 22:52 | Modified: | 17 May 2018 8:39 |
Reporter: | Yoshinori Matsunobu (OCA) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.6.39 | OS: | Any |
Assigned to: | | CPU Architecture: | Any |
[16 Jan 2018 22:52]
Yoshinori Matsunobu
[16 Jan 2018 23:26]
Zhefu Jiang
While trying to understand what happened underneath, I noticed the following code: https://github.com/mysql/mysql-server/blob/5.6/mysys/mf_iocache.c#L1791-L1806 One INSERT hit ENOSPC, and there were two flush attempts for that INSERT statement. The first flush attempt failed to write to disk but still moved the pointers in the IO cache, so it looked as if the write had happened. The second flush, during the rollback of the INSERT statement, then simply ignored the earlier buffer contents. Finally, if we do a COMMIT, it also tries to flush the IO cache several times: the first attempt hits ENOSPC and the following flushes finish without error. So the COMMIT also passes, but we have effectively discarded all of the binlog that was generated. Can we make sure that the IO cache doesn't discard content in this way?
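The failure mode described in this comment can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual `mf_iocache.c` logic: `FullDisk`, `LossyCache`, and `run_scenario` are hypothetical names, and the model simply assumes that a flush resets its buffer before it knows whether the write succeeded.

```python
import errno


class FullDisk:
    """Hypothetical backing file that accepts at most `capacity` bytes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()

    def write(self, chunk):
        if len(self.data) + len(chunk) > self.capacity:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.data += chunk


class LossyCache:
    """Toy write cache (assumption, not MySQL code): a failed flush
    still resets the buffer, so the pending bytes are silently lost."""

    def __init__(self, disk):
        self.disk = disk
        self.buffer = bytearray()

    def append(self, chunk):
        self.buffer += chunk

    def flush(self):
        pending = bytes(self.buffer)
        self.buffer.clear()           # "pointers" move before the outcome is known
        if pending:
            self.disk.write(pending)  # may raise ENOSPC; `pending` is then gone


def run_scenario():
    """Flush twice against a full disk and record each outcome."""
    disk = FullDisk(capacity=4)
    cache = LossyCache(disk)
    cache.append(b"INSERT-event")
    results = []
    for _ in range(2):
        try:
            cache.flush()
            results.append("ok")
        except OSError as exc:
            results.append(errno.errorcode[exc.errno])
    return results, bytes(disk.data)
```

In this model the first flush reports ENOSPC and the second reports success even though nothing reached the disk, which is the same shape as the "COMMIT passes but the binlog is discarded" symptom above.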
[17 Jan 2018 8:42]
MySQL Verification Team
Hello Yoshinori, Thank you for the report and repeatable steps. Thanks, Umesh
[17 May 2018 8:39]
Margaret Fisher
Posted by developer: Changelog entry added for MySQL 5.6.41, 5.7.23, and 8.0.12: When a transaction larger than the binary log transaction cache size (binlog_cache_size) was flushed to a temporary file during processing, and the flush failed due to a lack of space in the temporary directory, the flush error was not handled correctly. No message was written to the error log, and the binary log cache was not cleared after the transaction was rolled back. Now, in this situation, the server takes an appropriate action based on the binlog_error_action setting (shut down the server or halt logging), and writes a message to the error log. When the transaction is rolled back, the server checks for flush errors and clears the binary log cache if any occurred.
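As a rough illustration of the behavior this changelog entry describes, the sketch below models a transaction cache that remembers a flush failure, reacts according to a `binlog_error_action`-style setting, and clears itself on rollback. It is a simplified model, not the server's implementation: `TxnCacheSketch`, its attributes, and the error-log wording are hypothetical; only the setting values `ABORT_SERVER` and `IGNORE_ERROR` are taken from MySQL.

```python
import errno


class TxnCacheSketch:
    """Toy transaction cache that records flush failures (hypothetical)."""

    def __init__(self, tmp_capacity, error_action="ABORT_SERVER"):
        self.tmp_capacity = tmp_capacity   # bytes available in the temp dir
        self.error_action = error_action   # mirrors binlog_error_action
        self.buffer = bytearray()          # in-memory part of the cache
        self.spilled = 0                   # bytes already in the temp file
        self.flush_error = False           # sticky flag checked at rollback
        self.logging_enabled = True
        self.server_running = True
        self.error_log = []

    def append(self, event):
        self.buffer += event

    def flush(self):
        try:
            if self.spilled + len(self.buffer) > self.tmp_capacity:
                raise OSError(errno.ENOSPC, "No space left on device")
            self.spilled += len(self.buffer)
            self.buffer.clear()
        except OSError as exc:
            # Remember the failure and report it instead of hiding it.
            self.flush_error = True
            self.error_log.append(f"binlog cache flush failed: {exc.strerror}")
            if self.error_action == "ABORT_SERVER":
                self.server_running = False    # shut down the server
            else:                              # IGNORE_ERROR
                self.logging_enabled = False   # halt binary logging

    def rollback(self):
        # On rollback, check for an earlier flush error and clear the cache.
        if self.flush_error:
            self.buffer.clear()
            self.spilled = 0
            self.flush_error = False
```

For example, with `error_action="IGNORE_ERROR"` and a cache larger than `tmp_capacity`, `flush()` leaves one entry in `error_log` and disables logging, and a subsequent `rollback()` empties the cache.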
[18 Jun 2018 21:07]
Artem Danilov
The changelog description says "When the transaction is rolled back, the server checks for flush errors and clears the binary log cache if any occurred." I wonder why only ENOSPC results in a flush error and clears the binary log cache. What about other disk errors? Can this fix be extended to all disk errors?