Bug #95547 MySQL restarts automatically after ABORT_SERVER due to disk full
Submitted: 27 May 14:37 Modified: 31 May 11:53
Reporter: Przemyslaw Malkowski
Status: Verified
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.7.26 OS:Any
Assigned to: CPU Architecture:Any

[27 May 14:37] Przemyslaw Malkowski
Description:
When the disk partition where binary logs are placed fills up, MySQL eventually aborts with:

2019-05-27T14:13:26.411295Z 17 [ERROR] Retry in 60 secs. Message reprinted in 600 secs
2019-05-27T14:21:26.567427Z 17 [ERROR] /usr/sbin/mysqld: Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server.
14:21:26 UTC - mysqld got signal 6 ;
(...)
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG33handle_binlog_flush_or_sync_errorEP3THDb+0x1bd)[0xef034d]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG14ordered_commitEP3THDbb+0x3ca)[0xef7dfa]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG6commitEP3THDb+0x50d)[0xef855d]

The problem is that after such a crash, the MySQL service is auto-restarted, which can cause problems and maybe even data damage, as recovery cannot finish properly if the disk is still full.

So, we can see these messages later:
mysqld: Error writing file '/data/log/mysql-bin.index_crash_safe' (Errcode: 28 - No space left on device)

as well as further crashes and recovery attempts in a loop.
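
To make the two symptoms above easier to spot in an error log, here is a minimal sketch of a log scanner. It is not part of MySQL; the function name `classify_error_log` is hypothetical, and the patterns are taken only from the messages quoted in this report (the `binlog_error_action` abort line and `Errcode: 28`, which is ENOSPC).

```python
import re

# Signatures copied from the error-log excerpts quoted in this report.
ABORT_RE = re.compile(r"'binlog_error_action' is set to 'ABORT_SERVER'")
ENOSPC_RE = re.compile(r"Errcode: 28\b")  # 28 = ENOSPC, "No space left on device"


def classify_error_log(lines):
    """Report which disk-full symptoms appear in an iterable of log lines.

    Hypothetical helper for illustration only.
    """
    found = {"binlog_abort": False, "disk_full": False}
    for line in lines:
        if ABORT_RE.search(line):
            found["binlog_abort"] = True
        if ENOSPC_RE.search(line):
            found["disk_full"] = True
    return found
```

If both flags are set, the server most likely aborted because the binlog partition is full and is now failing recovery for the same reason.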

How to repeat:
Create a MySQL instance with a small partition dedicated to binlogs only, then simulate some load until this partition fills up.

Tested with CentOS 6.10 and MySQL Community installed from the Yum repo:

[root@centos6 ~]# rpm -qa|grep -i mysql
mysql-community-libs-5.7.26-1.el6.x86_64
mysql-community-libs-compat-5.7.26-1.el6.x86_64
mysql-community-server-5.7.26-1.el6.x86_64
mysql80-community-release-el6-3.noarch
mysql-community-client-5.7.26-1.el6.x86_64
mysql-community-common-5.7.26-1.el6.x86_64

Suggested fix:
When MySQL aborts on a disk full condition, it should not attempt to restart automatically. Instead, the server should be left down for manual intervention.
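
Until the server behaves this way, a restart wrapper or monitoring script could apply the same idea from outside. The following is only a sketch under stated assumptions: `safe_to_restart` is a hypothetical helper, the 100 MiB threshold is an arbitrary example value, and the only real API used is Python's `shutil.disk_usage`.

```python
import shutil


def safe_to_restart(binlog_dir, min_free_bytes=100 * 1024 * 1024,
                    disk_usage=shutil.disk_usage):
    """Return True only if the binlog partition has enough free space.

    binlog_dir     -- directory holding the binary logs
    min_free_bytes -- assumed threshold; tune for the actual workload
    disk_usage     -- injectable for testing; defaults to shutil.disk_usage
    """
    usage = disk_usage(binlog_dir)
    return usage.free >= min_free_bytes
```

A wrapper would call something like `safe_to_restart("/data/log")` (the binlog directory seen in the error message above) and skip the restart when it returns False, leaving the instance down for manual intervention.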

[31 May 11:53] Umesh Shastry
Hello Przemyslaw,

Thank you for the report.
The default value of binlog_error_action (>= 5.7.7) is ABORT_SERVER, which makes the server halt logging and shut down whenever it encounters such an error with the binary log. But with mysqld_safe (I assume this is your case) or any other homegrown monitoring process that attempts to restart mysqld, the result is as you rightly pointed out: "MySQL service is auto-restarted, which can cause problems and maybe even data damage as recovery cannot finish properly if the disk is still full". See also scenarios like Shane's Bug #88299.

regards,
Umesh