Description:
Hi,
when committing a transaction bigger than binlog_cache_size [1], the binary log cache temporary file is copied to the binary log.
[1]: https://dev.mysql.com/doc/refman/8.4/en/replication-options-binary-log.html#sysvar_binlog_...
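(As an aside, one way to confirm that a transaction spilled its binary log cache to a temporary file, using the sandbox from How to repeat below; binlog_cache_size defaults to 32 KiB, so any multi-MiB transaction spills:
./use <<< "SHOW GLOBAL STATUS LIKE 'Binlog_cache_disk_use'"
If the counter increased after the transaction, its binlog events did not fit in the in-memory cache and were written to a temporary file.)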
If the disk fills up during this copy, the following message appears in the logs.
2025-04-16T00:06:26.604299Z 20 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000005' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
If we kill the transaction that is blocked in commit, MySQL crashes with the following in the logs.
2025-04-16T00:10:04.173842Z 20 [ERROR] [MY-011072] [Server] Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Server is being stopped..
2025-04-16T00:10:04Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=80f73db87df301b4b83d9224cbb052467e9e4c5b
Thread pointer: 0x7ff0a0012760
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7ff0dc7f6bc0 thread_stack 0x100000
#0 0x103c666 <unknown>
#1 0x103ca8f <unknown>
#2 0x2129d39 <unknown>
#3 0x1d524c9 <unknown>
#4 0x1d59fdd <unknown>
#5 0x1d6de66 <unknown>
#6 0x1d6fa9d <unknown>
#7 0x1158415 <unknown>
#8 0xfe9010 <unknown>
#9 0xec9aae <unknown>
#10 0xeccdb3 <unknown>
#11 0xecf7a6 <unknown>
#12 0xed0385 <unknown>
#13 0x102ca57 <unknown>
#14 0x27f8dc4 <unknown>
#15 0x7ff216ca81c3 start_thread at ./nptl/pthread_create.c:442
#16 0x7ff216d2885b clone3 at sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#17 0xffffffffffffffff <unknown>
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7ff0a01e3bc0): UPDATE t SET a = a+1
Connection ID (thread ID): 20
Status: KILL_CONNECTION
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash
Note that stopping MySQL in the above situation also crashes MySQL. There are probably other actions that would crash MySQL in this situation; I am not claiming to be exhaustive here.
Nothing described so far is surprising. This is the expected behavior with the default value of binlog_error_action [2] (ABORT_SERVER). See How to repeat for the commands showing the above scenario.
[2]: https://dev.mysql.com/doc/refman/8.4/en/replication-options-binary-log.html#sysvar_binlog_...
What is unexpected is that a transaction which MySQL should know will fill the disk starts committing. As an example, if the tmp binlog file is 7.6 GiB and there is less than 7.6 GiB of free disk space, MySQL starts committing this transaction, painting itself into a corner (it will fill the disk). In such a situation, I would expect MySQL to refuse to commit the transaction with an error message. At that point, the user could roll back, or free disk space and attempt committing again. Said otherwise, MySQL should not paint itself into this known corner / dead-end. Obviously, if there is enough free disk space when the commit starts, and another process consumes too much disk space while the commit is running, MySQL will still end up painted into a corner, but not one of its own making.
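For illustration, the check I have in mind, expressed as a shell sketch an operator could run against the sandbox datadir before issuing COMMIT (the 8 GiB value is a hypothetical stand-in for the size of the spilled binlog cache, not something MySQL currently reports):
needed_kib=$((8 * 1024 * 1024))
avail_kib=$(df -Pk ./data | awk 'NR == 2 {print $4}')
if [ "$avail_kib" -lt "$needed_kib" ]; then
  echo "Less than 8 GiB free on the binlog filesystem, do not COMMIT yet." >&2
fi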
How to repeat is for 8.4.5, but the behavior is similar for 8.0.42 and 9.3.0.
Reporting this as S2 / Serious, because an avoidable crash is not a minor thing.
See Bug#118332 for an interesting way to avoid this. Implementing that feature request would solve this bug, but it is probably quicker to fix this bug by failing commit, possibly in a minor release, while that feature request is implemented.
Note that fixing this introduces a situation where COMMIT fails, which is not very common in MySQL (the only other situation I know of where a commit can fail is on conflict detection in Group Replication). If this is unwanted, the connection could "just" be violently closed, like the current crash does.
Many thanks for looking into this,
Jean-François Gagné
How to repeat:
# Create a sandbox for our tests.
dbdeployer deploy single mysql_8.4.5
# Creating a table, inserting 1 Mi rows, and growing each row to about 4 KiB.
# (the pv commands are a trick to time command execution)
{
nb_rows=$((1024*1024))
./use <<< "
CREATE DATABASE test_jfg;
CREATE TABLE test_jfg.t (id INT AUTO_INCREMENT PRIMARY KEY, a INT DEFAULT 0)"
seq 1 $nb_rows |
awk '{print "(null)"}' |
tr " " "," | paste -s -d "$(printf ',%.0s' {1..100})\n" |
sed -e 's/.*/INSERT INTO t(id) values &;/' |
./use test_jfg | pv -tN insert
y200="$(yes | head -n 200 | paste -s -d "")"
y240="$(yes | head -n 240 | paste -s -d "")"
{ echo "ALTER TABLE t ADD COLUMN c0 CHAR(200) DEFAULT '$y200'"
seq -f " ADD COLUMN c%.0f CHAR(240) DEFAULT '$y240'" 1 15
} | paste -s -d "," | ./use test_jfg
./use test_jfg <<< "ALTER TABLE t FORCE" | pv -tN alter
./use test_jfg <<< "FLUSH TABLE t FOR EXPORT" | pv -tN flush
}
insert: 0:00:08
alter: 0:04:00
flush: 0:00:00
# Here, make sure there is MORE than 8 GiB and LESS than 15 GiB of free disk space.
# (in the post below, we saw that the UPDATE below generates 7.6 GiB of binlogs)
# (https://jfg-mysql.blogspot.com/2025/05/interesting-binlog-optimization-in-mariadb.html)
# Update all rows of the table, filling the disk.
./use test_jfg <<< "UPDATE t SET a = a+1" | pv -tN update
# Wait for the disk to be full before running the commands below.
# Kill the UPDATE query.
./use <<< "SHOW PROCESSLIST" | awk '$0 ~ "UPDATE"{print "kill " $1 ";"}' | ./use
# MySQL crashed.
grep -B 2 -A 2 "MY-011072" data/msandbox.err
2025-06-02T18:16:31.247369Z 14 [ERROR] [MY-000035] [Server] Disk is full writing './binlog.000001' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.
2025-06-02T18:17:02.249380Z 14 [ERROR] [MY-010907] [Server] Error writing file 'binlog' (errno: 28 - No space left on device)
2025-06-02T18:17:02.309586Z 14 [ERROR] [MY-011072] [Server] Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Server is being stopped..
2025-06-02T18:17:02Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Suggested fix:
Before attempting to commit a transaction, check that there is enough free disk space for the commit to succeed. If there is not, the transaction should be rolled back when in auto-commit, and kept open on an explicit COMMIT (the user can then either roll back, or free disk space and attempt the commit again).
Note that naively checking available disk space on every commit might incur a performance penalty. To avoid such a penalty, a new global variable could be introduced to perform the check only for transactions larger than a certain size (suggested name: binlog_trx_size_for_free_disk_space_check). To make sure transactions smaller than the threshold never fill the disk, a file of that size could be created on startup, and deleted when a commit fills the disk. Once the file is deleted, the free disk space check would be done on each commit, until there is enough disk space to re-create the file.
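To illustrate the reserve-file idea, a sketch in shell of what the server would do (the path, file name, and 1 GiB size are hypothetical, and binlog_trx_size_for_free_disk_space_check is the variable suggested above, not an existing one):
reserve=./data/binlog_space_reserve
size=$((1024 * 1024 * 1024))      # binlog_trx_size_for_free_disk_space_check
fallocate -l "$size" "$reserve"   # created at startup, pre-allocating the space
# On ENOSPC while writing the binary log: delete the reserve, which frees enough
# space for any transaction smaller than the threshold to finish committing.
rm -f "$reserve"
# From then on, do the free disk space check on every commit, and re-create the
# reserve once the filesystem again has more than $size bytes available.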
As written in the description, note that fixing this introduces a situation where COMMIT fails, which is not very common in MySQL (the only other situation I know of where a commit can fail is on conflict detection in Group Replication). If this is unwanted, the connection could "just" be violently closed, like the current crash does.
Also, though not explicitly reported above, the message below from the log is false in this situation; maybe avoid confusing the operator with references to a "bug" and "malfunctioning hardware".
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.