MySQL Bugs: #98448: Please make running MySQL with sync

Bug #98448	Please make running MySQL with sync_binlog != 1 safer.
Submitted:	31 Jan 2020 11:42	Modified:	2 Mar 2020 15:01
Reporter:	Jean-François Gagné
Status:	Verified
Category:	MySQL Server: Replication	Severity:	S4 (Feature request)
Version:	5.7, 8.0	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Hi,

This feature request is made in the context of my FOSDEM talk "The consequences of sync_binlog != 1" ([1]).

[1]: https://fosdem.org/2020/schedule/event/sync_binlog/

Obviously, running MySQL with sync_binlog != 1 (and with innodb_flush_log_at_trx_commit != 1) is not "safe" from a transaction durability point of view. However, combined with replication, running the "MySQL Replicated Distributed System" in a safe way is the everyday challenge of most DBAs.

In my FOSDEM talk, I point out that with sync_binlog != 1, the binary logs cannot be trusted after an OS crash (but they can be trusted after a mysqld crash). Telling an OS crash apart from a mysqld crash, and reacting accordingly, is a major challenge for DBAs, and the current implementation of MySQL makes that reaction complicated.

In my FOSDEM talk, I suggest putting "offline_mode = ON" in the MySQL configuration to prevent slaves and clients from reconnecting to a master after a crash. This is needed for an OS crash, but it is not needed for a mysqld crash. MySQL would be easier to run/operate if, after an OS crash combined with sync_binlog != 1, the server restarted as offline, letting the DBA decide how to safely move forward from there (IMHO the best way forward is failing over to a slave).
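
As a minimal sketch of the workaround described above, the option file could carry the fragment below. The [mysqld] group and the option file itself (typically my.cnf) are assumptions about a usual setup; only the offline_mode setting comes from the talk.

```ini
[mysqld]
# Come up in offline mode after every restart, so that clients and slaves
# connecting with non-privileged accounts cannot reconnect until a DBA
# has checked which kind of crash happened.
offline_mode = ON
```

Because this setting applies to every restart and not only to OS crashes, a DBA who concludes that only mysqld crashed has to bring the server back online by hand, for instance with SET GLOBAL offline_mode = OFF.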

Also, because GTID Replication is not crash safe with sync_binlog != 1 (Bug#70659 and Bug#92109), it is not "safe" to have replication automatically start after an OS crash combined with sync_binlog != 1. What I suggest in my FOSDEM talk is to set "skip-slave-start" in the MySQL configuration and to do some voodoo operations to restart replication (this avoids restoring from a backup). MySQL would be easier to run/operate if, after an OS crash combined with sync_binlog != 1, replication did not automatically start, and maybe those voodoo operations could be executed by the server itself.
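
For illustration, the replication half of the workaround could look like the fragment below in the option file of a slave. The [mysqld] group is again an assumption about a usual setup; the option name is the one given in the talk.

```ini
[mysqld]
# Do not start the replication threads automatically at server startup;
# a DBA starts them by hand once the replication position has been checked.
skip-slave-start
```

With this in place, replication stays stopped after any restart, including a plain mysqld crash, until someone runs START SLAVE.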

My FOSDEM slides should be online in [1] soon, with more details in there (I will also add a comment in the bug with a direct link to the slides).

Many thanks for looking into that, JFG

How to repeat:
N/A to this feature request.

Suggested fix:
1. After an OS crash combined with sync_binlog != 1, the server should automatically restart with offline_mode = ON.

2. Until replication is crash safe with GTID and sync_binlog != 1, after an OS crash combined with sync_binlog != 1, the server should restart without automatically starting replication (as with skip-slave-start).

Hello Jean-François,

Thank you for the feature request!

regards,
Umesh

The way to salvage a GTID slave running with sync_binlog != 1 is described in [1], and it involves starting with skip-slave-start.

[1]: https://www.slideshare.net/JeanFranoisGagn/the-consequences-of-syncbinlog-1/25