{
  "version": 3,
  "savedAt": "2026-05-10T19:04:54.317Z",
  "fields": {
    "title": "MySQL Community Design Proposal",
    "status": "Draft",
    "roadmap-section": "Other",
    "contact-name": "Vinicius Grippa",
    "contact-email": "vgrippa@gmail.com",
    "company": "Readyset",
    "role": "Lead Database Engineer",
    "authors": "",
    "date": "2026-05-10",
    "target": "",
    "related": "",
    "summary": "Move the binary log out of standalone files and into an InnoDB-managed tablespace, so that data changes and their binlog events are written within the same mini-transaction and made durable by a single redo flush. This eliminates the two-phase commit between InnoDB redo and the binlog, removes one fsync from every durable commit, simplifies crash recovery to a single redo pass, and makes binlog state transactionally consistent with the rows it describes. The replication wire protocol is preserved; consumers do not change.",
    "stories": "As a DBA running a high-write OLTP workload, I want safe replication (sync_binlog=1, innodb_flush_log_at_trx_commit=1) without paying two fsyncs per commit, because the second fsync is currently the dominant commit-path cost on fast NVMe.\nAs an operator recovering from a crash, I want one recovery procedure, not a reconciliation between redo state and binlog state, because the current reconciliation has a long tail of edge cases (prepared-not-logged transactions, partial binlog writes, mid-rotation crashes).\nAs a CDC tool author (Debezium, Maxwell, Readyset, ProxySQL mirror), I want a log that is transactionally consistent with the data on the source, so I never observe an event whose effect is not yet visible (or vice versa) on the source itself.\nAs a backup operator, I want the binlog to be part of the InnoDB durable state, so a consistent backup naturally includes the log up to the same point as the data.",
    "scope-in": "A new binlog storage backend that writes events into an InnoDB tablespace within the same mini-transaction as the data change.\nA new server variable binlog_storage = FILE | INNODB (default FILE).\nPreservation of the existing replication dump protocol and GTID semantics.\nAn export path to the legacy binlog file format for mysqlbinlog and archival.\nA read-only Performance Schema view exposing recent binlog events.\nBackup-tool guidance (XtraBackup, MEB) for the new tablespace.",
    "scope-out": "Removing the legacy FILE backend. FILE remains the default and supported until parity is demonstrated.\nChanges to row-event or statement-event formats. Event payloads stay binary-identical.\nNew replication semantics (multi-source, parallel apply, etc.) beyond what is needed to preserve current behavior.\nEncryption-at-rest changes beyond reusing InnoDB tablespace encryption.",
    "references": "",
    "functional": "The server MUST support a configuration binlog_storage = FILE | INNODB, with FILE as the default.\nWhen binlog_storage = INNODB, binlog events for a transaction MUST be written within the same mini-transaction as the data changes, and made durable by the same redo flush.\nWhen binlog_storage = INNODB, the server MUST NOT perform XA-style two-phase commit between an external binlog and InnoDB redo.\nThe replication dump protocol (COM_BINLOG_DUMP, COM_BINLOG_DUMP_GTID) MUST behave identically from a replica's perspective regardless of the storage backend.\nGTID assignment, ordering, and gtid_executed semantics MUST be preserved.\nAn export utility (or extension of mysqlbinlog) MUST be able to render the InnoDB-backed log as the legacy binary log file format for offline analysis and archival.\nMixed topologies (source on INNODB, replicas on FILE, and vice versa) MUST be supported during migration.",
    "nonfunctional": "Commit-path latency at sync_binlog=1, innodb_flush_log_at_trx_commit=1 SHOULD improve measurably on workloads currently bottlenecked on the dual fsync (target: a single-digit-percent or better p99 reduction on sysbench oltp_write_only at moderate concurrency).\nCrash recovery time SHOULD NOT exceed current crash recovery time on equivalent workloads, and the binlog/redo reconciliation phase SHOULD be removed from the critical path.\nReplication throughput and replica lag SHOULD NOT regress under steady-state load.\nBackup size impact MUST be predictable and bounded by configured retention.",
    "approach": "Introduce an InnoDB-backed binlog storage backend. When enabled, the binlog writer appends events into an internal InnoDB tablespace inside the same mini-transaction that records the data change. Durability is established by InnoDB's redo flush; there is no separate binlog fsync and no XA-style two-phase commit between binlog and redo. The dump thread reads from this tablespace instead of from mysql-bin.NNNNNN files; replicas see an identical event stream over the existing wire protocol.\n\nUser Interface\n\nSHOW BINARY LOGS and SHOW MASTER STATUS (SHOW BINARY LOG STATUS in MySQL 8.2 and later) continue to work; on INNODB, \"file\" is presented as a logical segment identifier rather than a filesystem path.\nPURGE BINARY LOGS becomes a transactional operation over the tablespace; semantics (GTID/time-based purge) are preserved.\nNew: mysqlbinlog --export-from-innodb (or equivalent) to render the InnoDB-backed log as legacy file format.",
    "config-clauses": "(None at the SQL DDL level. No new CREATE clauses.)",
    "sysvars": "binlog_storage (enum: FILE, INNODB; default FILE; not dynamic, set at startup).\nbinlog_innodb_tablespace (string; path / name of the dedicated tablespace; default mysql_binlog).\nbinlog_innodb_retention_seconds and binlog_innodb_retention_bytes, mirroring binlog_expire_logs_seconds and max_binlog_size, respectively.\nExisting sync_binlog: on INNODB, treated as a best-effort hint, since durability is governed by innodb_flush_log_at_trx_commit. Documented mapping required.",
    "utilities": "mysqlbinlog --export-from-innodb=<datadir> to render an InnoDB-backed log as legacy file format from an offline data directory or a connected server.",
    "udfs": "None",
    "statements": "None",
    "observability": "New Performance Schema table: performance_schema.binary_log_events (read-only, ring-buffered view of recent events, with GTID, timestamp, server_id, event type, and size).\nNew status variables: Binlog_innodb_bytes_written, Binlog_innodb_oldest_retained_lsn, Binlog_innodb_dump_lag_bytes.\nExisting Binlog_* status variables preserved with documented meanings under INNODB.",
    "procedure": "Upgrade to a release that supports binlog_storage.\nStop the server cleanly; verify that gtid_executed is fully flushed.\nStart with binlog_storage=INNODB; the server creates the binlog tablespace on first start.\nVerify that replicas continue to follow the GTID stream; no replica-side changes are required.\n(Optional) Schedule periodic mysqlbinlog --export-from-innodb runs for archival if downstream tooling expects the file format.",
    "security": "",
    "compatibility": "Downgrading: export the InnoDB-backed log to file format, stop the server, set binlog_storage=FILE, restart.",
    "diagram": "+--------------------+         +--------------------------+\n|  SQL layer (InnoDB |         |  Dump thread             |\n|  handler::write_*) |         |  (COM_BINLOG_DUMP_GTID)  |\n+---------+----------+         +-------------+------------+\n          |                                  |\n          v                                  v\n+--------------------+         +--------------------------+\n|  Binlog writer     |         |  Binlog reader           |\n|  (event encode)    |         |  (decode + send)         |\n+---------+----------+         +-------------+------------+\n          |                                  |\n          v                                  v\n+----------------------------------------------------------+\n|  InnoDB mini-transaction layer (mtr_t)                   |\n|     - data page changes                                  |\n|     - binlog event records (new redo log type)           |\n+--------------------------+-------------------------------+\n                           |\n                           v\n                +--------------------------+\n                |  InnoDB redo log         |\n                |  (single fsync = commit) |\n                +--------------------------+\n                           |\n                           v\n                +--------------------------+\n                |  Binlog tablespace       |\n                |  (ordered by LSN/GTID)   |\n                +--------------------------+",
    "interfaces": "Writer interface: binlog_writer::write(event, trx) appends the encoded event to an in-memory log buffer associated with trx. On commit, the buffer is flushed into the binlog tablespace pages within the same mtr_t as the data pages, using a new redo log record type MLOG_BINLOG_EVENT.\nReader interface: binlog_reader::read_from(gtid_or_position) returns a stream of decoded events, backed by an InnoDB cursor over the binlog tablespace ordered by LSN. The existing dump thread is rewired to this interface.\nExport interface: binlog_exporter::to_file(out_stream, range) renders a range to the legacy file format, byte-identical to what the FILE backend would have produced.",
    "implementation": "Define the InnoDB tablespace layout for binlog events: a clustered B-tree keyed by (epoch, lsn) with secondary lookup by GTID.\nAdd MLOG_BINLOG_EVENT redo log record type; integrate writer into the existing commit path so events are part of the trx's mtr.\nRewire the dump thread reader to consume from the tablespace via the new reader interface; gate behind binlog_storage=INNODB.\nImplement binlog_exporter and the mysqlbinlog --export-from-innodb command-line path.\nAdd Performance Schema view and status variables.\nUpdate PURGE BINARY LOGS, SHOW BINARY LOGS, SHOW MASTER STATUS to operate over the tablespace when active.\nTests, docs, upgrade/downgrade paths (Section 6).",
    "qa-notes": "Commit-path microbenchmark. sysbench oltp_write_only at concurrencies 16/64/256, NVMe, sync_binlog=1, innodb_flush_log_at_trx_commit=1. Compare p50/p95/p99 commit latency FILE vs INNODB. Expect measurable p99 improvement on INNODB.\nCrash recovery. kill -9 mysqld during a sustained commit storm; restart and verify (a) recovery is a single redo pass, (b) gtid_executed is consistent, (c) no orphaned-but-prepared transactions exist.\nCDC consistency. External consumer subscribes at GTID G. Verify that for every row visible on the source at G, the consumer eventually sees the corresponding event, and vice versa, with no window where the row is committed but the event is not durable.\nMixed topology replication. Source on INNODB with replica on FILE, and the reverse. Run replication for 24h under load in each configuration; verify zero divergence.\nGroup Replication. All members on INNODB, mixed members, rolling upgrade scenarios.\nBackup/restore. XtraBackup and MEB-equivalent flows: take a backup, restore on a fresh instance, verify the restored server resumes replication from a consistent GTID with no missing events.\nmysqlbinlog export. Byte-for-byte equality of --export-from-innodb output vs what the FILE backend would have produced for the same event sequence.\nUpgrade. Start on FILE, switch to INNODB, run, switch back via export, verify no event loss in either direction.\nResource bounds. Verify binlog_innodb_retention_bytes / _seconds correctly bound tablespace growth; verify purge interacts correctly with active dump threads.\nFailure injection. Inject I/O errors during binlog tablespace writes; verify the server fails the transaction cleanly without corrupting redo or data state."
  },
  "checks": [
    "SQL syntax or statements",
    "Configuration options or system variables",
    "Command-line options or utilities",
    "User-visible behavior",
    "Observability",
    "Protocol or replication behavior",
    "Upgrade / downgrade compatibility",
    "Performance or resource usage",
    "Files, persistence, or metadata formats",
    "APIs or internal interfaces",
    "Testing or QA coverage needs"
  ]
}