Description:
I found a replica receiver (I/O thread) crash in MySQL 9.7.0 when the replication source sends a malformed but accepted Format_description_event (FDE) whose post_header_len vector is non-empty but shorter than the event type later indexed by the replica I/O thread.
The malformed FDE is accepted because Format_description_event::header_is_valid() only requires common_header_len >= LOG_EVENT_MINIMAL_HEADER_LEN and !post_header_len.empty() (libs/mysql/binlog/event/control_events.h:354-356). The constructor sets number_of_event_types from the available bytes and stores exactly that many entries (libs/mysql/binlog/event/control_events.cpp:243: number_of_event_types = available_bytes; assign(&post_header_len, number_of_event_types);). So a malformed source stream can contain an FDE with as few as one post_header_len byte, and that FDE still passes validation.
After that FDE is installed as the current source description event, the replica I/O thread has paths that construct event objects (or extract basic info) directly, BEFORE the normal binlog_reader event-type guard. These paths index post_header_len[event_type - 1]:
- ROTATE_EVENT: libs/mysql/binlog/event/control_events.cpp:48
uint8_t post_header_len = fde->post_header_len[ROTATE_EVENT - 1]; // index 3
- QUERY_EVENT: sql/log_event.cc:5081
fd_event->post_header_len[QUERY_EVENT - 1];
- and similarly INCIDENT_EVENT (control_events.cpp:303), XID_EVENT (:330), XA_PREPARE_LOG_EVENT (:342), PREVIOUS_GTIDS_LOG_EVENT (:698).
With a one-byte post_header_len vector these index past the end of the vector. On the official mysql:9.7.0 Docker image, which is built with libstdc++ hardening (_GLIBCXX_ASSERTIONS), the out-of-bounds operator[] is caught and the server aborts:
/opt/rh/gcc-toolset-14/root/usr/include/c++/14/bits/stl_vector.h:1149: ... std::vector<unsigned char>::operator[](size_type) const ...: Assertion '__n < this->size()' failed.
mysqld got signal 6 ;
So on a shipping release build this is a reliable replica I/O-thread abort (denial of service of the replica) driven entirely by the source binlog stream. Without the hardening assertion it is an out-of-bounds read of post_header_len[].
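To make the index arithmetic concrete, here is a small Python model (not MySQL code). It assumes the standard binlog event type codes for the six event types named above and lists which post_header_len[event_type - 1] accesses fall outside a vector of a given size. The function name is illustrative only.

```python
# Standard binlog event type codes (assumed here, not taken from the report text).
EVENT_TYPES = {
    "QUERY_EVENT": 2,
    "ROTATE_EVENT": 4,
    "XID_EVENT": 16,
    "INCIDENT_EVENT": 26,
    "PREVIOUS_GTIDS_LOG_EVENT": 35,
    "XA_PREPARE_LOG_EVENT": 38,
}


def out_of_bounds_accesses(vector_size):
    """Return {event name: zero-based index} for every access past the end."""
    return {
        name: code - 1
        for name, code in EVENT_TYPES.items()
        if code - 1 >= vector_size
    }


# With the one-byte vector from this report, every listed path is out of range.
for name, index in out_of_bounds_accesses(1).items():
    print(f"{name}: post_header_len[{index}] with size 1")
```

With a four-byte vector only QUERY_EVENT and ROTATE_EVENT are covered; the other four paths would still index past the end.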
The normal guard exists, but it is downstream of these I/O-thread pre-processing paths:
sql/binlog_reader.cc:196
if (event_type > fde->number_of_event_types && ...) return INVALID_EVENT;
Relationship to an older bug (for due diligence): the same primitive (a post_header_len array shorter than the event type used to index it) was reported long ago as Bug #31581 (2007, 5.1.x). That one was a version-skew case (old 5.1.0-5.1.15 master -> 5.1.16+ slave, ROWS events) fixed narrowly in 5.1.24 by passing explicit event types to the ROWS event constructors. That fix did NOT add general validation that post_header_len covers number_of_event_types, so the weakness remains reachable on current 9.7.0 for a malformed (not merely old) source FDE via the ROTATE/QUERY/INCIDENT/XID/XA_PREPARE/PREVIOUS_GTIDS paths above. I could not find a current (8.0/8.4/9.x) report for this.
Where I think the problem is:
- libs/mysql/binlog/event/control_events.h:354-356 header_is_valid() accepts any non-empty post_header_len.
- libs/mysql/binlog/event/control_events.cpp:243 number_of_event_types = available_bytes; assign(&post_header_len, number_of_event_types);
- the replica I/O thread paths (sql/rpl_replica.cc ROTATE case -> control_events.cpp:48 ; extract_log_event_basic_info -> sql/log_event.cc:5081) index post_header_len[event_type - 1] before sql/binlog_reader.cc:196 guards it.
A fix could validate, in header_is_valid() or when installing the source FDE, that post_header_len is large enough for the event types the I/O thread may decode, or bounds-check each post_header_len[event_type - 1] access on the I/O-thread paths.
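The two options can be modelled as follows. This is only a Python sketch of the logic, not a patch: the real change would be in the C++ sources, and the function names, the set of required event types, and the error handling below are my assumptions.

```python
# Event types the I/O-thread paths above index directly (standard type codes, assumed).
REQUIRED_EVENT_TYPES = {
    "QUERY_EVENT": 2,
    "ROTATE_EVENT": 4,
    "XID_EVENT": 16,
    "INCIDENT_EVENT": 26,
    "PREVIOUS_GTIDS_LOG_EVENT": 35,
    "XA_PREPARE_LOG_EVENT": 38,
}


def covers_required_types(post_header_len, required=REQUIRED_EVENT_TYPES):
    """Option 1: reject the FDE up front unless it has an entry for every required type."""
    return len(post_header_len) >= max(required.values())


def checked_post_header_len(post_header_len, event_type):
    """Option 2: bounds-check each individual post_header_len[event_type - 1] access."""
    if not 1 <= event_type <= len(post_header_len):
        raise ValueError(
            f"event type {event_type} not covered by "
            f"{len(post_header_len)} post_header_len entries"
        )
    return post_header_len[event_type - 1]
```

Option 1 fails once, when the FDE is installed; option 2 turns each out-of-range access into an ordinary error that the I/O thread could report instead of aborting.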
This crash depends on the source binlog stream and occurs in the replica receiver thread immediately after the source enters binlog dump and sends the short FDE followed by a ROTATE_EVENT (or QUERY_EVENT). It is not an OOM.
Thank you,
Yakir Gibraltar
How to repeat:
I reproduced this with a small fake MySQL source (a Python script) that performs the classic protocol handshake, answers the replica setup queries (server_id, server_uuid, binlog_checksum, gtid_mode, heartbeat, etc.), then on COM_BINLOG_DUMP sends:
1. A FORMAT_DESCRIPTION_EVENT with binlog_version = 4, server_version = "5.5.0-short-fde", common_header_len = 19, and exactly ONE post_header_len byte.
2. A ROTATE_EVENT.
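For reference, the two events can be assembled roughly as below. This is a minimal sketch, not the reproduction script itself. It assumes the standard binlog v4 layout (a 19-byte common header; an FDE body of binlog_version, a 50-byte server_version field, a create timestamp, common_header_len, then the post_header_len bytes). It also assumes no checksum fields and leaves out the protocol packet framing. The function names, the server_id, the zero next position on the ROTATE_EVENT, and the placeholder value of the single post_header_len byte are illustrative.

```python
import struct

FORMAT_DESCRIPTION_EVENT = 15
ROTATE_EVENT = 4
COMMON_HEADER_LEN = 19
SERVER_VERSION_FIELD_LEN = 50


def common_header(event_type, body_len, next_pos, server_id=1, flags=0):
    """Pack timestamp, type, server_id, event length, next position, flags (19 bytes)."""
    event_len = COMMON_HEADER_LEN + body_len
    return struct.pack("<IBIIIH", 0, event_type, server_id, event_len, next_pos, flags)


def build_fde(post_header_len, server_version=b"5.5.0-short-fde", start_pos=4):
    """Build an FDE whose post_header_len array is exactly the bytes given."""
    body = struct.pack("<H", 4)                                   # binlog_version
    body += server_version.ljust(SERVER_VERSION_FIELD_LEN, b"\x00")
    body += struct.pack("<I", 0)                                  # create timestamp
    body += struct.pack("<B", COMMON_HEADER_LEN)                  # common_header_len
    body += bytes(post_header_len)                                # the short array
    next_pos = start_pos + COMMON_HEADER_LEN + len(body)
    return common_header(FORMAT_DESCRIPTION_EVENT, len(body), next_pos) + body


def build_rotate(next_file=b"mysql-bin.000001", position=4):
    """Build a ROTATE_EVENT: 8-byte position followed by the next file name."""
    body = struct.pack("<Q", position) + next_file
    return common_header(ROTATE_EVENT, len(body), next_pos=0) + body


short_fde = build_fde(b"\x00")   # exactly ONE post_header_len byte (placeholder value)
rotate = build_rotate()
```

Passing a longer byte string to build_fde() gives the well-formed variant used for the control run.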
The replica is an official mysql:9.7.0 Docker container configured to replicate from the fake source:
CHANGE REPLICATION SOURCE TO
SOURCE_HOST='<fake-source-host>',
SOURCE_PORT=<fake-source-port>,
SOURCE_USER='root', SOURCE_PASSWORD='...',
SOURCE_AUTO_POSITION=0,
SOURCE_LOG_FILE='mysql-bin.000001', SOURCE_LOG_POS=4, SOURCE_SSL=0;
START REPLICA IO_THREAD;
Failing run (FDE with one post_header_len byte, followed by ROTATE_EVENT):
- container: status=exited exit=2 OOMKilled=false
- mysqld error log:
Replica receiver thread for channel '': connected to source 'root@<host>:<port>' ... Starting replication from file 'mysql-bin.000001', position '4'.
/opt/rh/gcc-toolset-14/root/usr/include/c++/14/bits/stl_vector.h:1149: ... std::vector<unsigned char>::operator[](size_type) const ...: Assertion '__n < this->size()' failed.
mysqld got signal 6 ;
Control run (identical, but with a 4-byte post_header_len, which is long enough to cover index 3 for ROTATE_EVENT):
- container stays status=running exit=0 OOMKilled=false; replication ends only on the fake source's deliberate fatal error 1236. This shows the crash is specifically caused by the short post_header_len vector, not by the connection itself.
I also reproduced the same assertion/abort with a following QUERY_EVENT instead of ROTATE_EVENT (the sql/log_event.cc:5081 extract_log_event_basic_info path), with the same status=exited exit=2 and the same operator[] assertion / signal 6.
I can attach the Python reproduction script (a single self-contained file) to this report.