Bug #109401 Rows_query_log_event truncates query due to treating query as C string
Submitted: 16 Dec 2022 2:01 Modified: 27 Aug 15:37
Reporter: Jeremy Cole (Basic Quality Contributor) (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S3 (Non-critical)
Version:8.0.31, 5.7.40, 8.0.33 OS:Any
Assigned to: CPU Architecture:Any
Tags: binlog, binlog_rows_query_log_events, RBR

[16 Dec 2022 2:01] Jeremy Cole
Description:
When using RBR with the binlog_rows_query_log_events option enabled, SQL
statements are logged in a Rows_query_log_event which is written before the
Table_map_event. Since SQL may contain any binary data including embedded
nulls (\0), it's inappropriate to use null-terminated strings or C string
functions for query strings, and these binary log events are intended to use
the event payload to determine the string length. However, in multiple places
the C strlen() and snprintf() functions are used, causing the query string to
be truncated at the first null byte.

How to repeat:
1. Enable binary logging and binlog_rows_query_log_events.
2. Execute a query containing an embedded null (\0).
3. Observe that the query is truncated in the Rows_query_log_event event.

For example, this can be reproduced based on the following case, creating a test MySQL instance with binlog_rows_query_log_events enabled:

$ dbdeployer deploy single 8.0 --my-cnf-options binlog_rows_query_log_events

A small test program is required to reproduce, because the mysql command doesn't easily allow writing a query with zero bytes in the query string, in this case I used ruby with the mysql2 gem:

$ cat test.rb

#!/usr/bin/env ruby

require "mysql2"

mysql = Mysql2::Client.new(socket: "/tmp/mysql_sandbox8031.sock", username: "msandbox", password: "msandbox")

mysql.query("DROP TABLE IF EXISTS `test`.`t`")
mysql.query("CREATE TABLE `test`.`t` (`name` VARCHAR(255))")
mysql.query("INSERT INTO `test`.`t` (`name`) VALUES (\"foo\0bar\")")

$ ruby test.rb

The truncated query can be seen in the mysqlbinlog output, e.g.:

$ mysqlbinlog -vvv --base64-output=decode-rows binlog.000001

...
# at 8893
#221215 17:53:20 server id 1  end_log_pos 8961 CRC32 0x8a780242 	Rows_query
# INSERT INTO `test`.`t` (`name`) VALUES ("foo
# at 8961
#221215 17:53:20 server id 1  end_log_pos 9012 CRC32 0x200b11e6 	Table_map: `test`.`t` mapped to number 109
# at 9012
#221215 17:53:20 server id 1  end_log_pos 9057 CRC32 0xd6114e06 	Write_rows: table id 109 flags: STMT_END_F
### INSERT INTO `test`.`t`
### SET
###   @1='foo\x00bar' /* VARSTRING(1020) meta=1020 nullable=1 is_null=0 */
...

To be sure that it's actually truncated in the binlog rather than just in mysqlbinlog's reading or printing of the query, you can verify with hexdump that the query itself is truncated:

$ hexdump -C binlog.000001

...
000022c0  63 1d 01 00 00 00 44 00  00 00 01 23 00 00 80 00  |c.....D....#....|
000022d0  2c 49 4e 53 45 52 54 20  49 4e 54 4f 20 60 74 65  |,INSERT INTO `te|
000022e0  73 74 60 2e 60 74 60 20  28 60 6e 61 6d 65 60 29  |st`.`t` (`name`)|
000022f0  20 56 41 4c 55 45 53 20  28 22 66 6f 6f 42 02 78  | VALUES ("fooB.x|
00002300  8a 10 5f 9b 63 13 01 00  00 00 33 00 00 00 34 23  |.._.c.....3...4#|
...

Suggested fix:
Audit all usage of Rows_query_log_event::m_rows_query and ensure it is properly handled in all cases and never treated as a null-terminated C string.
[16 Dec 2022 10:52] MySQL Verification Team
Hello Jeremy,

Thank you for the report and feedback.

Sincerely,
Umesh
[27 Aug 15:37] Jon Stephens
Documented fix as follows in the MySQL 9.0.0 changelog:

    When using row-based replication with
    binlog_rows_query_log_events enabled, SQL statements are logged
    in a Rows_query_log_event which is written before the
    Table_map_event. Since SQL may contain any binary data including
    embedded nulls (\0), using the C strlen() and snprintf() and
    other similar functions which rely on processing up to the null
    byte, it was possible for the query string to be truncated at
    the first null byte, which resulted in processing incomplete
    data.

    We fix this by specifying the length of each
    Rows_query_log_event such that it is no longer necessary to rely
    on C style string methods for length calculations; in addition,
    while processing the query, we now use functions which do not
    rely on null termination. We also mark the first byte of a
    Rows_query_log_event data body as unused.

Closed.