Bug #33048 Not able to recover binary/blob data correctly using mysqlbinlog
Submitted: 6 Dec 2007 21:18 Modified: 24 Sep 2011 15:52
Reporter: Mark Leith Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Command-line Clients Severity:S1 (Critical)
Version:5.0-BK, 5.1.40sp1 OS:Any
Assigned to: Libing Song CPU Architecture:Any
Tags: addslashes, BLOB, mysql_escape_string, mysqlbinlog, recovery

[6 Dec 2007 21:18] Mark Leith
Description:
It is not possible to recovery BLOB data from binary logs using mysqlbinlog, as null characters (\0) etc. are not escaped properly. 

How to repeat:
drop database if exists binary_files;
create database binary_files;
use binary_files;
create table binfile (bin_data longblob);
set global max_allowed_packet = 50*1024*1024;

/* create php file with the following */
<?php
    $db = mysql_connect("localhost", "root", "");
    mysql_select_db("binary_files", $db) or die(mysql_errno() . ": " . mysql_error());

    $data = addslashes(fread(fopen("/usr/local/mysql/bin/mysqld", "r"), 
         filesize("/usr/local/mysql/bin/mysqld")));
    $sql = "INSERT INTO binfile ";
    $sql .= "(bin_data) ";
    $sql .= "VALUES ('$data')";
    $result = mysql_query($sql, $db) or die(mysql_errno() . ": " . mysql_error());
    mysql_close();
?>

/* run it */
php loadfile.php

use binary_files;
select bin_data into dumpfile '/tmp/mysqld' from binfile;

chmod 777 /tmp/mysqld
/tmp/mysqld --version
rm /tmp/mysqld

mysqlbinlog <binarylog name> | mysql -u root 

use binary_files;
select bin_data into dumpfile '/tmp/mysqld' from binfile;

chmod 777 /tmp/mysqld
/tmp/mysqld --version

Suggested fix:
Properly escape all binary data.
[6 Dec 2007 21:20] Mark Leith
On the first run, checking the /tmp/mysqld file:

medusa:/Users/markleith/mysql/issues/csc21540 root# chmod 777 /tmp/mysqld
medusa:/Users/markleith/mysql/issues/csc21540 root# /tmp/mysqld --version
/tmp/mysqld  Ver 5.0.48-enterprise-gpl-log for apple-darwin8.5.1 on i686 (MySQL Enterprise Server (GPL))

On the second run, after recovering from the binary logs with mysqlbinlog:

medusa:/Users/markleith/mysql/issues/csc21540 root# chmod 777 /tmp/mysqld
medusa:/Users/markleith/mysql/issues/csc21540 root# /tmp/mysqld --version
su: /tmp/mysqld: Malformed Mach-o file
[26 Feb 2008 19:32] Chad MILLER
patch that adds test case

Attachment: 33048testcase.udiff (application/octet-stream, text), 2.45 KiB.

[26 Feb 2008 19:44] Chad MILLER
Using mysql_escape_string() here instead of PHP's addslashes() avoids this problem, and is arguably more correct.

Since we don't explicitly forbid use of literal newlines and literal carriage returns in strings, this bug should be fixed in the server also.
[26 Feb 2008 21:37] Chad MILLER
Ah, this isn't a bug in the server.  It's in the "mysql" command-line client, where it's impossible to insert literal newlines (and carriage returns?) in a string, as those are interpreted at a higher level to be line delimiters.

This test case bypasses "mysql" and accesses the API directly for insertion.  Then it extracts that query from the binlog and tries to run it through "mysql", where literal newlines are not (so far) legal.

I don't think this should be a v5.1-blocking bug.

Immediate workaround:

- Insert values with newlines and carriage returns written as "\n" and "\r".  See my previous comment to this bug.

Potential solutions to make this bug disappear:

- Make the API refuse to insert unencoded special values, or
- Interpret and re-encode strings when reading from the binlog and make them into a form that the command-line client can use, or
- (my favorite) Add a state machine into the command line client "mysql", and accumulate "lines" and line endings when we're inside a string.  Making this work the same for interactive (libreadline and libedit) and for noninteractive use will be tricky, I think.
[27 Feb 2008 15:50] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43076

ChangeSet@1.2580, 2008-02-27 10:49:48-05:00, cmiller@zippy.cornsilk.net +3 -0
  Bug#33048: Not able to recover binary/blob data correctly using \
  	mysqlbinlog
  
  mysqlbinlog wrote to stdout query text exactly as sent to the server,
  but some queries that can be inserted via the API are illegal or 
  impossible to send using the command-line "mysql" client (which is
  why we have escape characters for special characters).
  
  Now, make mysqlbinlog escape query characters (only CR for now) that 
  cannot be read via mysql client.
[27 Feb 2008 16:06] Jim Winstead
identified some issues in the patch via irc.
[27 Feb 2008 16:40] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43084

ChangeSet@1.2580, 2008-02-27 11:38:48-05:00, cmiller@zippy.cornsilk.net +3 -0
  Bug#33048: Not able to recover binary/blob data correctly using \
  	mysqlbinlog
  
  mysqlbinlog wrote to stdout query text exactly as sent to the server,
  but some queries that can be inserted via the API are illegal or 
  impossible to send using the command-line "mysql" client (which is
  why we have escape characters for special characters).
  
  Now, make mysqlbinlog escape query characters (only CR for now) that 
  cannot be read via mysql client.
[27 Feb 2008 22:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/43119

ChangeSet@1.2580, 2008-02-27 17:15:53-05:00, cmiller@zippy.cornsilk.net +3 -0
  Bug#33048: Not able to recover binary/blob data correctly using \
  	mysqlbinlog
  
  mysqlbinlog wrote to stdout query text exactly as sent to the server,
  but some queries that can be inserted via the API are illegal or 
  impossible to send using the command-line "mysql" client.
  
  Now, do not consume carriage returns in a CR+NL sequence for 
  noninteractive commands.
[28 Feb 2008 15:52] Chad MILLER
Thanks to jimw and bar for reviewing.

I do not see a good way to fix this bug.

In replacing carriage returns with backslash-R, we risk cleaving a multibyte character.  So, the first patches, 
http://lists.mysql.com/commits/43084
are unsafe.

In making mysql client not consume carriage returns, then Windows installations (and dumps from Windows machines?) would inject carriage returns that interfere with parsing.  Thus, my second approach, 
http://lists.mysql.com/commits/43119
breaks other things.  See Bug#19799.

The only correct behavior for this bug is to escape special characters in when inserting them.  In the case of this user, use mysql_escape_string() instead of PHP's addslashes() .
[14 Mar 2008 23:12] Chad MILLER
I'm investigating this further.
[19 Mar 2008 16:47] Chad MILLER
patch; demonstration of incorrectness

Attachment: 33048-demonstration-of-incorrectness.patch (text/x-patch), 9.10 KiB.

[19 Mar 2008 19:04] Chad MILLER
Reconfirmed.  There's no bug here except that it's possible to construct non-prepared-statement queries with the API that are impossible to construct with the command line.

Using the API, it is possible to construct queries which contain in their string fields literal characters like NUL, newline, and carriage return, which the command line client may interpret specially.  One must ALWAYS escape some characters in strings when constructing a query:  Backslash and the character used to quote the string.  

Since some processing is always necessary, it is not a burden to require escaping the characters that are special for mysql client.

The correct way to use the server and client is to 
1) use prepared statements, or
2) construct queries using our supplied functions that escape metacharacters, like mysql_real_escape_string()

http://dev.mysql.com/doc/refman/5.1/en/mysql-real-escape-string.html
[19 Mar 2008 23:36] Kolbe Kegel
If you find yourself in a situation where you need to restore from some binary logs but you are running into problems with improperly escaped data that cannot be handled by the mysql command-line client ("this bug"), follow the steps at the bottom of http://dev.mysql.com/doc/refman/5.0/en/change-master-to.html ...

You want to replay the server's own binary logs (not relay logs, but regular binary logs), named (for example) myhost-bin.*. First, make a backup copy of these binary logs in some safe place, in case you don't exactly follow the procedure below and accidentally have the server purge the binary logs. Use SET GLOBAL relay_log_purge=0 for additional safety. Then start the server without the --log-bin option, Instead, use the --replicate-same-server-id, --relay-log=myhost-bin (to make the server believe that these regular binary logs are relay logs) and --skip-slave-start options. After the server starts, issue these statements:

CHANGE MASTER TO
  RELAY_LOG_FILE='myhost-bin.153',
  RELAY_LOG_POS=410,
  MASTER_HOST='some_dummy_string';
START SLAVE SQL_THREAD;

The server reads and executes its own binary logs, thus achieving crash recovery. Once the recovery is finished, run STOP SLAVE, shut down the server, delete the master.info and relay-log.info files, and restart the server with its original options.

Specifying the MASTER_HOST option (even with a dummy value) is required to make the server think it is a slave.
[22 Feb 2010 18:19] MySQL Verification Team
i read this bug report to mean that what's written on http://dev.mysql.com/doc/refman/5.1/en/point-in-time-recovery.html
is not true in cases where blobs are used and some characters will cause problems?
[2 Mar 2010 1:11] Roel Van de Paar
Trying the workaround described in the manual and copied above, you may run into all sorts of errors like these:

o ERROR 1380 (HY000): Failed initializing relay log position: Could not find target log during relay log initialization
o ERROR 1201 (HY000): Could not initialize master info structure; more error messages can be found in the MySQL error log 
o (In the error log) [ERROR] Error reading master configuration
o (In the error log) [ERROR] Error reading slave log configuration
o (In the error log) [ERROR] Failed to initialize the master info structure
o (In the error log) [ERROR] log .\binlog_file_name listed in the index, but failed to stat
o (In the error log) [ERROR] Error counting relay log space
o (In the error log) [ERROR] Failed to open the relay log 'binlog_file_name' (relay_log_pos x)
o (In the error log) [ERROR] Slave SQL: Error initializing relay log position: Could not find target log during relay log initialization, Error_code: 1593

This may be caused by various items. One such item may be bug #12190 (fixed only in 5.5).

As such, getting replication to work by hacking the server (i.e. not setting up proper replication) seems nearly impossible.

Instead, setup normal replication using a secondary/temporary mysqld (as master), stop such replication, edit the relevant files and then re-start the SQL thread. Still requires a bit of playing around, but much easier (and faster) to get to work.
[17 Mar 2010 3:47] Roel Van de Paar
Another idea: if fixing the CLI part is too difficult, this should still be fixable in mysqlbinlog (based on the fact that a slave replay works fine). I.e. instead of doing: mysql < mysqlbinlog, mysqlbinlog could be improved to connect to the server directly to replay (thereby avoiding the escaping issue). Another idea may be to have hex columns go to a file (similarly to how LOAD DATA files are handled today).
[4 Jun 2010 9:51] Ingvar Hagelund
See also bug #13265. Adding --hex-blob to mysqlbinlog would fix the problem for binary raw data (though not for special non-binary data like the newline problem).
[31 Aug 2010 23:24] Roel Van de Paar
Any updates?
[4 Sep 2010 17:33] MySQL Verification Team
http://www.xaprb.com/blog/2010/09/04/why-mysql-replication-is-better-than-mysqlbinlog-for-...
[27 Dec 2010 8:23] Libing Song
'\r\n' in a name or identifier is also converted to '\n' by mysql.
------------------------------------------------------------------
source include/master-slave.inc;
source include/have_binlog_format_statement.inc;
source include/have_innodb.inc;

FLUSH LOGS;
let $table_name= `SELECT 'A\r\nB'`;
eval CREATE TABLE `$table_name`(c1 INT);
eval INSERT INTO `$table_name` SET c1 = 2;
eval SELECT c1 FROM `$table_name`;
show binlog events IN 'master-bin.000002';

FLUSH LOGS;
eval DROP TABLE `$table_name`;
let $datadir= `SELECT @@datadir`;
--echo $datadir
--exec $MYSQL_BINLOG $datadir/master-bin.000002> "$MYSQLTEST_VARDIR/tmp/slb.sql"
#--error 1064
--exec $MYSQL test < $MYSQLTEST_VARDIR/tmp/slb.sql
--error 1146
eval SELECT c1 FROM `$table_name`;
let $table_name= `SELECT 'A\nB'`;
eval SELECT c1 FROM `$table_name`;
[26 Jan 2011 7:14] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/129609

3515 Libing Song	2011-01-26
      BUG#33048 Not able to recover binary/blob data correctly using mysqlbinlog
      
      The queries generated by mysqlbinlog could not be executed by 'mysql' client
      if there was any 0x00 in it. Or the queries recovered diverged data if there
      was any 0x0D0A sequence in it. The reason is that 'mysql' client always
      translated '0x0D0A'('\r\n') to '0x0A'('\n') and treated '0x00' as the end
      of a query if it is in non-interactive mode(receiving queries from a file
      or a pipe).
      
      This patch added the option 'binary-mode' in 'mysql' program.  'mysql' will
      not translated '0x0D0A' to '0x0A' and not treat '0x00' as the end of a query
      if binary-mode is set 1 and it is in non-interactive mode.
[26 Jan 2011 7:14] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/129610

3515 Libing Song	2011-01-26
      BUG#33048 Not able to recover binary/blob data correctly using mysqlbinlog
      
      The queries generated by mysqlbinlog could not be executed by 'mysql' client
      if there was any 0x00 in it. Or the queries recovered diverged data if there
      was any 0x0D0A sequence in it. The reason is that 'mysql' client always
      translated '0x0D0A'('\r\n') to '0x0A'('\n') and treated '0x00' as the end
      of a query if it is in non-interactive mode(receiving queries from a file
      or a pipe).
      
      This patch added the option 'binary-mode' in 'mysql' program.  'mysql' will
      not translated '0x0D0A' to '0x0A' and not treat '0x00' as the end of a query
      if binary-mode is set 1 and it is in non-interactive mode.
[28 Jan 2011 3:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/129826

3516 Libing Song	2011-01-28
      BUG#33048 Post patch
      To disable mysql client commands in binary mode, except
      'delimiter' and '\C'.
[24 Sep 2011 15:52] Paul DuBois
Noted in 5.6.3 changelog.

The mysql client program now has a --binary-mode option that helps
when processing mysqlbinlog output that may contain BLOB values. By
default, mysql translates \r\n in statement strings to \n and
interprets \0 as the statement terminator. --binary-mode disables
both features. It also disables all mysql commands except charset and
delimiter in non-interactive mode (for input piped to mysql or loaded
using the source command).
[13 Dec 2011 6:30] Andrew Glenn
I'm running 5.5.x and need a way to find and fix bad data in my db so that a recovery will work.  Is there a way I can query my db tables to find the set of bad records?
[30 Nov 2012 4:56] Roel Van de Paar
For anyone who has 5.5.x (in which the --binary-mode option for the mysql client does not exist yet): you should be able to grab the mysql binary from 5.6 and use this with the --binary-mode option to replay logs that have escaped characters (that may be an issue) in them. So, use mysqlbinlog first to extract the log, then use mysql client from 5.6 with --force and --binary-mode to replay such a log. I also use this when doing QA related sql replays of sql traces which contain odd characters (usually as the result of using blobs or hex etc.). The problem is that the client often interprets them as a instruction (i.e. \<something> - just like \?, \r etc.)